PLoS One. 2017 Mar 17;12(3):e0173410.
doi: 10.1371/journal.pone.0173410. eCollection 2017.

Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity

YunZhi Chen et al. PLoS One.

Abstract

ICD-10 (International Classification of Diseases, 10th Revision) is a classification of diseases, symptoms, procedures, and injuries. In patients' medical records, diseases are often described in free text, using terms, phrases, and paraphrases that differ significantly from those used in the ICD-10 classification. This paper presents an improved approach, based on the Longest Common Subsequence (LCS) and semantic similarity, for automatically coding Chinese diagnoses by mapping the disease names given by clinicians to the disease names in ICD-10. The LCS is the longest string that is a subsequence of every member of a given set of strings. The improved LCS method proposed in this paper can increase the accuracy of Chinese disease mapping.
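
To make the LCS definition concrete, the following is a minimal sketch of the classic two-string LCS computed by dynamic programming. It is not the improved algorithm proposed in the paper, which the abstract does not specify. The function names `lcs_length`, `lcs_similarity`, and `best_match`, and the normalization by the longer length, are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences.

    Works on strings (character level) or on lists of tokens,
    e.g. the output of a word segmenter.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous element of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both elements.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one element from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Hypothetical similarity score in [0, 1]: LCS length / longer length.

    This normalization is an assumption for illustration, not the
    paper's similarity measure.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def best_match(query, candidates):
    """Return the candidate name with the highest LCS similarity to query."""
    return max(candidates, key=lambda name: lcs_similarity(query, name))
```

For example, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4 (one such subsequence is "BCBA"). A plain score of this kind captures only surface overlap, which is presumably why the paper combines LCS with semantic similarity.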

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Flowchart of LCS algorithm.
Fig 2. Corpus of Chinese word segmentation of 181 kinds of hepatitis.
Fig 3. Similarity line chart when L(A) = 1.
Fig 4. Similarity line chart when L(A) = 2.
Fig 5. Similarity line chart when L(A) = 3.
Fig 6. Similarity line chart when L(A) = 4.
Fig 7. Similarity line chart when L(A) = 5.
Fig 8. Similarity line chart when L(A) = 6.
Fig 9. Accuracy analysis chart under similarity threshold (n = 1000).
Fig 10. Given threshold of coding accuracy and percentage.
