Using n-gram method in the decomposition of compound medical diagnoses
- PMID: 12909174
- DOI: 10.1016/s1386-5056(03)00049-2
Abstract
Objective: Our goal in this study was to find an easy-to-implement method to detect compound medical diagnoses in Hungarian medical language and decompose them into expressions that each refer to a single disease.
Methods: A corpus of clinical diagnoses extracted from discharge reports (3,079 expressions, each referring to only one disease) was represented in an n-gram tree (an n-gram being a series of n consecutive words). A matching algorithm was implemented in software that identifies sensible n-grams present both in the test expressions and in the n-gram tree. A test sample of another 92 diagnoses was decomposed by two independent human reviewers and by the software. The decompositions were compared to measure the recall and precision of the method.
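The abstract does not spell out the tree layout or the matching rule, so the following is only a minimal sketch of one way such a decomposition could work. It assumes a nested-dictionary tree of word n-grams and a greedy longest-match scan; the function names (`build_ngram_tree`, `longest_match`, `decompose`), the `max_n` limit, and the English toy corpus are all hypothetical and not taken from the study.

```python
from typing import Dict, List


def build_ngram_tree(expressions: List[str], max_n: int = 3) -> Dict:
    """Store every word n-gram (n <= max_n) of each expression in a nested dict.

    Assumption: the corpus contains only single-disease expressions.
    """
    root: Dict = {}
    for expr in expressions:
        words = expr.lower().split()
        for start in range(len(words)):
            node = root
            # Insert the n-gram that begins at position `start`.
            for word in words[start:start + max_n]:
                node = node.setdefault(word, {})
    return root


def longest_match(tree: Dict, words: List[str], start: int) -> int:
    """Return the length of the longest n-gram in the tree starting at `start`."""
    node = tree
    length = 0
    for word in words[start:]:
        if word not in node:
            break
        node = node[word]
        length += 1
    return length


def decompose(tree: Dict, diagnosis: str) -> List[str]:
    """Greedily split a diagnosis into the longest n-grams known to the tree."""
    words = diagnosis.lower().split()
    parts: List[str] = []
    i = 0
    while i < len(words):
        n = longest_match(tree, words, i)
        if n == 0:
            i += 1  # word never seen in the corpus (e.g. a conjunction): skip it
        else:
            parts.append(" ".join(words[i:i + n]))
            i += n
    return parts


# Toy corpus of single-disease expressions (illustrative, not study data).
corpus = ["diabetes mellitus", "essential hypertension", "chronic renal failure"]
tree = build_ngram_tree(corpus)
print(decompose(tree, "diabetes mellitus and essential hypertension"))
```

A greedy scan like this tends to over-segment when a word sequence is only partly covered by the corpus, which is one plausible reason a method of this kind could show high recall together with low precision.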
Results: The two human decompositions did not fully agree, which underlines the relevance of the problem. Consensus was reached on every point of disagreement through a third opinion and open discussion. The resulting decomposition was used as the gold standard and compared with the decomposition produced by the computer. The recall was 82.6% and the precision 37.2%. After spelling errors in the test sample were corrected, the recall increased to 88.6%, while the precision decreased slightly to 36.7%.
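The abstract does not state how matching components were counted, so the sketch below assumes the simplest convention: a produced component counts as correct only if it matches a gold-standard component of the same diagnosis exactly. Under that assumption, precision is the share of produced components that are correct, and recall is the share of gold-standard components that were produced. The function name `precision_recall` and the toy data are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-diagnosis component sets.

    Assumption: components are compared by exact string match.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Toy example (not study data): the software finds every gold component
# but also emits spurious fragments, so recall is high and precision low.
gold = [{"diabetes mellitus", "essential hypertension"}, {"chronic renal failure"}]
predicted = [
    {"diabetes mellitus", "essential hypertension", "mellitus"},
    {"chronic renal failure", "renal failure", "failure"},
]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```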
Conclusion: The proposed method seems to be useful for decomposing compound diagnostic expressions and can improve the quality of diagnostic coding of clinical cases. Other statistical methods (such as vector space methods or neural networks) usually offer a ranked list of candidate codes for either single or compound expressions, and do not tell the user how many codes should be chosen. We propose our method especially for situations where formal NLP techniques are not available, as is the case with less widely spoken languages such as Hungarian.