Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator
- PMID: 27342107
- DOI: 10.1007/s10916-016-0541-2
Abstract
Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt aimed at evaluating unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India, with markups for diseases, procedures, and lab parameters, their attributes, as well as key demographic information and administrative variables such as patient outcomes. In this process, we have constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. We have produced an annotated corpus of roughly 90 thousand words, which to our knowledge is the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores across the major categories, namely disease/symptoms, procedures, laboratory parameters and outcomes, are 0.856, 0.834, 0.961 and 0.872 respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.
Keywords: Biomedical text extraction; Data mining; Discharge summary; Natural language processing; Text annotation.
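The abstract reports "overlap F-scores" per category but does not spell out the matching rule. As a rough illustration only, the sketch below assumes one common convention: a predicted span counts as correct if it shares at least one character with a gold span of the same category. The `Span` layout, the function names, and the toy offsets are hypothetical and are not taken from the paper or from the Cocoa engine.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, category label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans have the same category and share at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_f_score(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) under relaxed overlap matching.

    Assumption: a span is matched if it overlaps any same-category span
    on the other side; the paper's exact criterion may differ.
    """
    matched_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data, invented for illustration (not drawn from the corpus).
gold_spans = [(0, 9, "disease"), (20, 30, "procedure"), (40, 45, "lab")]
predicted_spans = [
    (2, 9, "disease"),      # partial overlap with a gold disease span: counts
    (20, 28, "procedure"),  # partial overlap with a gold procedure span: counts
    (50, 55, "lab"),        # no overlapping gold span: false positive
    (60, 70, "disease"),    # no overlapping gold span: false positive
]

precision, recall, f_score = overlap_f_score(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

Under this relaxed rule, boundary disagreements (for example, including or omitting a modifier) are not penalized, which is one reason overlap scores tend to run higher than exact-match scores on the same output.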