J Med Syst. 2016 Aug;40(8):187.
doi: 10.1007/s10916-016-0541-2. Epub 2016 Jun 24.

Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator

S V Ramanan et al. J Med Syst. 2016 Aug.

Abstract

Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt aimed at evaluating unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India, with markups for diseases, procedures, and lab parameters, their attributes, as well as key demographic information and administrative variables such as patient outcomes. In this process, we have constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. We have produced an annotated corpus of roughly 90 thousand words, which to our knowledge is the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores across the major categories, namely disease/symptoms, procedures, laboratory parameters and outcomes, are 0.856, 0.834, 0.961 and 0.872 respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.

Keywords: Biomedical text extraction; Data mining; Discharge summary; Natural language processing; Text annotation.
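
The abstract reports overlap F-scores but does not spell out how they are computed. The sketch below shows one common reading of overlap-based scoring, offered as an assumption rather than as the authors' evaluation code: a predicted span counts as correct if it shares at least one character with a gold span of the same category, and the F-score is the harmonic mean of the resulting precision and recall. The `Span` layout, the function names, and the sample spans are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, category label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_f_score(gold: List[Span], pred: List[Span]) -> float:
    """Overlap-based F-score under the assumed (lenient) matching rule."""
    if not gold or not pred:
        return 0.0
    # Precision: fraction of predicted spans that overlap some gold span.
    pred_hits = sum(any(overlaps(p, g) for g in gold) for p in pred)
    # Recall: fraction of gold spans that overlap some predicted span.
    gold_hits = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred)
    recall = gold_hits / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; these are not spans from the annotated corpus.
gold = [(0, 10, "disease"), (20, 30, "procedure"), (40, 50, "lab")]
pred = [
    (2, 8, "disease"),       # overlaps the gold disease span
    (22, 35, "procedure"),   # overlaps the gold procedure span
    (60, 70, "lab"),         # no overlap with any gold span
    (41, 45, "disease"),     # overlaps a gold span, but the label differs
]
score = overlap_f_score(gold, pred)
```

Under this lenient rule, boundary disagreements are not penalized, so overlap scores are usually at least as high as exact-match scores on the same output.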
