J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S6-S10.
doi: 10.1016/j.jbi.2015.09.018. Epub 2015 Oct 1.

Creation of a new longitudinal corpus of clinical narratives

Vishesh Kumar et al. J Biomed Inform. 2015 Dec.

Abstract

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record but develop it by the last record; and patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for the corpus and provides an overview of novel research uses for it. It is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.

Keywords: Corpus; Machine learning; Medical records; NLP.
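
The three cohort definitions in the abstract can be illustrated with a minimal Python sketch. It assumes each patient is represented by a chronologically ordered list of booleans, one per record, marking whether that record carries a CAD diagnosis. The function name, the cohort labels, and the "unclassified" fallback are hypothetical and are not taken from the paper, whose actual record-selection process is not reproduced here.

```python
from typing import Sequence


def assign_cohort(cad_by_record: Sequence[bool]) -> str:
    """Map a patient's per-record CAD flags (oldest first) to a cohort label.

    Labels are illustrative assumptions, not the corpus's own naming.
    """
    if not cad_by_record:
        raise ValueError("patient has no records")
    # CAD is present in the first record and in every subsequent record.
    if all(cad_by_record):
        return "cad_throughout"
    # CAD is absent from every record.
    if not any(cad_by_record):
        return "no_cad"
    # CAD is absent from the first record but present by the last one.
    if not cad_by_record[0] and cad_by_record[-1]:
        return "develops_cad"
    # Any other pattern falls outside the three cohorts described above.
    return "unclassified"


if __name__ == "__main__":
    print(assign_cohort([False, False, True]))
```

Under these assumptions, a patient whose flags do not match any of the three patterns (for example, CAD recorded first and then absent) is simply left out rather than forced into a cohort.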

Figures

Figure 1. Average ages of patients at visit times by cohort
Figure 2. Population by group and gender

