Annotating risk factors for heart disease in clinical narratives for diabetic patients
- PMID: 26004790
- PMCID: PMC4978180
- DOI: 10.1016/j.jbi.2015.05.009
Abstract
The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, coronary artery disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task to balance annotation load and time against quality, so as to generate a gold standard corpus that can benefit a clinically relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, annotator agreement with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations for each record in each patient's longitudinal EMR can support studies of how heart disease risk factors progress over time in these patients. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records.
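The abstract names two computations: building a gold standard by majority voting, and scoring with micro-averaged F1. A minimal sketch of both follows, assuming each annotator's document-level output is a set of hypothetical (record_id, risk_factor, indicator, time) tuples and that an item enters the gold standard only when a strict majority of annotators produced it. The tuple layout, function names, and sample values are illustrative assumptions, not the track's official annotation guidelines or evaluation script. Scoring each annotator against the gold standard as if it were a system is one common way to express annotator agreement with the gold standard.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical annotation unit: (record_id, risk_factor, indicator, time).
# This layout is an assumption for illustration only.
Annotation = Tuple[str, str, str, str]


def majority_gold(annotator_sets: Iterable[Set[Annotation]]) -> Set[Annotation]:
    """Keep an annotation only if a strict majority of annotators produced it."""
    annotator_sets = list(annotator_sets)
    # Each annotator contributes a set, so each item is counted once per annotator.
    votes = Counter(a for s in annotator_sets for a in s)
    threshold = len(annotator_sets) / 2
    return {a for a, n in votes.items() if n > threshold}


def micro_f1(predicted: Set[Annotation], gold: Set[Annotation]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all records and risk factors."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative usage: three hypothetical annotators on one hypothetical record.
a1 = {("r1", "DIABETES", "mention", "before DCT"),
      ("r1", "HYPERTENSION", "mention", "during DCT")}
a2 = {("r1", "DIABETES", "mention", "before DCT")}
a3 = {("r1", "DIABETES", "mention", "before DCT"),
      ("r1", "HYPERTENSION", "mention", "during DCT"),
      ("r1", "SMOKER", "status", "during DCT")}

gold = majority_gold([a1, a2, a3])
for name, ann in [("a1", a1), ("a2", a2), ("a3", a3)]:
    # Agreement of each annotator with the majority-vote gold standard.
    print(name, round(micro_f1(ann, gold), 3))
```

Because the sketch requires a strict majority, an item marked by exactly half of an even-sized annotator pool is dropped; a real pipeline might instead route such ties to adjudication.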
Keywords: Annotation; Medical records; Natural language processing.
Copyright © 2015 Elsevier Inc. All rights reserved.
Similar articles
- The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs. J Biomed Inform. 2015 Dec;58 Suppl:S111-S119. doi: 10.1016/j.jbi.2015.06.010. Epub 2015 Jun 26. PMID: 26122527. Free PMC article.
- Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2. J Biomed Inform. 2015 Dec;58 Suppl:S67-S77. doi: 10.1016/j.jbi.2015.07.001. Epub 2015 Jul 22. PMID: 26210362. Free PMC article. Review.
- Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. J Biomed Inform. 2015 Dec;58 Suppl:S20-S29. doi: 10.1016/j.jbi.2015.07.020. Epub 2015 Aug 28. PMID: 26319540. Free PMC article.
- Creation of a new longitudinal corpus of clinical narratives. J Biomed Inform. 2015 Dec;58 Suppl:S6-S10. doi: 10.1016/j.jbi.2015.09.018. Epub 2015 Oct 1. PMID: 26433122. Free PMC article.
- Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1. J Biomed Inform. 2015 Dec;58 Suppl:S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28. PMID: 26225918. Free PMC article. Review.
Cited by
- A hybrid model for automatic identification of risk factors for heart disease. J Biomed Inform. 2015 Dec;58 Suppl:S171-S182. doi: 10.1016/j.jbi.2015.09.006. Epub 2015 Sep 12. PMID: 26375492. Free PMC article.
- Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge. J Biomed Inform. 2015 Dec;58 Suppl:S120-S127. doi: 10.1016/j.jbi.2015.06.030. Epub 2015 Jul 22. PMID: 26209007. Free PMC article.
- Automatic prediction of coronary artery disease from clinical narratives. J Biomed Inform. 2017 Aug;72:23-32. doi: 10.1016/j.jbi.2017.06.019. Epub 2017 Jun 27. PMID: 28663072. Free PMC article.
- Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources. JMIR Med Inform. 2018 May 16;6(2):e33. doi: 10.2196/medinform.9455. PMID: 29769172. Free PMC article.
- The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs. J Biomed Inform. 2015 Dec;58 Suppl:S111-S119. doi: 10.1016/j.jbi.2015.06.010. Epub 2015 Jun 26. PMID: 26122527. Free PMC article.