Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge
- PMID: 26209007
- PMCID: PMC4737484
- DOI: 10.1016/j.jbi.2015.06.030
Abstract
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven, rule-based methodology supplemented by a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top-performing system.
Keywords: Clinical natural language processing; Information extraction; Text mining.
Copyright © 2015 Elsevier Inc. All rights reserved.
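The F-Score quoted in the abstract is a document-level measure. As an illustration only, the following minimal sketch shows how such a score could be computed, assuming micro-averaging over (document, risk factor) pairs. The `Annotation` representation, the function name `micro_prf` and the toy labels are hypothetical and are not taken from the paper or from the official challenge evaluation script.

```python
from typing import Set, Tuple

# Hypothetical representation: one (document_id, risk_factor_label) pair
# per document-level annotation. Not the official i2b2 annotation format.
Annotation = Tuple[str, str]


def micro_prf(gold: Set[Annotation],
              predicted: Set[Annotation]) -> Tuple[float, float, float]:
    """Return micro-averaged precision, recall and F1 over annotation pairs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): three of four predictions are correct,
# and three of four gold annotations are found.
gold = {
    ("doc1", "DIABETES"),
    ("doc1", "HYPERTENSION"),
    ("doc2", "CAD"),
    ("doc2", "SMOKER"),
}
predicted = {
    ("doc1", "DIABETES"),
    ("doc1", "HYPERTENSION"),
    ("doc2", "CAD"),
    ("doc2", "OBESE"),
}

precision, recall, f1 = micro_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Treating annotations as sets of pairs keeps the sketch at document level: a risk factor counts once per record, regardless of how many times it is mentioned in the text.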