Hybrid bag of approaches to characterize selection criteria for cohort identification
- PMID: 31197354
- PMCID: PMC7647216
- DOI: 10.1093/jamia/ocz079
Abstract
Objective: The 2018 National NLP Clinical Challenge (2018 n2c2) focused on cohort selection for clinical trials: participating systems analyzed longitudinal patient records to determine whether each patient met or did not meet each of 13 selection criteria. This article describes our participation in this shared task.
Materials and methods: We followed a hybrid approach combining pattern-based, knowledge-intensive, and feature weighting techniques. After preprocessing the notes using publicly available natural language processing tools, we developed criterion-specific components that combined knowledge resources collected for each criterion with pattern-based and weighting approaches to identify "met" and "not met" cases.
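As a rough illustration of how a pattern-based component with feature weighting might label one criterion, the following is a minimal sketch only. The abstract does not describe the actual patterns, weights, or thresholds, so the regular expressions, the weights, the `THRESHOLD` value, and the function names below are all hypothetical.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical weighted patterns for a single illustrative criterion.
# Positive weights count as evidence for "met"; negative weights count against it.
PATTERNS: List[Tuple[Pattern[str], float]] = [
    (re.compile(r"\baspirin\b", re.IGNORECASE), 1.0),
    (re.compile(r"\bASA\s+\d+\s*mg\b", re.IGNORECASE), 1.0),
    (re.compile(r"\ballerg\w*\s+to\s+aspirin\b", re.IGNORECASE), -2.0),
]

# Hypothetical decision threshold on the summed evidence.
THRESHOLD = 1.0


def score_notes(notes: List[str]) -> float:
    """Sum weight * match count over every note and every pattern."""
    return sum(
        weight * len(pattern.findall(note))
        for note in notes
        for pattern, weight in PATTERNS
    )


def classify(notes: List[str]) -> str:
    """Label a patient's longitudinal record as "met" or "not met"."""
    return "met" if score_notes(notes) >= THRESHOLD else "not met"
```

Under these assumptions, `classify(["Patient takes aspirin 81 mg daily."])` yields "met", while a record containing only "Allergic to aspirin." yields "not met" because the negative pattern outweighs the drug mention.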
Results: As part of the 2018 n2c2 challenge, 3 runs were submitted. The overall micro-averaged F1 on the training set was 0.9444. On the test set, the micro-averaged F1 scores for the 3 submitted runs were 0.9075, 0.9065, and 0.9056. The best run placed second in the overall challenge, and all 3 runs were statistically similar to the top-ranked system. A reimplemented system achieved the best overall F1 of 0.9111 on the test set.
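For readers unfamiliar with the metric, the sketch below shows one common way to compute a micro-averaged F1: true positives, false positives, and false negatives are pooled across all criteria before a single F1 is calculated. It is an assumption-laden illustration; the official n2c2 scorer may define the overall score differently (for example, by also averaging over the "not met" class), and the criterion names and labels here are made up.

```python
from typing import Dict, List


def micro_f1(gold: Dict[str, List[bool]], pred: Dict[str, List[bool]]) -> float:
    """Pool counts for the "met" class across criteria, then compute one F1."""
    tp = fp = fn = 0
    for criterion, gold_labels in gold.items():
        for g, p in zip(gold_labels, pred[criterion]):
            if g and p:
                tp += 1  # predicted "met", truly "met"
            elif p and not g:
                fp += 1  # predicted "met", truly "not met"
            elif g and not p:
                fn += 1  # predicted "not met", truly "met"
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: True means "met", one entry per patient.
gold = {"criterion_a": [True, False, True], "criterion_b": [False, False, True]}
pred = {"criterion_a": [True, True, True], "criterion_b": [False, False, False]}
print(round(micro_f1(gold, pred), 4))  # 2 TP, 1 FP, 1 FN -> 0.6667
```

Because the counts are pooled, criteria with many positive cases dominate the score, which is one reason class imbalance matters in this task.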
Discussion: We highlight the need for a focused, resource-intensive effort to address the class imbalance in the cohort selection task.
Conclusion: Our hybrid approach was able to identify all selection criteria with high F1 performance on both training and test sets. Based on our participation in the 2018 n2c2 task, we conclude that there is merit in continuing a focused criterion-specific analysis and developing appropriate knowledge resources to build a quality cohort selection system.
Keywords: clinical trial selection criteria; cohort identification; information storage and retrieval [L01.313.500.750.280]; information systems [L01.313.500.750.300]; natural language processing [L01.224.065.580].
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Comment in
- New approaches to cohort selection. J Am Med Inform Assoc. 2019 Nov 1;26(11):1161-1162. doi: 10.1093/jamia/ocz174. PMID: 31613362. No abstract available.