EliIE: An open-source information extraction system for clinical trial eligibility criteria
- PMID: 28379377
- PMCID: PMC6259668
- DOI: 10.1093/jamia/ocx019
Abstract
Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.
Materials and methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
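The first two steps above (entity recognition, then negation detection) can be illustrated with a minimal sketch. This is not EliIE's implementation: a dictionary lookup stands in for the trained sequence labeler, a short hypothetical trigger list stands in for the NegEx lexicon and the predefined rules, and the names `Entity`, `find_entities`, `mark_negation`, and `NEGATION_TRIGGERS` are assumptions introduced only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger list; the actual NegEx lexicon is far larger.
NEGATION_TRIGGERS = ("no", "not", "without", "absence of", "free of")


@dataclass
class Entity:
    text: str
    start: int          # character offset of the mention in the criterion
    negated: bool = False


def find_entities(criterion, vocabulary):
    """Step 1 stand-in: dictionary lookup instead of a trained sequence labeler."""
    lowered = criterion.lower()
    entities = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        for match in re.finditer(pattern, lowered):
            entities.append(Entity(text=term, start=match.start()))
    return sorted(entities, key=lambda e: e.start)


def mark_negation(criterion, entities, window=5):
    """Step 2 stand-in: flag an entity when a trigger appears in the
    `window` whitespace-separated tokens that precede it."""
    lowered = criterion.lower()
    for entity in entities:
        context = " ".join(lowered[:entity.start].split()[-window:])
        entity.negated = any(
            re.search(r"\b" + re.escape(trigger) + r"\b", context)
            for trigger in NEGATION_TRIGGERS
        )
    return entities


criterion = "No history of stroke; diagnosis of Alzheimer's disease"
vocabulary = ["stroke", "Alzheimer's disease"]
for entity in mark_negation(criterion, find_entities(criterion, vocabulary)):
    print(entity.text, "negated" if entity.negated else "affirmed")
```

A fixed token window is the simplest possible scope rule. A fuller treatment would also end the negation scope at punctuation or conjunctions, and steps 3 and 4 (relation extraction and concept normalization) are not covered by this sketch.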
Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation.
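For reference, the F1 score is conventionally defined as the harmonic mean of precision and recall. The abstract does not report the underlying precision and recall values, so the symbols $P$, $R$, $TP$, $FP$, and $FN$ below are generic notation rather than figures from the study.

```latex
\[
  P = \frac{TP}{TP + FP}, \qquad
  R = \frac{TP}{TP + FN}, \qquad
  F_1 = \frac{2PR}{P + R}
\]
```

Here $TP$, $FP$, and $FN$ count true positives, false positives, and false negatives among the predicted entities or relations.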
Conclusions: This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, the machine learning-based EliIE outperforms existing systems and shows promise for further improvement.
Keywords: clinical trials; common data model; machine learning; named entity recognition; natural language processing; patient selection.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2017. This work is written by US Government employees and is in the public domain in the United States.