J Am Med Inform Assoc. 2017 Nov 1;24(6):1062-1071.

doi: 10.1093/jamia/ocx019.

EliIE: An open-source information extraction system for clinical trial eligibility criteria

Tian Kang¹, Shaodian Zhang¹, Youlan Tang², Gregory W Hruby¹, Alexander Rusanov¹, Noémie Elhadad¹, Chunhua Weng¹

Affiliations

¹ Department of Biomedical Informatics, Columbia University, New York, NY, USA.
² Institute of Human Nutrition, Columbia University, New York, NY, USA.

PMID: 28379377
PMCID: PMC6259668
DOI: 10.1093/jamia/ocx019

Tian Kang et al. J Am Med Inform Assoc. 2017.

Abstract

Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.

Materials and methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
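As a rough illustration of step 2, a NegEx-style check looks for a negation trigger within a few tokens before an entity mention. The sketch below is a simplified stand-in, not EliIE's actual rule set: the trigger list, the window size, and the function name `is_negated` are all assumptions made for illustration.

```python
import re

# Hypothetical trigger phrases; the real NegEx lexicon is much larger.
NEGATION_TRIGGERS = ("no", "not", "without", "denies", "absence of", "free of")
WINDOW = 5  # assumed maximum number of tokens between trigger and entity


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if a trigger occurs within WINDOW tokens before the entity."""
    tokens = re.findall(r"\w+", sentence.lower())
    ent = re.findall(r"\w+", entity.lower())
    if not ent:
        return False
    for i in range(len(tokens) - len(ent) + 1):
        if tokens[i:i + len(ent)] != ent:
            continue
        # Text of the window immediately preceding this entity mention.
        scope = " ".join(tokens[max(0, i - WINDOW):i])
        if any(re.search(rf"\b{re.escape(t)}\b", scope) for t in NEGATION_TRIGGERS):
            return True
    return False
```

For example, `is_negated("No history of stroke or seizure", "stroke")` flags the entity as negated, while an inclusion criterion with no trigger in the preceding window is left as affirmed.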

Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation.
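For reference, F1 is the harmonic mean of precision and recall. A minimal sketch of exact-match entity scoring follows, assuming each entity is represented as a `(start, end, label)` tuple. The spans and labels are made up for illustration and are not data from the paper.

```python
def f1_exact(gold: set, predicted: set) -> float:
    """Exact-match F1: a prediction counts only if span and label both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (start offset, end offset, entity class).
gold = {(0, 9, "Condition"), (15, 22, "Drug"), (30, 41, "Measurement")}
pred = {(0, 9, "Condition"), (15, 22, "Drug"), (50, 55, "Drug")}

score = f1_exact(gold, pred)  # 2 of 3 predictions correct, 2 of 3 gold found
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.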

Conclusions: This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, machine learning-based EliIE outperforms existing systems and shows promise for further improvement.

Keywords: clinical trials; common data model; machine learning; named entity recognition; natural language processing; patient selection.

Published by Oxford University Press on behalf of the American Medical Informatics Association 2017. This work is written by US Government employees and is in the public domain in the United States.

Figures

**Figure 1.**
Example annotations using the Brat tool (http://brat.nlplab.org). Entities and attributes from different classes are distinguished by colors; relations are annotated as arrows between entities and attributes. Brat can generate structured annotation result files.

**Figure 2.**
General workflow of EliIE. It includes a filtering step and 4-phase parsing. The final outputs are stored in an XML file.

**Figure 3.**
Detailed feature description for entity/attribute recognition and relation extraction.

**Figure 4.**
Learning curves for the recognition tasks at different training set sizes. The top graph shows the learning curves under exact-matching evaluation; the other graph shows partial-matching evaluation. Both show that performance stabilizes once the training set exceeds 150. Note: the legend label "F-score" should read "F1-score".

**Figure 5.**
Example results from EliIE, i2b2-based CliNER, and EliXR (EliXR output format: *identified UMLS CUI* {*concept*; *Negation*; *Uncertain*; *Temporal*; *Measurement*; *Dosage*}).

Cited by

Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation.
Lee K, Liu Z, Mai Y, Jun T, Ma M, Wang T, Ai L, Calay E, Oh W, Stolovitzky G, Schadt E, Wang X. JMIR AI. 2024 Jul 29;3:e50800. doi: 10.2196/50800. PMID: 39073872.
Molecular-based precision oncology clinical decision making augmented by artificial intelligence.
Zeng J, Shufean MA. Emerg Top Life Sci. 2021 Dec 21;5(6):757-764. doi: 10.1042/ETLS20210220. PMID: 34874054. Review.
An OMOP CDM-Based Relational Database of Clinical Research Eligibility Criteria.
Si Y, Weng C. Stud Health Technol Inform. 2017;245:950-954. PMID: 29295240.
CriteriaMapper: establishing the automatic identification of clinical trial cohorts from electronic health records by matching normalized eligibility criteria and patient clinical characteristics.
Lee K, Mai Y, Liu Z, Raja K, Jun T, Ma M, Wang T, Ai L, Calay E, Oh W, Schadt E, Wang X. Sci Rep. 2024 Oct 25;14(1):25387. doi: 10.1038/s41598-024-77447-x. PMID: 39455879.
Criteria2Query: a natural language interface to clinical databases for cohort definition.
Yuan C, Ryan PB, Ta C, Guo Y, Li Z, Hardin J, Makadia R, Jin P, Shang N, Kang T, Weng C. J Am Med Inform Assoc. 2019 Apr 1;26(4):294-305. doi: 10.1093/jamia/ocy178. PMID: 30753493.

Grants and funding

R01 LM009886/LM/NLM NIH HHS/United States
