J Am Med Inform Assoc. 2017 Nov 1;24(6):1062-1071.
doi: 10.1093/jamia/ocx019.

EliIE: An open-source information extraction system for clinical trial eligibility criteria

Tian Kang et al. J Am Med Inform Assoc.

Abstract

Objective: To develop an open-source information extraction system called Eligibility Criteria Information Extraction (EliIE) for parsing and formalizing free-text clinical research eligibility criteria (EC) following Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) version 5.0.

Materials and methods: EliIE parses EC in 4 steps: (1) clinical entity and attribute recognition, (2) negation detection, (3) relation extraction, and (4) concept normalization and output structuring. Informaticians and domain experts were recruited to design an annotation guideline and generate a training corpus of annotated EC for 230 Alzheimer's clinical trials, which were represented as queries against the OMOP CDM and included 8008 entities, 3550 attributes, and 3529 relations. A sequence labeling-based method was developed for automatic entity and attribute recognition. Negation detection was supported by NegEx and a set of predefined rules. Relation extraction was achieved by a support vector machine classifier. We further performed terminology-based concept normalization and output structuring.
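
To make step (2) concrete, the following is a minimal sketch of trigger-based negation detection in the spirit of NegEx. It is not the NegEx lexicon or the rule set used in EliIE: the trigger list, the 5-token forward scope, and the names `NEGATION_TRIGGERS`, `WINDOW`, `tokenize`, and `is_negated` are all assumptions made for illustration.

```python
import re

# Hypothetical, abbreviated trigger list; the real NegEx lexicon is far larger.
NEGATION_TRIGGERS = ("no", "without", "absence of", "free of", "denies")
WINDOW = 5  # assumed scope: number of tokens after a trigger that it can negate


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_negated(sentence, entity):
    """Return True if `entity` starts within WINDOW tokens after a trigger."""
    tokens = tokenize(sentence)
    ent = tokenize(entity)
    if not ent:
        return False
    # Token positions where the entity mention begins.
    starts = [
        i for i in range(len(tokens) - len(ent) + 1)
        if tokens[i:i + len(ent)] == ent
    ]
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        for j in range(len(tokens) - len(trig) + 1):
            if tokens[j:j + len(trig)] == trig:
                trig_end = j + len(trig)
                if any(trig_end <= s < trig_end + WINDOW for s in starts):
                    return True
    return False


print(is_negated("No history of stroke or seizure", "stroke"))
print(is_negated("Diagnosis of Alzheimer's disease", "Alzheimer's disease"))
```

A fixed forward window is the simplest possible scope rule. A fuller implementation would also handle post-entity triggers and scope-terminating terms such as "but".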

Results: In task-specific evaluations, the best F1 score for entity recognition was 0.79, and for relation extraction was 0.89. The accuracy of negation detection was 0.94. The overall accuracy for query formalization was 0.71 in an end-to-end evaluation.
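
For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for entity recognition under exact matching, where a predicted entity counts as correct only if its span and class both match a gold annotation. The function name `prf1`, the `(start, end, label)` tuple representation, and the example spans and labels are illustrative assumptions, not the evaluation code or data from this study.

```python
def prf1(gold, predicted):
    """Precision, recall, and F1 over sets of (start, end, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # exact span and label agreement
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: character offsets plus an entity class.
gold = {(0, 6, "Condition"), (10, 18, "Drug"), (25, 30, "Measurement")}
pred = {(0, 6, "Condition"), (10, 18, "Condition"),
        (25, 30, "Measurement"), (40, 45, "Drug")}

p, r, f1 = prf1(gold, pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```

In this example the second prediction has the right span but the wrong class, so it counts as both a false positive and a false negative under exact matching.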

Conclusions: This study presents EliIE, an OMOP CDM-based information extraction system for automatic structuring and formalization of free-text EC. According to our evaluation, the machine learning-based EliIE outperforms existing systems and shows promise for further improvement.

Keywords: clinical trials; common data model; machine learning; named entity recognition; natural language processing; patient selection.

Figures

Figure 1.
Example annotations using the Brat tool (http://brat.nlplab.org). Entities and attributes from different classes are distinguished by color; relations are annotated as arrows between entities and attributes. Brat can generate structured annotation result files.
Figure 2.
General workflow of EliIE. It includes a filtering step and 4-phase parsing. The final outputs are stored in an XML file.
Figure 3.
Detailed feature description for entity/attribute recognition and relation extraction.
Figure 4.
Learning curves for the recognition tasks at different training set sizes. The top graph shows the learning curves under exact matching evaluation; the bottom graph shows them under partial matching evaluation. Both show that performance stabilizes once the amount of training data exceeds 150. Note that the legend label "F-score" in the figure should read "F1-score".
Figure 5.
Example results from EliIE, i2b2-based CliNER, and EliXR (EliXR output format: identified UMLS CUI {concept; Negation; Uncertain; Temporal; Measurement; Dosage}).
