J Am Med Inform Assoc. 2019 Apr 1;26(4):294-305.
doi: 10.1093/jamia/ocy178.

Criteria2Query: a natural language interface to clinical databases for cohort definition

Chi Yuan et al. J Am Med Inform Assoc. 2019.

Abstract

Objective: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.

Materials and methods: Criteria2Query uses a hybrid information extraction pipeline that combines machine learning and rule-based methods to systematically parse eligibility criteria text. It transforms the text first into a structured criteria representation and then into sharable, executable clinical data queries, expressed as SQL conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria across different disease domains from ClinicalTrials.gov and 52 user-entered criteria. We measured F1 score and accuracy against 2 domain experts and calculated the average computation time for fully automated query formulation. We also conducted an anonymous survey to evaluate usability.
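The last transformation step, from a structured criterion to an executable query, can be illustrated with a minimal sketch. This is not the Criteria2Query implementation: the `Criterion` class, the `criterion_to_sql` function, and the concept ID `12345` are hypothetical and chosen only for illustration. The table and column names (`person`, `condition_occurrence`, `drug_exposure`) follow the OMOP Common Data Model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """Hypothetical structured representation of one eligibility criterion."""
    domain: str             # e.g. "Condition" or "Drug"
    concept_ids: List[int]  # standard concept IDs after entity normalization
    negated: bool           # True when the criterion excludes matching patients


# OMOP CDM table and concept column for each supported domain.
DOMAIN_TABLES = {
    "Condition": ("condition_occurrence", "condition_concept_id"),
    "Drug": ("drug_exposure", "drug_concept_id"),
}


def criterion_to_sql(criterion: Criterion) -> str:
    """Render one criterion as a SQL query returning matching person IDs."""
    table, column = DOMAIN_TABLES[criterion.domain]
    ids = ", ".join(str(i) for i in criterion.concept_ids)
    membership = "NOT IN" if criterion.negated else "IN"
    return (
        f"SELECT person_id FROM person WHERE person_id {membership} "
        f"(SELECT person_id FROM {table} WHERE {column} IN ({ids}))"
    )


# 12345 is a made-up placeholder, not a real OMOP concept ID.
print(criterion_to_sql(Criterion("Condition", [12345], negated=False)))
```

Keeping negation as a flag on the structured criterion, rather than in the SQL template, mirrors the separation the abstract describes between the intermediate representation and the generated query.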

Results: Criteria2Query achieved F1 scores of 0.795 and 0.805 for entity recognition and relation extraction, respectively. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514, and 0.793, respectively. Fully automatic query formulation took an average of 1.22 seconds per criterion. More than 80% (11+ of 13) of users would use Criteria2Query in their future cohort definition tasks.
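For readers unfamiliar with the metric, the sketch below shows how an F1 score for entity recognition can be computed from predicted and expert-annotated entities. It assumes exact matching on (text, type) pairs; the abstract does not state the matching rule used in the evaluation, and the function name and toy data are illustrative only.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity text, entity type)


def f1_score(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Harmonic mean of precision and recall under exact matching."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, not taken from the paper's evaluation set.
gold = {("Alzheimer's disease", "Condition"), ("donepezil", "Drug"), ("age", "Demographic")}
predicted = {("Alzheimer's disease", "Condition"), ("donepezil", "Drug"), ("mild", "Condition")}

# Two of three predictions are correct and two of three gold entities are found,
# so precision = recall = 2/3 and F1 = 2/3.
print(round(f1_score(predicted, gold), 3))  # prints 0.667
```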

Conclusions: We contribute a novel natural language interface to clinical databases. It is open source and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort. We demonstrate its promising user friendliness and usability.

Keywords: cohort definition; common data model; natural language interfaces to database; natural language processing.

Figures

Figure 1. System architecture and data flow of Criteria2Query.
Figure 2. An example of one criterion on ATLAS.
Figure 3. Concept set autogeneration process. AD: Alzheimer’s disease; ICD10: International Classification of Diseases–Tenth Revision; ICD9CM: International Classification of Diseases–Ninth Revision–Clinical Modification; N: no; Y: yes.
Figure 4. User workflow of Criteria2Query.
Figure 5. The user interface of the Criteria2Query system.
Figure 6. Automatically generated cohort query presented by ATLAS to allow query review, refinement, and execution for patient cohort generation using clinical databases.
