J Am Med Inform Assoc. 2019 Apr 1;26(4):294-305.

doi: 10.1093/jamia/ocy178.

Criteria2Query: a natural language interface to clinical databases for cohort definition

Chi Yuan¹,², Patrick B Ryan¹,³, Casey Ta¹, Yixuan Guo¹, Ziran Li¹, Jill Hardin³, Rupa Makadia³, Peng Jin¹, Ning Shang¹, Tian Kang¹, Chunhua Weng¹

Affiliations

¹ Department of Biomedical Informatics, Columbia University, New York, New York, USA.
² Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, P.R. China.
³ Epidemiology Analytics, Janssen Research and Development, Titusville, New Jersey, USA.

PMID: 30753493
PMCID: PMC6402359
DOI: 10.1093/jamia/ocy178

Chi Yuan et al. J Am Med Inform Assoc. 2019.


Abstract

Objective: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.

Materials and methods: Criteria2Query uses a hybrid information extraction pipeline, combining machine learning and rule-based methods, to systematically parse eligibility criteria text. It transforms the text first into a structured criteria representation and then into sharable, executable clinical data queries, expressed as SQL conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria across different disease domains from ClinicalTrials.gov and 52 user-entered criteria. We evaluated F1 score and accuracy against 2 domain experts and calculated the average computation time for fully automated query formulation. We also conducted an anonymous survey to evaluate usability.
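The second transformation step, from a structured criterion to an executable query, can be illustrated with a minimal sketch. This is not the Criteria2Query implementation: the `Criterion` class, the `to_sql` and `run_demo` helpers, and the concept ID `1001` are hypothetical names and values chosen for illustration. Only the `person` and `condition_occurrence` tables and their `person_id` and `condition_concept_id` columns come from the OMOP Common Data Model, and the demo uses a tiny in-memory SQLite stand-in rather than a real clinical database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical structured form of one parsed eligibility criterion."""
    domain: str       # OMOP domain, e.g. "Condition"
    concept_id: int   # concept id assumed to come from entity normalization
    negated: bool     # True when negation was detected ("no history of ...")


def to_sql(criterion):
    """Render a criterion as SQL over OMOP-style tables (Condition domain only)."""
    if criterion.domain != "Condition":
        raise ValueError("this sketch only handles the Condition domain")
    has_event = (
        "SELECT 1 FROM condition_occurrence co "
        "WHERE co.person_id = p.person_id "
        f"AND co.condition_concept_id = {int(criterion.concept_id)}"
    )
    # Negated criteria select people WITHOUT a matching record.
    operator = "NOT EXISTS" if criterion.negated else "EXISTS"
    return (
        f"SELECT p.person_id FROM person p "
        f"WHERE {operator} ({has_event}) ORDER BY p.person_id"
    )


def run_demo():
    """Run both polarities of one criterion against toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);
        CREATE TABLE condition_occurrence (
            person_id INTEGER,
            condition_concept_id INTEGER
        );
        INSERT INTO person VALUES (1), (2), (3);
        -- 1001 and 2002 are placeholder ids, not real OMOP concepts.
        INSERT INTO condition_occurrence VALUES (1, 1001), (3, 1001), (2, 2002);
        """
    )
    with_condition = Criterion("Condition", 1001, negated=False)
    without_condition = Criterion("Condition", 1001, negated=True)
    included = [row[0] for row in conn.execute(to_sql(with_condition))]
    excluded = [row[0] for row in conn.execute(to_sql(without_condition))]
    conn.close()
    return included, excluded
```

Calling `run_demo()` returns the person IDs matching the criterion and those matching its negation. A real cohort query would also need concept sets with descendant concepts, temporal constraints, and attribute values, which this sketch omits.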

Results: Criteria2Query achieved 0.795 and 0.805 F1 score for entity recognition and relation extraction, respectively. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514 and 0.793, respectively. Fully automatic query formulation took 1.22 seconds/criterion. More than 80% (11+ of 13) of users would use Criteria2Query in their future cohort definition tasks.
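For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below shows the standard calculation; the function name `f1_score` and the counts in the usage note are illustrative assumptions and are not taken from the paper's evaluation data.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall from raw match counts."""
    if true_pos == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

For example, a hypothetical extractor with 8 correct entities, 2 spurious ones, and 2 missed ones has precision 0.8 and recall 0.8, so `f1_score(8, 2, 2)` is 0.8 up to floating-point rounding.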

Conclusions: We contribute a novel natural language interface to clinical databases. It is open source and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort. We demonstrate its promising user friendliness and usability.

Keywords: cohort definition; common data model; natural language interfaces to database; natural language processing.


Figures

**Figure 1.**
System architecture and data flow of Criteria2Query.

**Figure 2.**
An example of one criterion on ATLAS.

**Figure 3.**
Concept set autogeneration process. AD: Alzheimer’s disease; ICD10: International Classification of Diseases–Tenth Revision; ICD9CM: International Classification of Diseases–Ninth Revision–Clinical Modification; N: no; Y: yes.

**Figure 4.**
User workflow of Criteria2Query.

**Figure 5.**
The user interface of the Criteria2Query system.

**Figure 6.**
Automatically generated cohort query presented by ATLAS to allow query review, refinement, and execution for patient cohort generation using clinical databases.


Cited by

Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria.
Sun Z, Tao C. Proc (IEEE Int Conf Healthc Inform). 2023 Jun;2023:558-564. doi: 10.1109/ichi57859.2023.00100. Epub 2023 Dec 11. PMID: 38283164. Free PMC article.
How can natural language processing help model informed drug development?: a review.
Bhatnagar R, Sardar S, Beheshti M, Podichetty JT. JAMIA Open. 2022 Jun 11;5(2):ooac043. doi: 10.1093/jamiaopen/ooac043. eCollection 2022 Jul. PMID: 35702625. Free PMC article. Review.
The IMPACT framework and implementation for accessible in silico clinical phenotyping in the digital era.
Wen A, He H, Fu S, Liu S, Miller K, Wang L, Roberts KE, Bedrick SD, Hersh WR, Liu H. NPJ Digit Med. 2023 Jul 21;6(1):132. doi: 10.1038/s41746-023-00878-9. PMID: 37479735. Free PMC article. Review.
A guide to artificial intelligence for cancer researchers.
Perez-Lopez R, Ghaffari Laleh N, Mahmood F, Kather JN. Nat Rev Cancer. 2024 Jun;24(6):427-441. doi: 10.1038/s41568-024-00694-7. Epub 2024 May 16. PMID: 38755439. Review.
Prototypical Clinical Trial Registry Based on Fast Healthcare Interoperability Resources (FHIR): Design and Implementation Study.
Gulden C, Blasini R, Nassirian A, Stein A, Altun FB, Kirchner M, Prokosch HU, Boeker M. JMIR Med Inform. 2021 Jan 12;9(1):e20470. doi: 10.2196/20470. PMID: 33433393. Free PMC article.


