J Am Med Inform Assoc. 2023 Feb 16;30(3):427-437.
doi: 10.1093/jamia/ocac235.

Characterizing variability of electronic health record-driven phenotype definitions

Pascal S Brandt et al. J Am Med Inform Assoc. 2023.

Abstract

Objective: The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of logical constructs used.

Materials and methods: We examined a sample of 33 preexisting phenotype definitions used in research and represented using Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL). The analysis was automated and operated on the computable representation of the CQL libraries.

Results: Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found that the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27.
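As an illustration of the two measurements reported above (counts of expression types and expression depth), the following is a minimal sketch, not the authors' tooling. It assumes a simplified, hypothetical tree in which every expression is a dictionary with a `type` key, loosely modeled on the JSON form of compiled CQL, and it assumes depth means the number of nested expression nodes on the longest path from the root. The paper's exact definition may differ.

```python
from collections import Counter
from typing import Any, Iterator, Tuple


def walk(node: Any, depth: int = 0) -> Iterator[Tuple[str, int]]:
    """Yield (expression type, depth) for every dict that has a 'type' key."""
    if isinstance(node, dict):
        if "type" in node:
            depth += 1  # entering a new expression node
            yield node["type"], depth
        for value in node.values():
            yield from walk(value, depth)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, depth)


def summarize(expression: dict) -> Tuple[Counter, int]:
    """Return per-type expression counts and the maximum nesting depth."""
    counts: Counter = Counter()
    max_depth = 0
    for type_name, depth in walk(expression):
        counts[type_name] += 1
        max_depth = max(max_depth, depth)
    return counts, max_depth


# Hypothetical criterion: a condition exists AND at least 2 encounters.
example = {
    "type": "And",
    "operand": [
        {"type": "Exists",
         "operand": {"type": "Retrieve", "dataType": "Condition"}},
        {"type": "GreaterOrEqual",
         "operand": [
             {"type": "Count",
              "source": {"type": "Retrieve", "dataType": "Encounter"}},
             {"type": "Literal", "value": "2"},
         ]},
    ],
}

counts, max_depth = summarize(example)
print(counts)
print(max_depth)
```

In this toy tree the longest path is And, GreaterOrEqual, Count, Retrieve, so the depth is 4 under the stated assumption.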

Discussion: Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, possibly because authors aim to simplify implementation or limit complexity in response to data availability constraints.

Conclusions: The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.
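One way to picture the modularity described here is to keep the tabular data (value sets) separate from the shared logic, so that a site can swap in local codes without touching the criteria. The sketch below is an assumption-laden illustration: `Patient`, `is_case`, the value set names, and all codes are hypothetical placeholders, not taken from any of the analyzed definitions or from a real terminology.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal patient record used only for this sketch."""
    diagnosis_codes: FrozenSet[str]
    encounter_count: int


def is_case(patient: Patient,
            value_sets: Dict[str, FrozenSet[str]],
            min_encounters: int = 2) -> bool:
    """Shared logic: a qualifying diagnosis code and enough encounters."""
    has_diagnosis = bool(patient.diagnosis_codes & value_sets["diagnosis"])
    return has_diagnosis and patient.encounter_count >= min_encounters


# Tabular data: each site supplies its own placeholder codes (localization).
site_a_value_sets = {"diagnosis": frozenset({"A-001", "A-002"})}
site_b_value_sets = {"diagnosis": frozenset({"B-17"})}

patient = Patient(diagnosis_codes=frozenset({"A-002"}), encounter_count=3)

result_a = is_case(patient, site_a_value_sets)
result_b = is_case(patient, site_b_value_sets)
print(result_a, result_b)
```

The same `is_case` function runs unchanged against either site's value sets, which is the sense of "localization and shared logic" assumed in this sketch.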

Keywords: CQL; EHR-driven phenotyping; FHIR; cohort identification.

Figures

Figure 1. Histograms of medical vocabulary code and code system usage. (A) The number of phenotype definitions using a number of code systems. (B) The number of phenotype definitions using a specific code system. (C) The number of distinct vocabulary codes used for a given code system for all phenotype definitions. (D) The number of phenotype definitions using a number of distinct codes. AMT: Australian Medicines Terminology; BDPM: Public Database of Medications; CIEL: Columbia International eHealth Laboratory; CPT: Current Procedural Terminology; dm+d: Dictionary of Medicines and Devices; ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification; ICD-10-CM: International Classification of Diseases, Tenth Revision, Clinical Modification; ICD-9-Proc: International Classification of Diseases, Ninth Revision, Procedures; ICD-10-PCS: International Classification of Diseases, Tenth Revision, Procedure Coding System; HCPCS: Healthcare Common Procedure Coding System; LOINC: Logical Observation Identifiers Names and Codes; MedDRA: Medical Dictionary for Regulatory Activities; MeSH: Medical Subject Headings; SNOMED: Systematized Nomenclature of Medicine.

Figure 2. Cumulative CQL expression counts per category. CQL: Clinical Quality Language.

Figure 3. Total numbers of individual CQL expression types by category (excluding literal expressions). CQL: Clinical Quality Language.

Figure 4. Number of phenotype definitions utilizing various data types within literal expressions.

Figure 5. Data retrieval expressions and data types. (A) The number of phenotype definitions using a number of distinct data types. (B) The number of phenotype definitions using specific data types. (C) The number of phenotype definitions using ranges of retrieve statements.

Figure 6. Example of expression depth (A), along with histograms of total (B) and where clause (C) expression depths.
