J Am Med Inform Assoc. 2023 Feb 16;30(3):427-437.

doi: 10.1093/jamia/ocac235.

Characterizing variability of electronic health record-driven phenotype definitions

Pascal S Brandt¹, Abel Kho², Yuan Luo², Jennifer A Pacheco², Theresa L Walunas², Hakon Hakonarson³, George Hripcsak⁴, Cong Liu⁴, Ning Shang⁴, Chunhua Weng⁴, Nephi Walton⁵, David S Carrell⁶, Paul K Crane⁷, Eric B Larson⁷ ⁸, Christopher G Chute⁹, Iftikhar J Kullo¹⁰, Robert Carroll¹¹, Josh Denny¹², Andrea Ramirez¹¹, Wei-Qi Wei¹³, Jyoti Pathak¹⁴, Laura K Wiley¹⁵, Rachel Richesson¹⁶, Justin B Starren², Luke V Rasmussen²

Affiliations

¹ Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.
² Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
³ Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
⁴ Department of Biomedical Informatics, Columbia University, New York, New York, USA.
⁵ Intermountain Precision Genomics, Intermountain Healthcare, St George, Utah, USA.
⁶ Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA.
⁷ Department of Medicine, University of Washington, Seattle, Washington, USA.
⁸ Department of Health Services, University of Washington, Seattle, Washington, USA.
⁹ Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA.
¹⁰ Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA.
¹¹ Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
¹² All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA.
¹³ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
¹⁴ Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA.
¹⁵ Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA.
¹⁶ Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA.

PMID: 36474423
PMCID: PMC9933077
DOI: 10.1093/jamia/ocac235

Pascal S Brandt et al. J Am Med Inform Assoc. 2023.

Abstract

Objective: The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of the logical constructs they use.

Materials and methods: We analyzed a sample of 33 preexisting research phenotype definitions represented in Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL), using automated analysis of the computable representation of their CQL libraries.
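The automated analysis described above can be sketched in Python. The sketch assumes each CQL library has already been translated to its Expression Logical Model (ELM) JSON form and that expression nodes can be recognized by their `type` key. The set of skipped metadata keys, the helper names, and the example library (its statement names, value set names, and `urn:example` identifiers) are hypothetical simplifications, not the authors' tooling.

```python
import json
from collections import Counter

# Keys holding metadata rather than operands; skipping them is a simplifying assumption.
NON_EXPRESSION_KEYS = {"annotation", "resultTypeSpecifier", "signature"}


def iter_expressions(node):
    """Depth-first walk yielding every dict that carries an ELM "type" key."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for key, value in node.items():
            if key not in NON_EXPRESSION_KEYS:
                yield from iter_expressions(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_expressions(item)


def count_expression_types(elm):
    """Tally expression node types across all top-level statement definitions."""
    counts = Counter()
    for statement in elm["library"].get("statements", {}).get("def", []):
        for expr in iter_expressions(statement.get("expression")):
            counts[expr["type"]] += 1
    return counts


def count_value_sets(elm):
    """Number of value set definitions declared by the library."""
    return len(elm["library"].get("valueSets", {}).get("def", []))


# Hypothetical library, hand-written in ELM JSON shape.
EXAMPLE_ELM = json.loads("""
{"library": {
  "valueSets": {"def": [
    {"name": "Type 2 Diabetes", "id": "urn:example:t2d"},
    {"name": "Type 1 Diabetes", "id": "urn:example:t1d"}]},
  "statements": {"def": [
    {"name": "Has T2D", "context": "Patient", "expression": {
      "type": "Exists", "operand": {
        "type": "Retrieve", "dataType": "{http://hl7.org/fhir}Condition",
        "codeProperty": "code",
        "codes": {"type": "ValueSetRef", "name": "Type 2 Diabetes"}}}},
    {"name": "Has T1D", "context": "Patient", "expression": {
      "type": "Exists", "operand": {
        "type": "Retrieve", "dataType": "{http://hl7.org/fhir}Condition",
        "codeProperty": "code",
        "codes": {"type": "ValueSetRef", "name": "Type 1 Diabetes"}}}},
    {"name": "Case", "context": "Patient", "expression": {
      "type": "And", "operand": [
        {"type": "ExpressionRef", "name": "Has T2D"},
        {"type": "Not", "operand": {"type": "ExpressionRef", "name": "Has T1D"}}]}}
  ]}
}}
""")

print(count_value_sets(EXAMPLE_ELM))  # 2
print(sorted(count_expression_types(EXAMPLE_ELM).items()))
```

Aggregating such per-library tallies across all 33 definitions would yield distributions like those summarized in the Results; the exact aggregation used in the study is not specified here.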

Results: Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and the number of value sets from 1 to 19. We found that the most common expressions are literal, data, and logical expressions, while aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27.
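Expression depth and per-category tallies can be illustrated with a short Python sketch over a single ELM-shaped expression tree. Both the depth rule (each nested typed node adds one level, so a lone `Literal` has depth 1) and the `CATEGORY_BY_TYPE` mapping are assumptions for illustration; the paper's exact depth definition (see Figure 6) and its category taxonomy may differ, and types the mapping does not cover, such as comparison operators, are counted as `other`. The example criterion is hypothetical.

```python
from collections import Counter

# Keys holding metadata rather than operands; skipping them is a simplifying assumption.
NON_EXPRESSION_KEYS = {"annotation", "resultTypeSpecifier", "signature"}

# Hypothetical, partial mapping from ELM node types to the abstract's categories.
CATEGORY_BY_TYPE = {
    "Literal": "literal",
    "Retrieve": "data",
    "And": "logical", "Or": "logical", "Not": "logical", "Xor": "logical", "Implies": "logical",
    "Count": "aggregate", "Sum": "aggregate", "Min": "aggregate", "Max": "aggregate", "Avg": "aggregate",
    "Add": "arithmetic", "Subtract": "arithmetic", "Multiply": "arithmetic", "Divide": "arithmetic",
}


def children(node):
    """Operand values of a node, excluding metadata keys."""
    return [v for k, v in node.items() if k not in NON_EXPRESSION_KEYS]


def expression_depth(node):
    """Longest chain of nested typed nodes; a lone Literal has depth 1."""
    if isinstance(node, list):
        return max((expression_depth(item) for item in node), default=0)
    if not isinstance(node, dict):
        return 0
    deepest_child = max((expression_depth(c) for c in children(node)), default=0)
    return deepest_child + (1 if "type" in node else 0)


def category_counts(node):
    """Count typed nodes per category; unmapped types fall into "other"."""
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, list):
            stack.extend(current)
        elif isinstance(current, dict):
            if "type" in current:
                counts[CATEGORY_BY_TYPE.get(current["type"], "other")] += 1
            stack.extend(children(current))
    return counts


# Hypothetical criterion: at least two Condition records and not "Has T1D".
CRITERION = {
    "type": "And",
    "operand": [
        {"type": "GreaterOrEqual", "operand": [
            {"type": "Count", "source": {"type": "Retrieve", "dataType": "{http://hl7.org/fhir}Condition"}},
            {"type": "Literal", "valueType": "{urn:hl7-org:elm-types:r1}Integer", "value": "2"},
        ]},
        {"type": "Not", "operand": {"type": "ExpressionRef", "name": "Has T1D"}},
    ],
}

print(expression_depth(CRITERION))  # 4
print(category_counts(CRITERION))
```

Under this rule the example criterion has depth 4, following the chain `And`, `GreaterOrEqual`, `Count`, `Retrieve`.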

Discussion: Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, possibly because authors keep definitions simple to ease implementation or limit complexity in response to data availability constraints.

Conclusions: The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.
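The conclusion that every definition combines tabular data with logical expressions, and that a standard representation should be modular so logic can be shared while code lists are localized, can be sketched in Python. Everything here is hypothetical: the `ValueSet` class, the simplified diabetes criterion (not a validated phenotype), and the two site configurations. The ICD-10-CM codes E11.9 and E10.9 and the SNOMED CT codes 44054006 and 46635009 (type 2 and type 1 diabetes mellitus) serve only as illustrative entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSet:
    """Tabular component: a named set of (code system, code) pairs."""
    name: str
    codes: frozenset


def has_any(record_codes: set, value_set: ValueSet) -> bool:
    """True if the patient record contains at least one code from the value set."""
    return not record_codes.isdisjoint(value_set.codes)


def t2d_case(record_codes: set, value_sets: dict) -> bool:
    """Shared logical component (hypothetical): a T2D code and no T1D code.

    The logic never names a code; each site supplies its own value_sets.
    """
    return has_any(record_codes, value_sets["T2D"]) and not has_any(record_codes, value_sets["T1D"])


# Site A maps the criteria to ICD-10-CM only.
SITE_A = {
    "T2D": ValueSet("T2D", frozenset({("ICD-10-CM", "E11.9")})),
    "T1D": ValueSet("T1D", frozenset({("ICD-10-CM", "E10.9")})),
}
# Site B localizes the same value sets by adding SNOMED CT concepts.
SITE_B = {
    "T2D": ValueSet("T2D", frozenset({("ICD-10-CM", "E11.9"), ("SNOMED CT", "44054006")})),
    "T1D": ValueSet("T1D", frozenset({("ICD-10-CM", "E10.9"), ("SNOMED CT", "46635009")})),
}

patient = {("SNOMED CT", "44054006")}
print(t2d_case(patient, SITE_A))  # False: Site A's tables do not cover SNOMED CT
print(t2d_case(patient, SITE_B))  # True
```

Keeping `t2d_case` fixed while swapping the code tables is one way to realize the modularity the conclusion calls for; it is an illustration, not a design proposed in the paper.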

Keywords: CQL; EHR-driven phenotyping; FHIR; cohort identification.

Figures

**Figure 1.**
Histograms of medical vocabulary code and code system usage. (A) The number of phenotype definitions using a given number of code systems. (B) The number of phenotype definitions using a specific code system. (C) The number of distinct vocabulary codes used for a given code system across all phenotype definitions. (D) The number of phenotype definitions using a given number of distinct codes. AMT: Australian Medicines Terminology; BDPM: Public Database of Medications; CIEL: Columbia International eHealth Laboratory; CPT: Current Procedural Terminology; dm+d: Dictionary of Medicines and Devices; ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification; ICD-10-CM: International Classification of Diseases, Tenth Revision, Clinical Modification; ICD-9-Proc: International Classification of Diseases, Ninth Revision, Procedures; ICD-10-PCS: International Classification of Diseases, Tenth Revision, Procedure Coding System; HCPCS: Healthcare Common Procedure Coding System; LOINC: Logical Observation Identifiers Names and Codes; MedDRA: Medical Dictionary for Regulatory Activities; MeSH: Medical Subject Headings; SNOMED: Systematized Nomenclature of Medicine.

**Figure 2.**
Cumulative CQL expression counts per category. CQL: Clinical Quality Language.

**Figure 3.**
Total numbers of individual CQL expression types by category (excluding literal expressions). CQL: Clinical Quality Language.

**Figure 4.**
Number of phenotype definitions utilizing various data types within literal expressions.

**Figure 5.**
Data retrieval expressions and data types. (A) The number of phenotype definitions using a given number of distinct data types. (B) The number of phenotype definitions using specific data types. (C) The number of phenotype definitions grouped by range of retrieve statement counts.

**Figure 6.**
Example of expression depth (A), along with histograms of total (B) and **where** clause (C) expression depths.