J Am Med Inform Assoc. 2024 Aug 1;31(8):1785-1796.
doi: 10.1093/jamia/ocae121.

A general framework for developing computable clinical phenotype algorithms

David S Carrell et al. J Am Med Inform Assoc. 2024.

Abstract

Objective: To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data.

Materials and methods: Drawing on extensive prior phenotyping experiences and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process.

Results: We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation.

Discussion and conclusion: This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.

Keywords: computable algorithms; health outcomes; modeling methods; recommended practices.

Conflict of interest statement

R.B. is an author on US Patent 9,075,796, “Text mining for large medical text datasets and corresponding medical text classification using informative feature selection.” At present, this patent is not licensed and does not generate royalties. All other authors have no competing interests to declare.

Figures

Figure 1. Flow diagram of 5 stages of computable phenotype algorithm development.
Figure 2. Relationship between clinical complexity, data complexity, and increasing phenotyping difficulty with illustrative phenotypes.
Figure 3. Selecting a final model from all models developed based on considerations of model performance, model transportability, and model generalizability.
