Review
Annu Rev Biomed Data Sci. 2021 Jul 20:4:1-19.
doi: 10.1146/annurev-biodatasci-122320-112352.

Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS

Lisa Bastarache. Annu Rev Biomed Data Sci.

Abstract

Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.

Keywords: Mendelian genetics; electronic health record; genomics; phecodes; phenome-wide association study (PheWAS); phenotype risk score (PheRS); phenotyping.

Figures

Figure 1
The anatomy of a phecode. A phecode is a three-digit parent code with optional digits following a decimal point. Digits after the decimal point indicate a hierarchical relationship. Phecodes without subordinate codes are called leaf codes. Each phecode has a string label and is linked to an exclude range. Cases are often defined as individuals with two or more instances of the phecode on unique dates, and controls are defined as individuals who do not have any code within the exclude range.
Figure 2
Phecode performance based on the minimum code count required for a patient to count as a case. Requiring two or more phecodes on unique dates to define a case resulted in the highest mean F1 score across four phenotypes.
Figure 3
Phecode statistics by chapter. Across phecode chapters, there is variability in the total number of phecodes (purple bars; left axis) as well as the average number of ICD-9 (International Classification of Diseases, Ninth Revision) codes that define each leaf phecode (black points; right axis).
Figure 4
Linking phecodes to single-nucleotide polymorphisms (SNPs). The GWAS (genome-wide association study) Catalog reports SNP–trait associations found in previous studies. Catalog traits are annotated with the Experimental Factor Ontology (EFO). Phecodes are linked to SNPs through a phecode/EFO map; three examples are shown here.
Figure 5
Mapping from the Human Phenotype Ontology (HPO) to phecodes. Select features from the Online Mendelian Inheritance in Man (OMIM) clinical description of cystic fibrosis are shown as HPO terms (left), along with their mapping to phecodes (right).
Figure 6
Phenotype risk score (PheRS) for cystic fibrosis (CF) of a patient diagnosed late in life. The PheRS for CF rises over time as the patient acquires more diagnoses that overlap with the disease profile. By the time this patient was diagnosed with CF, their PheRS was higher than that of 99% of patients.
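
The case/control rule described in the Figure 1 and Figure 2 captions can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the function names, the record layout of (phecode, date) pairs, the treatment of the hierarchy as a string prefix match, the representation of the exclude range as a numeric interval, and the example codes and range values. Patients who are neither cases nor controls are labeled "excluded" in this sketch.

```python
from datetime import date


def is_same_or_descendant(code, target):
    """True if `code` equals `target` or sits below it in the hierarchy.

    Assumption: the hierarchy is encoded purely by extra digits after the
    decimal point, so a descendant shares the target's string as a prefix.
    """
    if code == target:
        return True
    prefix = target if "." in target else target + "."
    return code.startswith(prefix)


def classify(records, target, exclude_range, min_dates=2):
    """Assign 'case', 'control', or 'excluded' for one target phecode.

    records       -- iterable of (phecode string, date) pairs for one patient
    exclude_range -- hypothetical (low, high) numeric interval of phecodes
    min_dates     -- minimum number of unique dates required for a case
    """
    # Count unique dates on which the target (or a descendant) was recorded.
    dates = {d for code, d in records if is_same_or_descendant(code, target)}
    if len(dates) >= min_dates:
        return "case"
    # A control must have no code anywhere inside the exclude range.
    low, high = exclude_range
    if any(low <= float(code) <= high for code, _ in records):
        return "excluded"
    return "control"


# Illustrative values only; not taken from a real phecode map.
TARGET = "250.2"
EXCLUDE = (249.0, 250.99)

two_dates = [("250.2", date(2020, 1, 1)), ("250.21", date(2020, 3, 1))]
one_date = [("250.2", date(2020, 1, 1)), ("250.21", date(2020, 1, 1))]
unrelated = [("401.1", date(2020, 1, 1))]

print(classify(two_dates, TARGET, EXCLUDE))  # case
print(classify(one_date, TARGET, EXCLUDE))   # excluded
print(classify(unrelated, TARGET, EXCLUDE))  # control
```

The second patient shows why the exclude range matters: a single date of evidence is not enough to count as a case, but it is enough to keep the patient out of the control group.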
