PLoS One. 2017 Jul 7;12(7):e0175508.
doi: 10.1371/journal.pone.0175508. eCollection 2017.

Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record

Wei-Qi Wei et al. PLoS One. 2017.

Abstract

Objective: To compare three groupings of Electronic Health Record (EHR) billing codes for their ability to represent clinically meaningful phenotypes and to replicate known genetic associations. The three tested coding systems were the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, the Agency for Healthcare Research and Quality Clinical Classification Software for ICD-9-CM (CCS), and manually curated "phecodes" designed to facilitate phenome-wide association studies (PheWAS) in EHRs.

Methods and materials: We selected 100 disease phenotypes and compared how accurately each coding system represented them without additional grouping. The 100 phenotypes included 25 randomly chosen clinical phenotypes pursued in prior genome-wide association studies (GWAS) and another 75 common disease phenotypes mentioned across free-text problem lists from 189,289 individuals. We then evaluated how well each coding system replicated known associations for 440 SNP-phenotype pairs.
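To make the idea of "grouping" billing codes concrete, here is a minimal Python sketch of rolling individual ICD-9-CM codes up into coarser phenotype groups and collecting the patients who carry each group. The mapping table, the group labels, the patient identifiers, and the `group_codes` helper are all hypothetical illustrations; they are not the phecode or CCS maps evaluated in the study, and the rule "any one mapped code makes a case" is an assumption made for brevity.

```python
from collections import defaultdict

# Hypothetical toy mapping from ICD-9-CM codes to phenotype groups.
# This is NOT the phecode or CCS map; the labels are illustrative only.
CODE_TO_GROUP = {
    "250.00": "type_2_diabetes",
    "250.02": "type_2_diabetes",
    "250.01": "type_1_diabetes",
    "250.03": "type_1_diabetes",
}


def group_codes(patient_codes):
    """Return {group: set of patients with at least one code in that group}."""
    groups = defaultdict(set)
    for patient, codes in patient_codes.items():
        for code in codes:
            group = CODE_TO_GROUP.get(code)
            if group is not None:  # codes outside the toy map are ignored
                groups[group].add(patient)
    return dict(groups)


# Made-up patients, used only to exercise the helper.
patients = {
    "p1": ["250.00"],
    "p2": ["250.02", "401.9"],
    "p3": ["250.01"],
}
cases = group_codes(patients)
```

With ungrouped codes, patients `p1` and `p2` would count as cases of two different phenotypes; after grouping they fall under one. A grouping that is too coarse would also merge `p3` into the same phenotype, which is the granularity trade-off the abstract describes.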

Results: Of the 100 tested clinical phenotypes, phecodes exactly matched 83, compared with 53 for ICD-9-CM and 32 for CCS. ICD-9-CM codes were typically too detailed (requiring custom groupings), while CCS codes were often not granular enough. Among the 440 known SNP-phenotype associations tested, phecodes replicated 153, compared with 143 for ICD-9-CM and 139 for CCS. Phecodes also generally produced stronger odds ratios and lower p-values for known associations than ICD-9-CM and CCS. Finally, evaluation of several SNPs via PheWAS identified novel potential signals, some seen only with the phecode approach. Among them, rs731839 in PEPD was associated with gastrointestinal hemorrhage.
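The counts reported above can be turned into proportions for easier comparison. The short Python sketch below does that arithmetic. It assumes the full set of 100 phenotypes and 440 SNP-phenotype pairs is the denominator for every coding system; the variable names and the `rate` helper are illustrative and not part of the study.

```python
# Counts as reported in the Results paragraph.
TOTAL_PHENOTYPES = 100
TOTAL_SNP_PAIRS = 440  # assumed to be the denominator for all three systems

exact_matches = {"phecodes": 83, "ICD-9-CM": 53, "CCS": 32}
replicated = {"phecodes": 153, "ICD-9-CM": 143, "CCS": 139}


def rate(count, total):
    """Return count / total as a fraction between 0 and 1."""
    return count / total


match_rates = {s: rate(n, TOTAL_PHENOTYPES) for s, n in exact_matches.items()}
replication_rates = {s: rate(n, TOTAL_SNP_PAIRS) for s, n in replicated.items()}

for system in exact_matches:
    print(
        f"{system}: exact match {match_rates[system]:.0%}, "
        f"replication {replication_rates[system]:.1%}"
    )
```

Under that assumption, the gap in exact phenotype matches (83 versus 53 and 32 out of 100) is much larger than the gap in replicated associations (153 versus 143 and 139 out of 440).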

Conclusion: Our results suggest that the phecode groupings align better with the diseases mentioned in clinical practice and studied in genomic research. ICD-9-CM, CCS, and phecode groupings all worked for PheWAS-type studies, though the phecode groupings produced superior results.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Weighted Venn diagrams of the distributions of power-enabled tests, replicated associations, best ORs, and best P values with CCS, ICD-9-CM, and phecodes. Each color represents a resource.
Fig 2. PheWAS results of three SNPs (rs35391, rs731839 and rs769449) showed that phecodes outperformed ICD-9-CM and CCS.
Fig 3. PEPD expression results suggest strong association with the gastrointestinal tract.
Fig 4. SNP rs731839 is a cis-acting eQTL for PEPD in esophagus mucosa.
