J Am Med Inform Assoc. 2019 Dec 1;26(12):1437-1447.
doi: 10.1093/jamia/ocz179.

Improving the phenotype risk score as a scalable approach to identifying patients with Mendelian disease

Lisa Bastarache et al. J Am Med Inform Assoc. 2019.

Abstract

Objective: The Phenotype Risk Score (PheRS) is a method to detect Mendelian disease patterns using phenotypes from the electronic health record (EHR). We compared the performance of different approaches for mapping EHR phenotypes to Mendelian disease features.

Materials and methods: PheRS uses Mendelian disease descriptions annotated with Human Phenotype Ontology (HPO) terms. In previous work, we presented a map linking phecodes (based on International Classification of Diseases [ICD]-Ninth Revision) to HPO terms. For this study, we integrated ICD-Tenth Revision codes and lab data. We also created a new map linking HPO terms to customized groupings of ICD codes. We compared the performance of these maps on cases and controls for 16 Mendelian diseases using 2.5 million de-identified medical records.

Results: PheRS effectively distinguished cases from controls for all 15 positive controls and all approaches tested (P < 4 × 10⁻¹⁶). Adding lab data led to a statistically significant improvement for 4 of 14 diseases. The custom ICD groupings improved specificity, leading to an average 8% increase in precision at 100 (range, -2% to 22%). Eight of 10 adults with cystic fibrosis tested had PheRS in the 95th percentile prior to diagnosis.

Discussion: Both phecodes and custom ICD groupings were able to detect differences between affected cases and controls at the population level. The ICD map showed better precision for the highest scoring individuals. Adding lab data improved performance at detecting population-level differences.

Conclusions: PheRS is a scalable method to study Mendelian disease at the population level using electronic health record data and can potentially be used to find patients with undiagnosed Mendelian disease.

Keywords: Data mining; Diagnosis; Electronic health record; Mendelian genetics.
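
The two mapping routes compared in the abstract (ICD codes to phecodes to HPO terms, versus custom ICD groupings directly to HPO terms) can be illustrated with a minimal sketch. Every identifier in the tables below is a made-up placeholder rather than a real ICD, phecode, or HPO code, and the function names are hypothetical; this is not the authors' implementation, only one way the translation step might look.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Placeholder lookup tables. These are NOT real codes; the actual maps
# described in the paper are far larger.
ICD_TO_PHECODE: Dict[str, str] = {
    "ICD_A": "PHE_1",
    "ICD_B": "PHE_1",
    "ICD_C": "PHE_2",
}
PHECODE_TO_HPO: Dict[str, Set[str]] = {
    "PHE_1": {"HPO_X"},
    "PHE_2": {"HPO_Y", "HPO_Z"},
}
# Custom ICD groupings mapped directly to HPO terms (no phecode step).
ICD_GROUP_TO_HPO: Dict[FrozenSet[str], Set[str]] = {
    frozenset({"ICD_A"}): {"HPO_X"},
    frozenset({"ICD_C"}): {"HPO_Y"},
}


def hpo_via_phecodes(icd_codes: Iterable[str]) -> Set[str]:
    """Translate ICD codes to phecodes first, then to HPO terms."""
    terms: Set[str] = set()
    for code in icd_codes:
        phecode = ICD_TO_PHECODE.get(code)
        if phecode is not None:
            terms |= PHECODE_TO_HPO.get(phecode, set())
    return terms


def hpo_via_icd_groups(icd_codes: Iterable[str]) -> Set[str]:
    """Translate ICD codes to HPO terms through custom ICD groupings."""
    codes = set(icd_codes)
    terms: Set[str] = set()
    for group, hpo_terms in ICD_GROUP_TO_HPO.items():
        if codes & group:  # patient has at least one code in the grouping
            terms |= hpo_terms
    return terms
```

In this toy data, a patient with `ICD_B` and `ICD_C` receives three HPO terms through the phecode route but only one through the custom groupings, which is the kind of narrowing that could plausibly account for a gain in specificity.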

Figures

Figure 1.
Diagram of Human Phenotype Ontology (HPO) maps tested in this article. Each map was used to translate phenotypic information contained in International Classification of Diseases (ICD) codes and labs into HPO terms. In the original HPO-phecode map, ICD codes were first translated to phecodes and then to HPO terms. The new HPO-ICD map translates custom groupings of ICD codes to HPO terms without the intermediary phecodes. New information can be integrated into the Phenotype Risk Score by creating a map between the data elements and HPO terms, as we have done for labs. The HPO-phecode+labs table is not shown in this diagram. ICD-9: International Classification of Diseases-Ninth Revision; ICD-10: International Classification of Diseases-Tenth Revision.
Figure 2.
Boxplots of Phenotype Risk Score for cases of 2 diseases. These boxplots compare the residualized Phenotype Risk Score (rPheRS) of cases generated from the different maps vs controls (scored with the Human Phenotype Ontology [HPO]-phecode map). For hereditary hemochromatosis (HH), the addition of labs improved performance. However, phecodes produced the highest percentage of outliers. For Marfan syndrome (MS), HPO-International Classification of Diseases (ICD) resulted in a higher median than HPO-phecode.
Figure 3.
Precision @ K. Graphs of the precision for each disease tested at K = 10, 100, 1000 and 10000. The table includes the combined percentages for each map.
Figure 4.
The receiver-operating characteristic curves for each disease, testing the ability of Phenotype Risk Score (PheRS) to classify cases vs controls. The red line indicates the PheRS generated by the Human Phenotype Ontology (HPO)-phecode map; the blue line indicates the PheRS generated by the HPO-International Classification of Diseases (ICD) map; green is HPO-ICD+phecode; purple is HPO-ICD+lab. ACH: achondroplasia; A1A: alpha-1 antitrypsin deficiency; CF: cystic fibrosis; DGS: DiGeorge syndrome; DS: Down syndrome; DMD: Duchenne muscular dystrophy; FXS: fragile X syndrome; HH: hereditary hemochromatosis; HHT: hereditary hemorrhagic telangiectasia; MS: Marfan syndrome; NF1: neurofibromatosis, type 1; NF2: neurofibromatosis, type 2; PKU: phenylketonuria; PV: polycythemia vera; SCA: sickle cell anemia; TS: tuberous sclerosis.
Figure 5.
Phenotype Risk Score of adults with cystic fibrosis over time. Each row represents a patient. The dots indicate a clinic visit. The line is pink during the period before diagnosis and blue after diagnosis. Residualized Phenotype Risk Score (rPheRS) was calculated at each new clinical encounter using the Human Phenotype Ontology-International Classification of Diseases map.
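
The precision at K metric reported in the Results and in Figure 3 can be sketched as follows. This assumes the standard definition (the fraction of true cases among the K highest-scoring individuals); the function name and the input layout of (score, is_case) pairs are hypothetical, and the scores used in any example are invented rather than taken from the study.

```python
from typing import Sequence, Tuple


def precision_at_k(scored: Sequence[Tuple[float, bool]], k: int) -> float:
    """Fraction of true cases among the k highest-scoring individuals.

    `scored` holds (risk_score, is_case) pairs. The denominator is always k,
    so a cohort smaller than k cannot reach a precision of 1.0.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Rank individuals from highest to lowest score and keep the top k.
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    hits = sum(1 for _, is_case in top if is_case)
    return hits / k
```

For instance, if 2 of the 4 highest-scoring records in a toy cohort belong to confirmed cases, `precision_at_k(cohort, 4)` returns 0.5.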
