Observational Study
J Pediatr. 2017 Sep:188:224-231.e5.
doi: 10.1016/j.jpeds.2017.05.037. Epub 2017 Jun 16.

A Computable Phenotype Improves Cohort Ascertainment in a Pediatric Pulmonary Hypertension Registry

Alon Geva et al. J Pediatr. 2017 Sep.

Abstract

Objectives: To compare registry and electronic health record (EHR) data mining approaches for cohort ascertainment in patients with pediatric pulmonary hypertension (PH) in an effort to overcome some of the limitations of registry enrollment alone in identifying patients with particular disease phenotypes.

Study design: This study was a single-center retrospective analysis of EHR and registry data at Boston Children's Hospital. The local Informatics for Integrating Biology and the Bedside (i2b2) data warehouse was queried for billing codes, prescriptions, and narrative data related to pediatric PH. Computable phenotype algorithms were developed by fitting penalized logistic regression models to a physician-annotated training set. Algorithms were applied to a candidate patient cohort, and performance was evaluated using a separate set of 136 records and 179 registry patients. We compared clinical and demographic characteristics of patients identified by computable phenotype and the registry.
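The model-fitting step described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption for illustration only: the abstract does not state which penalty was used (an L2 penalty is assumed here), the three features and the toy training rows are invented, and plain gradient descent stands in for whatever solver the authors used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_penalized_logistic(X, y, l2=0.1, lr=0.1, epochs=2000):
    """Fit logistic regression with an L2 penalty on the weights
    (assumed penalty; the intercept is left unpenalized)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of the mean log-loss plus the penalty term l2 * w.
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical annotated training set. Columns (all invented):
# [count of PH billing codes, PH drug prescribed (0/1), PH mentions in notes]
X_train = [
    [3, 1, 5], [2, 1, 4], [4, 0, 6], [1, 1, 3],  # annotated as PH
    [0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0],  # annotated as not PH
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_penalized_logistic(X_train, y_train)
p_likely = predict_proba(w, b, [3, 1, 4])    # candidate with strong PH signals
p_unlikely = predict_proba(w, b, [0, 0, 0])  # candidate with no PH signals
```

Applying such a model to a candidate cohort amounts to scoring each patient with `predict_proba` and keeping those above a chosen probability threshold; the threshold itself is a design choice not given in the abstract.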

Results: The computable phenotype had an area under the receiver operating characteristic curve of 90% (95% CI, 85%-95%), a positive predictive value of 85% (95% CI, 77%-93%), and identified 413 patients (an additional 231%) with pediatric PH who were not enrolled in the registry. Patients identified by the computable phenotype were clinically distinct from registry patients, with a greater prevalence of diagnoses related to perinatal distress and left heart disease.
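As a quick check on the reported figures, the "additional 231%" appears to follow from the 413 newly identified patients relative to the 179 registry patients mentioned in the study design. The sketch below reproduces that arithmetic and shows how a positive predictive value is computed from counts. The function names are illustrative, and the true-positive and false-positive counts passed to `ppv` are hypothetical, since the abstract reports only the percentage.

```python
def percent_increase(additional: int, baseline: int) -> float:
    """Additional cases expressed as a percentage of the baseline cohort."""
    return 100.0 * additional / baseline

def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: the fraction of flagged patients who truly have the condition."""
    return true_positives / (true_positives + false_positives)

# 413 patients found only by the computable phenotype, 179 registry patients.
increase = percent_increase(413, 179)
print(round(increase))  # 231

# Hypothetical counts chosen only to illustrate a PPV of 85%.
example_ppv = ppv(17, 3)
```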

Conclusions: Mining of EHRs using computable phenotypes identified a large cohort of patients not recruited using a classic registry. Fusion of EHR and registry data can improve cohort ascertainment for the study of rare diseases.

Trial registration: ClinicalTrials.gov: NCT02249923.

Keywords: bioinformatics; computer-based model; pediatrics; pulmonary hypertension; registry.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Overview of our EHR-based approach for cohort ascertainment
Figure 2. Venn diagram showing computable phenotype- and registry-based cohorts
The cohorts used for subsequent comparisons are labeled in small caps.
Figure 3. Comparison of ICD-9 coding practices between the computable phenotype and registry cohorts
Bars represent percent of patients who had each ICD-9 code recorded at least once. P < 0.0001 (****), P < 0.001 (***), P < 0.01 (**) for pairwise comparisons between cohorts.
Figure 4. ICD-9 codes with greater relative frequency among patients identified only in the EHR as compared with patients enrolled in the registry
Magenta lines indicate ICD-9 codes related to heart disease and green lines indicate ICD-9 codes related to neonatal distress. Trivial diagnoses are shown in light grey to emphasize diagnoses of interest. Only conditions with P < 0.05 are shown.
Figure 5. ICD-9 codes with greater relative frequency among registry patients as compared with patients identified only in the EHR
Magenta lines indicate ICD-9 codes related to gastrointestinal disease and malformations, blue lines indicate ICD-9 codes related to chronic respiratory disease, and green lines indicate ICD-9 codes related to prematurity, developmental disorders, and seizures. Trivial diagnoses are shown in light grey to emphasize diagnoses of interest. Only conditions with P < 0.05 are shown. Some ICD-9 code descriptions appear twice due to unique ICD-9 codes mapping to similar descriptions.

