2018 Mar 1;25(3):345-352.
doi: 10.1093/jamia/ocx137.

PIE: A prior knowledge guided integrated likelihood estimation method for bias reduction in association studies using electronic health records data

Jing Huang et al. J Am Med Inform Assoc.

Abstract

Objectives: This study proposes a novel Prior knowledge guided Integrated likelihood Estimation (PIE) method to correct bias in estimations of associations due to misclassification of electronic health record (EHR)-derived binary phenotypes, and evaluates the performance of the proposed method by comparing it to 2 methods in common practice.

Methods: We conducted simulation studies and data analysis of real EHR-derived data on diabetes from Kaiser Permanente Washington to compare the estimation bias of associations using the proposed method, the method ignoring phenotyping errors, the maximum likelihood method with misspecified sensitivity and specificity, and the maximum likelihood method with correctly specified sensitivity and specificity (gold standard). The proposed method effectively leverages available information on phenotyping accuracy to construct a prior distribution for sensitivity and specificity, and incorporates this prior information through the integrated likelihood for bias reduction.
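
The integrated-likelihood idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a logistic model for the true phenotype with a single covariate, nondifferential misclassification of the EHR-derived surrogate, and a prior on sensitivity and specificity that has been discretized onto a grid. The function names (`surrogate_prob`, `log_lik`, `integrated_log_lik`) and the grid-based approximation of the integral are hypothetical choices made for this example.

```python
import numpy as np


def surrogate_prob(beta, x, sens, spec):
    """P(surrogate = 1 | x) under an assumed logistic model for the true phenotype.

    beta = (intercept, slope); sens and spec are the phenotyping
    algorithm's sensitivity and specificity.
    """
    p_true = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    # A positive surrogate is either a true positive or a false positive.
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def log_lik(beta, x, s, sens, spec):
    """Log-likelihood of the observed surrogates s given fixed sens and spec."""
    q = np.clip(surrogate_prob(beta, x, sens, spec), 1e-12, 1.0 - 1e-12)
    return float(np.sum(s * np.log(q) + (1 - s) * np.log(1 - q)))


def integrated_log_lik(beta, x, s, sens_grid, spec_grid, prior_weights):
    """Log of the likelihood averaged over a discretized prior.

    prior_weights[i, j] is the prior mass (assumed to sum to 1) placed on
    the pair (sens_grid[i], spec_grid[j]).
    """
    ll = np.array([[log_lik(beta, x, s, a1, a0) for a0 in spec_grid]
                   for a1 in sens_grid])
    m = ll.max()  # log-sum-exp shift for numerical stability
    return float(m + np.log(np.sum(prior_weights * np.exp(ll - m))))
```

In this sketch, a prior that puts all its mass on one (sensitivity, specificity) pair reduces to the ordinary likelihood conditioned on those values, which corresponds to the maximum likelihood comparators named above; setting both to 1 corresponds to ignoring phenotyping error. An estimate of the association would then be obtained by maximizing `integrated_log_lik` over `beta`, for example by passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize`.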

Results: Our simulation studies and real data application demonstrated that the proposed method effectively reduces the estimation bias compared to the 2 current methods. It performed almost as well as the gold standard method when the prior had highest density around true sensitivity and specificity. The analysis of EHR data from Kaiser Permanente Washington showed that the estimated associations from PIE were very close to the estimates from the gold standard method and reduced bias by 60%-100% compared to the 2 commonly used methods in current practice for EHR data.

Conclusions: This study demonstrates that the proposed method can effectively reduce estimation bias caused by imperfect phenotyping in EHR-derived data by incorporating prior information through integrated likelihood.

Keywords: association study; bias reduction; electronic health record; misclassification; prior information.

Figures

Figure 1.
Comparison of likelihood function with unknown accuracy (blue solid line), likelihood function conditioned on misspecified accuracy (black solid line), likelihood function conditioned on known accuracy (black dashed line), and prior knowledge guided integrated likelihood function (red solid line). The true sensitivity and specificity are 90%.
Figure 2.
Illustration of the 5 types of prior distributions in PIE method: PIE1 (distributions peak at the true values of sensitivity and specificity); PIE1_sv (distributions peak at the true values of sensitivity and specificity, with small variance); PIE2 (distributions have peaks that differ from true values); PIE2_lv (distributions have peaks that differ from true values, with large variance); and PIE3 (uniform distributions not centered at the true values). Vertical dashed line marks the true value of sensitivity or specificity, and solid line marks the peak of the prior distribution.
Figure 3.
Box plots of estimates of β1 using the ML method with correctly specified sensitivity and specificity (gold standard), the method ignoring misclassification (naïve), the ML method with misspecified sensitivity and specificity (ML-MS), and the prior knowledge guided integrated likelihood method with 3 priors (PIE1, PIE2, PIE3). Solid black segment in each box shows the median of the estimates.
Figure 4.
Box plots of estimates of β1 using the prior knowledge guided integrated likelihood method with 4 priors (PIE1_sv, PIE1, PIE2_lv, PIE2). Solid black segment in each box shows the median of the estimates.
