Impact of Diverse Data Sources on Computational Phenotyping
- PMID: 32582289
- PMCID: PMC7283539
- DOI: 10.3389/fgene.2020.00556
Abstract
Electronic health records (EHRs) have been widely adopted and have great potential to serve as a rich, integrated source of phenotype information. Computational phenotyping, which extracts phenotypes from EHR data automatically, can accelerate the adoption and use of phenotype-driven efforts to advance scientific discovery and improve healthcare delivery. Many computational phenotyping algorithms have been published, but data fragmentation, i.e., incomplete data within a single data source, has been raised as an inherent limitation of computational phenotyping. In this study, we investigated the impact of diverse data sources on two published computational phenotyping algorithms, for rheumatoid arthritis (RA) and type 2 diabetes mellitus (T2DM), using Mayo EHRs and the Rochester Epidemiology Project (REP), which links medical records from multiple health care systems. Results showed that case selection for both RA (less prevalent) and T2DM (more prevalent) was markedly affected by data fragmentation. In Mayo data, the positive predictive value (PPV) was 91.4% and 92.4% and the false-negative rate (FNR) was 26.6% and 14%, respectively. In REP, the PPV was 97.2% and 98.3% and the FNR was 5.2% and 3.3%, respectively. T2DM controls also contained biases, with a PPV of 91.2% and an FNR of 1.2% in Mayo data. We further examined the underlying reasons affecting performance.
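The abstract reports performance as PPV, TP / (TP + FP), and FNR, FN / (FN + TP). The sketch below illustrates, on toy data, how fragmentation can raise the false-negative rate: a rule requiring at least two qualifying diagnosis codes misses patients whose codes are split across, or held entirely in, a second health care system. The threshold `MIN_CODES`, the `(patient_id, source)` record format, the function names, and the reference set are all hypothetical assumptions, not the criteria of either published algorithm and not the authors' evaluation pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical rule for illustration only: a patient counts as a case when at
# least MIN_CODES qualifying diagnosis codes appear in the available records.
MIN_CODES = 2


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    fn: int


def select_cases(records: list[tuple[str, str]]) -> set[str]:
    """records: hypothetical (patient_id, source) pairs, one per qualifying code."""
    counts: dict[str, int] = defaultdict(int)
    for patient_id, _source in records:
        counts[patient_id] += 1
    return {pid for pid, n in counts.items() if n >= MIN_CODES}


def confusion_from_sets(predicted: set[str], reference: set[str]) -> Confusion:
    """Compare algorithm-selected patient IDs against a reference-standard set."""
    return Confusion(
        tp=len(predicted & reference),
        fp=len(predicted - reference),
        fn=len(reference - predicted),
    )


def ppv(c: Confusion) -> float:
    # Share of algorithm-selected patients who are true cases.
    denom = c.tp + c.fp
    if denom == 0:
        raise ValueError("PPV undefined: the algorithm selected no patients")
    return c.tp / denom


def fnr(c: Confusion) -> float:
    # Share of true cases the algorithm failed to select.
    denom = c.tp + c.fn
    if denom == 0:
        raise ValueError("FNR undefined: the reference standard has no cases")
    return c.fn / denom


if __name__ == "__main__":
    # Toy records (not study data): one qualifying code per tuple.
    linked = [
        ("p1", "A"), ("p1", "A"),
        ("p2", "A"), ("p2", "B"),  # p2's codes are split across two systems
        ("p3", "B"), ("p3", "B"),  # p3 was seen only at system B
        ("p4", "A"), ("p4", "A"),  # p4 is not a case in the reference standard
    ]
    reference = {"p1", "p2", "p3"}
    single_source = [r for r in linked if r[1] == "A"]

    for label, recs in [("single source A", single_source), ("linked A+B", linked)]:
        c = confusion_from_sets(select_cases(recs), reference)
        print(f"{label}: PPV={ppv(c):.2f} FNR={fnr(c):.2f}")
```

On this toy input, system A alone selects only p1 and p4 (PPV 0.50, FNR 0.67), while the linked records also recover p2 and p3 (PPV 0.75, FNR 0.00). The numbers only illustrate the mechanism; they are not meant to reproduce the study's results.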
Keywords: computational phenotyping; diverse data sources; phenotyping algorithms; rheumatoid arthritis; type 2 diabetes mellitus.
Copyright © 2020 Wang, Olson, Bielinski, St. Sauver, Fu, He, Cicek, Hathcock, Cerhan and Liu.
Similar articles
- Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus. J Am Med Inform Assoc. 2012 Mar-Apr;19(2):219-24. doi: 10.1136/amiajnl-2011-000597. Epub 2012 Jan 16. PMID: 22249968.
- High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288. PMID: 33313899.
- Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J Am Med Inform Assoc. 2016 Apr;23(e1):e20-7. doi: 10.1093/jamia/ocv130. Epub 2015 Sep 2. PMID: 26338219.
- Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach. J Diabetes Sci Technol. 2017 Jul;11(4):791-799. doi: 10.1177/1932296816681584. Epub 2016 Dec 7. PMID: 27932531.
- Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):139-153. doi: 10.1109/TCBB.2018.2849968. Epub 2018 Jun 25. PMID: 29994486. Review.
Cited by
- Automated Type 2 Diabetes Case and Control Identification from the MIMIC-IV Database. AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:602-611. eCollection 2023. PMID: 37350886.
- Evolution of clinical Health Information Exchanges to population health resources: a case study of the Indiana network for patient care. BMC Med Inform Decis Mak. 2025 Feb 24;25(1):97. doi: 10.1186/s12911-025-02933-9. PMID: 39994604.
- Optimal Surrogate-Assisted Sampling for Cost-Efficient Validation of Electronic Health Record Outcomes. Stat Med. 2025 May;44(10-12):e70095. doi: 10.1002/sim.70095. PMID: 40404279.
- Recommended practices and ethical considerations for natural language processing-assisted observational research: A scoping review. Clin Transl Sci. 2023 Mar;16(3):398-411. doi: 10.1111/cts.13463. Epub 2022 Dec 26. PMID: 36478394.
- Establishing an expert consensus for the operational definitions of asthma-associated infectious and inflammatory multimorbidities for computational algorithms through a modified Delphi technique. BMC Med Inform Decis Mak. 2021 Nov 8;21(1):310. doi: 10.1186/s12911-021-01663-y. PMID: 34749701.