Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities
- PMID: 35880997
- PMCID: PMC9204761
- DOI: 10.1515/scid-2019-0015
Abstract
Objectives: Observational data derived from patient electronic health records (EHR) are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. There are challenges to using these data, in particular with regard to data quality; some are recognized, some unrecognized, and some recognized but ignored. There are great opportunities for the statistical community to improve inference by incorporating validation subsampling into analyses of EHR data.
Methods: Methods to address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many existing statistical methods, such as those for measurement error, address only relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study.
Results/Conclusion: We will discuss some preliminary methods in this area, with a particular focus on time-to-event outcomes, and outline areas of future research.
Keywords: HIV; electronic health records; measurement error; misclassification; two-phase sampling.
© 2020 Walter de Gruyter GmbH, Berlin/Boston.
Conflict of interest statement
Competing interests: Authors state no conflict of interest.