Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities
- PMID: 35880997
- PMCID: PMC9204761
- DOI: 10.1515/scid-2019-0015
Abstract
Objectives: Observational data derived from patient electronic health records (EHRs) are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. Using these data poses challenges, particularly with regard to data quality; some of these challenges are recognized, some are unrecognized, and some are recognized but ignored. The statistical community has great opportunities to improve inference by incorporating validation subsampling into analyses of EHR data.
Methods: Methods that address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many existing statistical methods for measurement error address only relatively simple settings, whereas the errors in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study.
Results/Conclusion: We discuss some preliminary methods in this area, with a particular focus on time-to-event outcomes, and outline areas for future research.
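To illustrate the basic idea behind validation subsampling in a two-phase design, the sketch below uses a classical double-sampling "difference" estimator. It is not the authors' method. In phase 1, an error-prone value is available for every record. In phase 2, the true value is recovered (e.g., by chart review) for a simple random subsample, and the average error observed in that subsample corrects the phase-1 mean. The simulated data, the CD4-like scale, the +30 systematic error, and the function names `two_phase_difference_estimate` and `simulate` are all hypothetical assumptions. This single-variable, mean-only setting is far simpler than the correlated, multi-variable errors the abstract describes.

```python
import random
from statistics import fmean


def two_phase_difference_estimate(error_prone, validated):
    """Estimate the mean of the true variable from a two-phase sample.

    error_prone: error-prone values X* for every record (phase 1).
    validated:   dict {record index: true value X} for a validation
                 subsample, assumed to be a simple random sample (phase 2).
    """
    if not validated:
        raise ValueError("validation subsample is empty")
    phase1_mean = fmean(error_prone)
    # Average error (true minus error-prone) among validated records.
    correction = fmean([x_true - error_prone[i] for i, x_true in validated.items()])
    return phase1_mean + correction


def simulate(n=5000, n_validate=500, seed=1):
    """Hypothetical data: true values plus a systematic +30 error and noise."""
    rng = random.Random(seed)
    true_x = [rng.gauss(350.0, 100.0) for _ in range(n)]
    error_prone = [x + 30.0 + rng.gauss(0.0, 50.0) for x in true_x]
    # Phase 2: validate a simple random subsample of records.
    idx = rng.sample(range(n), n_validate)
    validated = {i: true_x[i] for i in idx}
    return true_x, error_prone, validated


if __name__ == "__main__":
    true_x, error_prone, validated = simulate()
    print("true mean:     ", round(fmean(true_x), 1))
    print("naive mean:    ", round(fmean(error_prone), 1))
    print("two-phase est.:", round(two_phase_difference_estimate(error_prone, validated), 1))
```

Under these assumptions, the naive phase-1 mean stays biased by roughly the systematic error. The two-phase estimate removes most of that bias, at the cost of extra sampling variability from the size of the validation subsample.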
Keywords: HIV; electronic health records; measurement error; misclassification; two-phase sampling.
© 2020 Walter de Gruyter GmbH, Berlin/Boston.
Conflict of interest statement
Competing interests: Authors state no conflict of interest.