Lessons and tips for designing a machine learning study using EHR data
- PMID: 33948244
- PMCID: PMC8057454
- DOI: 10.1017/cts.2020.513
Abstract
Machine learning (ML) provides the ability to examine massive datasets and uncover patterns within data without relying on a priori assumptions such as specific variable associations, linearity in relationships, or prespecified statistical interactions. However, the application of ML to healthcare data has met with mixed results, especially when using administrative datasets such as the electronic health record (EHR). The black box nature of many ML algorithms contributes to an erroneous assumption that these algorithms can overcome major data issues inherent in large administrative healthcare data. As with other research endeavors, good data and analytic design are crucial to ML-based studies. In this paper, we provide an overview of common misconceptions about ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design.
Keywords: Machine learning; electronic health record; healthcare research; research methodology; translational research.
© The Association for Clinical and Translational Science 2020.
Conflict of interest statement
No authors reported conflicts of interest.