Lessons and tips for designing a machine learning study using EHR data

Jaron Arbet¹ #, Cole Brokamp²,³ #, Jareen Meinzen-Derr²,³ #, Katy E Trinkley⁴,⁵ #, Heidi M Spratt⁶

Affiliations

¹ Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, Aurora, CO, USA.
² Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
³ Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
⁴ Department of Clinical Pharmacy, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA.
⁵ Department of Medicine, School of Medicine, University of Colorado, Aurora, CO, USA.
⁶ Department of Preventive Medicine and Population Health, University of Texas Medical Branch, Galveston, TX, USA.

# Contributed equally.

PMID: 33948244
PMCID: PMC8057454
DOI: 10.1017/cts.2020.513

Review

Jaron Arbet et al. J Clin Transl Sci. 2020 Jul 24;5(1):e21.

Abstract

Machine learning (ML) makes it possible to examine massive datasets and uncover patterns without relying on a priori assumptions such as specific variable associations, linear relationships, or prespecified statistical interactions. However, applying ML to healthcare data has produced mixed results, especially with administrative datasets such as the electronic health record (EHR). The black box nature of many ML algorithms contributes to the mistaken assumption that these algorithms can overcome major data issues inherent in large administrative healthcare data. As with other research endeavors, good data and rigorous analytic design are crucial to ML-based studies. In this paper, we provide an overview of common misconceptions about ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design.

Keywords: Machine learning; electronic health record; healthcare research; research methodology; translational research.

Conflict of interest statement

No authors reported conflicts of interest.

Figures

**Fig. 1.**
Illustration of the iterative machine learning process.

