Digit Health. 2023 May 9:9:20552076231173225.
doi: 10.1177/20552076231173225. eCollection 2023 Jan-Dec.

Ensemble machine learning methods in screening electronic health records: A scoping review


Christophe AT Stevens et al. Digit Health. 2023.

Abstract

Background: Electronic health records make it possible to use machine learning techniques to identify undiagnosed individuals who are likely to have a given disease and who could then benefit from further medical screening and case finding. This reduces the number needed to screen, bringing convenience and healthcare cost savings. Ensemble machine learning models, which combine multiple prediction estimates into one, are often said to provide better predictive performance than non-ensemble models. Yet, to our knowledge, no literature review summarises the use and performance of different types of ensemble machine learning models in the context of medical pre-screening.
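As an illustration of what "combining multiple prediction estimates into one" can mean, the sketch below shows two simple combination rules, probability averaging (soft voting) and majority voting (hard voting). It is a minimal example under stated assumptions and is not taken from the review: the function names, the 0.5 threshold, and the three probabilities are hypothetical.

```python
from statistics import mean


def soft_vote(probabilities):
    """Average the per-model predicted probabilities for one patient."""
    return mean(probabilities)


def hard_vote(probabilities, threshold=0.5):
    """Return True when a strict majority of models flag the patient."""
    votes = [p >= threshold for p in probabilities]
    return sum(votes) > len(votes) / 2


# Hypothetical predicted disease probabilities from three base models
# for a single patient record (illustrative values, not real data).
patient_scores = [0.62, 0.48, 0.71]

ensemble_probability = soft_vote(patient_scores)  # one combined estimate
flagged = hard_vote(patient_scores)               # two of three models vote "yes"
```

More elaborate strategies, such as training a second model on the base models' outputs, follow the same pattern of reducing several estimates to one.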

Method: We aimed to conduct a scoping review of the literature reporting the derivation of ensemble machine learning models for screening of electronic health records. We searched EMBASE and MEDLINE databases across all years applying a formal search strategy using terms related to medical screening, electronic health records and machine learning. Data were collected, analysed, and reported in accordance with the PRISMA scoping review guideline.

Results: A total of 3355 articles were retrieved, of which 145 met our inclusion criteria and were included in this study. Ensemble machine learning models were increasingly employed across several medical specialties and often outperformed non-ensemble approaches. Ensemble machine learning models with complex combination strategies and heterogeneous classifiers often outperformed other types of ensemble machine learning models but were also used less often. The methodologies, processing steps and data sources of ensemble machine learning models were often not clearly described.

Conclusions: Our work highlights the importance of deriving and comparing the performances of different types of ensemble machine learning models when screening electronic health records and underscores the need for more comprehensive reporting of machine learning methodologies employed in clinical research.

Keywords: Ensemble machine learning; electronic health records; mass screening; scoping review; supervised machine learning.


Conflict of interest statement

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Christophe AT Stevens (CATS) is an employee of Imperial College London and reports grants from Pfizer, Amgen, Merck Sharp & Dohme, Sanofi-Aventis, Daiichi Sankyo, and Regeneron, during the conduct of the study. Alexander RM Lyons (ARML) is an employee of Imperial College London and reports grants from Pfizer, Amgen, Merck Sharp & Dohme, Sanofi–Aventis, Daiichi Sankyo, and Regeneron, during the conduct of the study. Kanika I Dharmayat (KID) is an employee of Imperial College London and receives grants from Daiichi Sankyo, Amgen and Regeneron, and personal fees from Bayer and Regeneron; all outside of the submitted work. Alireza Mahani (AM) is an employee of Davidson Kempner Capital Management and has no conflict of interest to disclose. Kausik K Ray (KKR) is an employee of Imperial College London and reports grants and personal fees from Amgen, Sanofi–Regeneron, Pfizer, Merck Sharp & Dohme, and Daiichi Sankyo; and personal fees from AstraZeneca, The Medicines Company, Kowa, Novartis, Lilly, Algorithm, Boehringer Ingelheim, AbbVie, Silence Therapeutics, Bayer, Esperion, Abbott, New Amsterdam, and Resverlogix, outside the submitted work. Antonio J Vallejo-Vaz (AJV-V) is an employee of the University of Seville and acknowledges past or current participation in research grants to Imperial College London from Pfizer, Amgen, MSD, Sanofi-Aventis, Daiichi Sankyo and Regeneron, outside the submitted work; and received personal fees for consulting from Bayer and Regeneron and honoraria for lectures from Amgen, Mylan, Akcea and Ferrer, all outside the submitted work. Mansour TA Sharabiani (MTAS) is an employee of Imperial College London and has no conflict of interest to disclose.

Figures

Figure 1. Flowchart of article inclusion and exclusion.
Figure 2. Use of ensemble and non-ensemble ML models in EHRs over time by medical specialty.

