Ensemble machine learning methods in screening electronic health records: A scoping review
- PMID: 37188075
- PMCID: PMC10176785
- DOI: 10.1177/20552076231173225
Abstract
Background: Electronic health records offer the opportunity to use machine learning techniques to identify undiagnosed individuals who are likely to have a given disease and who could then benefit from further medical screening and case finding. This reduces the number needed to screen, bringing convenience and healthcare cost savings. Ensemble machine learning models, which combine multiple prediction estimates into one, are often said to provide better predictive performance than non-ensemble models. Yet, to our knowledge, no literature review summarises the use and performance of different types of ensemble machine learning models in the context of medical pre-screening.
Method: We aimed to conduct a scoping review of the literature reporting the derivation of ensemble machine learning models for screening of electronic health records. We searched the EMBASE and MEDLINE databases across all years, applying a formal search strategy using terms related to medical screening, electronic health records and machine learning. Data were collected, analysed, and reported in accordance with the PRISMA scoping review guideline.
Results: A total of 3355 articles were retrieved, of which 145 met our inclusion criteria and were included in this study. Ensemble machine learning models were increasingly employed across several medical specialties and often outperformed non-ensemble approaches. Ensemble machine learning models with complex combination strategies and heterogeneous classifiers often outperformed other types of ensemble machine learning models but were also used less frequently. The methodologies, processing steps and data sources of ensemble machine learning models were often not clearly described.
Conclusions: Our work highlights the importance of deriving and comparing the performance of different types of ensemble machine learning models when screening electronic health records, and underscores the need for more comprehensive reporting of the machine learning methodologies employed in clinical research.
Keywords: Ensemble machine learning; electronic health records; mass screening; scoping review; supervised machine learning.
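As an illustration of what "combining multiple prediction estimates into one" can mean in practice, the following minimal sketch contrasts two simple combination strategies, majority (hard) voting and probability averaging (soft voting). It is not taken from the review: the function names, the 0.5 threshold and the three example probabilities are hypothetical and chosen only to show that the two strategies can disagree on the same record.

```python
from statistics import mean


def majority_vote(labels):
    """Return 1 if more than half of the base classifiers predict 1, else 0."""
    return int(sum(labels) * 2 > len(labels))


def soft_vote(probabilities, threshold=0.5):
    """Average the predicted probabilities and flag the record if the mean reaches the threshold."""
    return int(mean(probabilities) >= threshold)


# Hypothetical predicted disease probabilities from three base classifiers
# for a single patient record (illustrative values, not real data).
probs = [0.9, 0.4, 0.45]

# Each base classifier's own yes/no decision at the same assumed 0.5 threshold.
labels = [int(p >= 0.5) for p in probs]

hard = majority_vote(labels)  # only one of three classifiers votes "1"
soft = soft_vote(probs)       # the mean probability is about 0.58

print("majority vote:", hard)
print("soft vote:", soft)
```

Here majority voting does not flag the record while probability averaging does, because one confident classifier outweighs two mildly negative ones. More complex strategies, such as stacking, replace the fixed averaging rule with a second model trained on the base classifiers' outputs.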
© The Author(s) 2023.
Conflict of interest statement
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Christophe AT Stevens (CATS) is an employee of Imperial College London and reports grants from Pfizer, Amgen, Merck Sharp & Dohme, Sanofi-Aventis, Daiichi Sankyo, and Regeneron, during the conduct of the study. Alexander RM Lyons (ARML) is an employee of Imperial College London and reports grants from Pfizer, Amgen, Merck Sharp & Dohme, Sanofi–Aventis, Daiichi Sankyo, and Regeneron, during the conduct of the study. Kanika I Dharmayat (KID) is an employee of Imperial College London and receives grants from Daiichi Sankyo, Amgen and Regeneron, and personal fees from Bayer and Regeneron; all outside of the submitted work. Alireza Mahani (AM) is an employee of Davidson Kempner Capital Management and has no conflict of interest to disclose. Kausik K Ray (KKR) is an employee of Imperial College London and reports grants and personal fees from Amgen, Sanofi–Regeneron, Pfizer, Merck Sharp & Dohme, and Daiichi Sankyo; and personal fees from AstraZeneca, The Medicines Company, Kowa, Novartis, Lilly, Algorithm, Boehringer Ingelheim, AbbVie, Silence Therapeutics, Bayer, Esperion, Abbott, New Amsterdam, and Resverlogix, outside the submitted work. Antonio J Vallejo-Vaz (AJV-V) is an employee of the University of Seville and acknowledges past or current participation in research grants to Imperial College London from Pfizer, Amgen, MSD, Sanofi-Aventis, Daiichi Sankyo and Regeneron, outside the submitted work; and received personal fees for consulting from Bayer and Regeneron and honoraria for lectures from Amgen, Mylan, Akcea and Ferrer, all outside the submitted work. Mansour TA Sharabiani (MTAS) is an employee of Imperial College London and has no conflict of interest to disclose.