Machine learning in medicine: a practical introduction to techniques for data pre-processing, hyperparameter tuning, and model comparison
- PMID: 36319956
- PMCID: PMC9624048
- DOI: 10.1186/s12874-022-01758-8
Abstract
Background: There is growing enthusiasm for the application of machine learning (ML) and artificial intelligence (AI) techniques to clinical research and practice. However, guidance on how to develop robust, high-quality ML and AI systems in medicine is scarce. In this paper, we provide a practical example of techniques that facilitate the development of high-quality ML systems, including data pre-processing, hyperparameter tuning, and model comparison, using open-source software and data.
Methods: We used open-source software and a publicly available dataset to train and validate multiple ML models to classify breast masses as benign or malignant using mammography image features and patient age. We compared algorithm predictions to the ground truth of histopathologic evaluation. We provide step-by-step instructions with accompanying code.
Findings: The five algorithms did not differ significantly in their ability to classify breast masses as benign or malignant from mammography image features and patient age (P > 0.05). The area under the receiver operating characteristic curve (AUROC) was 0.89 (95% CI 0.85-0.94) for logistic regression with elastic net penalty, 0.88 (95% CI 0.83-0.93) for the Extreme Gradient Boosting Tree, 0.88 (95% CI 0.83-0.93) for the Multivariate Adaptive Regression Spline algorithm, 0.89 (95% CI 0.84-0.93) for the Support Vector Machine, and 0.89 (95% CI 0.84-0.93) for the neural network.
Interpretation: Our paper allows clinicians and medical researchers who are interested in using ML algorithms to understand and recreate the elements of a comprehensive ML analysis. Following our instructions may help to improve model generalizability and reproducibility in medical ML studies.
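To make the reported metric concrete, the sketch below shows one common way an AUROC and a 95% confidence interval can be computed from predicted scores and ground-truth labels. It is not the authors' code: the abstract does not state how the intervals were derived, so the percentile bootstrap used here is an assumption, and the function names (`auroc`, `bootstrap_ci`) and the toy labels and scores are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case; ties count as 0.5 (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method),
    resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUROC
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model outputs.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]

point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUROC {point:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

On the toy data the point estimate is 0.96, because 24 of the 25 positive-negative pairs are ranked correctly. With only ten cases the interval is wide, which illustrates why intervals are reported alongside point estimates when comparing models.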
Keywords: Artificial intelligence; Guideline; Machine learning; Medicine.
© 2022. The Author(s).
Conflict of interest statement
The authors declare that there are no competing interests.
Similar articles
- Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019 Mar 19;19(1):64. doi: 10.1186/s12874-019-0681-4. PMID: 30890124. Free PMC article.
- Machine Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning for Optimal Adaptive Learning. Sensors (Basel). 2022 May 16;22(10):3776. doi: 10.3390/s22103776. PMID: 35632184. Free PMC article.
- Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents. JAMA Netw Open. 2020 Oct 1;3(10):e2022779. doi: 10.1001/jamanetworkopen.2020.22779. PMID: 33034642. Free PMC article.
- Anatomy and Physiology of Artificial Intelligence in PET Imaging. PET Clin. 2021 Oct;16(4):471-482. doi: 10.1016/j.cpet.2021.06.003. Epub 2021 Aug 5. PMID: 34364817. Review.
- Image annotation and curation in radiology: an overview for machine learning practitioners. Eur Radiol Exp. 2024 Feb 6;8(1):11. doi: 10.1186/s41747-023-00408-y. PMID: 38316659. Free PMC article. Review.
Cited by
- Predicting adolescent psychopathology from early life factors: A machine learning tutorial. Glob Epidemiol. 2024 Aug 29;8:100161. doi: 10.1016/j.gloepi.2024.100161. eCollection 2024 Dec. PMID: 39279846. Free PMC article.
- Predictive modeling algorithms for liver metastasis in colorectal cancer: A systematic review of the current literature. Ann Hepatobiliary Pancreat Surg. 2024 Feb 29;28(1):14-24. doi: 10.14701/ahbps.23-078. Epub 2023 Dec 22. PMID: 38129965. Free PMC article. Review.
- Machine learning insights into patient satisfaction following lateral lumbar interbody fusion. Eur Spine J. 2025 Feb 5. doi: 10.1007/s00586-025-08659-6. Online ahead of print. PMID: 39907777.
- On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol. 2023 Feb 28;13:1129380. doi: 10.3389/fonc.2023.1129380. eCollection 2023. PMID: 36925929. Free PMC article. Review.
- A Machine Learning-Based Mortality Prediction Model for Patients with Chronic Hepatitis C Infection: An Exploratory Study. J Clin Med. 2024 May 16;13(10):2939. doi: 10.3390/jcm13102939. PMID: 38792479. Free PMC article.