BMC Med Res Methodol. 2022 Nov 1;22(1):282.
doi: 10.1186/s12874-022-01758-8.

Machine learning in medicine: a practical introduction to techniques for data pre-processing, hyperparameter tuning, and model comparison

André Pfob et al. BMC Med Res Methodol. 2022.

Abstract

Background: There is growing enthusiasm for the application of machine learning (ML) and artificial intelligence (AI) techniques to clinical research and practice. However, instructions on how to develop robust high-quality ML and AI in medicine are scarce. In this paper, we provide a practical example of techniques that facilitate the development of high-quality ML systems including data pre-processing, hyperparameter tuning, and model comparison using open-source software and data.

Methods: We used open-source software and a publicly available dataset to train and validate multiple ML models to classify breast masses into benign or malignant using mammography image features and patient age. We compared algorithm predictions to the ground truth of histopathologic evaluation. We provide step-by-step instructions with accompanying code lines.
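As an illustration of pre-processing inside the resampling loop (the topic of the article's Fig. 1), the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the choice of five folds, and the use of standardization as the pre-processing step are all assumptions made for this example. The point it shows is that scaling parameters are learned from the training folds only and then applied unchanged to the held-out fold.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_scaler(values):
    """Learn centring and scaling parameters from training values only."""
    sd = pstdev(values)
    return mean(values), (sd if sd > 0 else 1.0)


def scaled_folds(values, k=5, seed=0):
    """Yield (train_idx, test_idx, scaled_train, scaled_test) for each fold.

    `values` is a single numeric feature (hypothetical, e.g. patient age).
    """
    folds = k_fold_indices(len(values), k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        mu, sd = fit_scaler([values[j] for j in train_idx])
        # The held-out fold is transformed with the training-fold parameters,
        # so no information from it leaks into the pre-processing step.
        scaled_train = [(values[j] - mu) / sd for j in train_idx]
        scaled_test = [(values[j] - mu) / sd for j in test_idx]
        yield train_idx, test_idx, scaled_train, scaled_test
```

A model would then be fitted on `scaled_train` and scored on `scaled_test` within each fold; the same pattern extends to imputation or encoding steps.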

Findings: Performance of the five algorithms at classifying breast masses as benign or malignant based on mammography image features and patient age was statistically equivalent (P > 0.05). The area under the receiver operating characteristic curve (AUROC) was 0.89 (95% CI 0.85–0.94) for the logistic regression with elastic net penalty, 0.88 (95% CI 0.83–0.93) for the Extreme Gradient Boosting Tree, 0.88 (95% CI 0.83–0.93) for the Multivariate Adaptive Regression Spline algorithm, 0.89 (95% CI 0.84–0.93) for the Support Vector Machine, and 0.89 (95% CI 0.84–0.93) for the neural network.
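To make the reported metric concrete, here is a minimal sketch of how an AUROC and a 95% confidence interval can be computed from predicted scores and true labels. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption chosen for illustration, and the function names and parameters (`n_boot`, `alpha`, `seed`) are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Labels are 1 (e.g. malignant) or 0 (e.g. benign); tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    auroc(labels, scores)  # fail early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `auroc(y_true, y_score)` gives the point estimate and `bootstrap_ci(y_true, y_score)` the interval; overlapping intervals alone do not establish equivalence between models, so a formal comparison of paired AUROCs is still needed.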

Interpretation: Our paper allows clinicians and medical researchers who are interested in using ML algorithms to understand and recreate the elements of a comprehensive ML analysis. Following our instructions may help to improve model generalizability and reproducibility in medical ML studies.

Keywords: Artificial intelligence; Guideline; Machine learning; Medicine.

Conflict of interest statement

The authors declare that there are no competing interests.

Figures

Fig. 1. Data pre-processing within the resampling process
Fig. 2. Excerpt of the training dataset after pre-processing steps
Fig. 3. Final model and internal testing results for the Logistic Regression with Elastic Net Penalty
Fig. 4. Receiver Operating Characteristic curves in the Validation Set
Fig. 5. Calibration Plot of the Support Vector Machine
Fig. 6. Performance Comparison: Differences in Area under the Curve
