Review

Avoiding common machine learning pitfalls

Michael A Lones

Patterns (N Y). 2024 Aug 28;5(10):101046. doi: 10.1016/j.patter.2024.101046. eCollection 2024 Oct 11.

Abstract

Mistakes in machine learning practice are commonplace and can result in loss of confidence in the findings and products of machine learning. This tutorial outlines common mistakes that occur when using machine learning and what can be done to avoid them. While it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to make rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.

Keywords: guidance; machine learning; practice.


Conflict of interest statement

The author declares no competing interests.

Figures

Figure 1
See “do not allow test data to leak into the training process” (Left) How things should be, with the training set used to train the model and the test set used to measure its generality. (Right) When there is a data leak, the test set can implicitly become part of the training process, meaning that it no longer provides a reliable measure of generality.
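As a minimal sketch of the left-hand setup (assuming scikit-learn; the dataset here is a synthetic placeholder), the test set is split off first, plays no part in training, and is used exactly once to measure generality:

# Sketch of the leak-free setup in Figure 1 (left); assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data

# Split off the test set first; it plays no part in any training decision.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The test set is used exactly once, to measure generality.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))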
Figure 2
See “do keep up with progress in deep learning (and its pitfalls)” A rough history of neural networks and deep learning showing what I consider to be the milestones in their development. For a far more thorough account of the field’s historical development, take a look at Schmidhuber.
Figure 3
See “do be careful where and how you do feature selection” (Top) Data leakage due to carrying out feature selection before splitting off the test data (outlined in red), causing the test set to become an implicit part of model training. (Middle) How it should be done. (Bottom) When using cross-validation, it is important to carry out feature selection independently for each iteration, based only on the subset of data (shown in blue) used for training during that iteration.
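One way to achieve the bottom panel in practice is sketched below (assuming scikit-learn; the data are placeholders): the feature selector and the classifier are wrapped in a single Pipeline, so within each cross-validation fold the selector is fitted only on that fold's training portion.

# Sketch: feature selection inside each cross-validation fold via a Pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)  # placeholder data

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # fitted on training folds only
    ("clf", LogisticRegression(max_iter=1000)),
])

# cross_val_score refits the whole pipeline in every fold, so the held-out
# fold never influences which features are selected.
scores = cross_val_score(pipe, X, y, cv=5)
print("fold accuracies:", scores)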
Figure 4
See “do avoid learning spurious correlations” The problem of spurious correlations in images as illustrated by the tank problem. The images on the left are tanks, and those on the right are not tanks. However, the consistent background (blue for tanks, gray for others) means that these images can be classified by merely looking at the colors of pixels toward the top of the images rather than having to recognize the objects in the images, resulting in a poor model.
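The pitfall can be reproduced on entirely synthetic data, as in the toy sketch below (assuming NumPy and scikit-learn; the "images" and background colors are invented for illustration): a classifier that only sees the top rows of pixels still scores highly, because the background alone predicts the label.

# Toy illustration of the spurious-correlation problem in Figure 4.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, h, w = 400, 16, 16
labels = rng.integers(0, 2, size=n)            # 1 = tank, 0 = not a tank
images = rng.random((n, h, w, 3))              # random "objects"
images[labels == 1, :4, :, 2] += 1.0           # blue-tinted sky for tanks
images[labels == 0, :4, :, :] += 0.5           # gray-tinted sky otherwise

top_rows = images[:, :4].reshape(n, -1)        # features: top pixels only
X_tr, X_te, y_tr, y_te = train_test_split(top_rows, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy from background alone:", clf.score(X_te, y_te))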
Figure 5
See “do avoid sequential overfitting” (Top) Using the test set repeatedly during model selection results in the test set becoming an implicit part of the training process. (Bottom) A validation set should be used instead during model selection, and the test set should only be used once to measure the generality of the final model.
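A minimal sketch of the bottom panel (assuming scikit-learn; the model, its hyperparameter grid, and the data are placeholders): model selection is driven entirely by the validation split, and the test set is touched only once at the end.

# Sketch of Figure 5 (bottom): validation set for selection, test set used once.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, random_state=0)  # placeholder data
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=0.25, random_state=0)

# Model selection (here, choosing C) uses only the validation set.
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    score = SVC(C=C).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_C, best_score = C, score

# The chosen model is evaluated once on the untouched test set.
final = SVC(C=best_C).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))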
Figure 6
See “do choose metrics carefully” The problem with using accuracy as a performance metric on imbalanced data. Here, a dummy model that always predicts the same class label has an accuracy of 50% or 90% depending on the distribution of class labels within the data.
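The 90% case can be reproduced in a few lines, as in the sketch below (assuming scikit-learn; the labels are synthetic): a dummy model that always predicts the majority class reaches 90% accuracy while being useless, and a metric such as balanced accuracy exposes this.

# Sketch of the accuracy pitfall in Figure 6 on a 90/10 class split.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y = np.array([0] * 900 + [1] * 100)        # imbalanced labels: 90% class 0
X = np.zeros((1000, 1))                    # features are irrelevant here

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = dummy.predict(X)
print("accuracy:         ", accuracy_score(y, pred))           # 0.90
print("balanced accuracy:", balanced_accuracy_score(y, pred))  # 0.50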
Figure 7
See “do not ignore temporal dependencies in time-series data” (Top) A time series is scaled to the interval [0,1] before splitting off the test data (shown in red). This could allow the model to infer that values will increase in the future, causing a potential look-ahead bias. (Bottom) Instead, the data should be split before scaling, so that information about the range of the test data cannot leak into the training data.
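A minimal sketch of the bottom panel (assuming NumPy and scikit-learn; the series is a placeholder upward trend): the series is split chronologically first, and the scaler is fitted only on the training portion, so the range of future values cannot leak backwards.

# Sketch of Figure 7 (bottom): chronological split, then scaling fitted on
# the training portion only.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.linspace(0.0, 10.0, 200).reshape(-1, 1)  # placeholder upward trend

split = int(0.8 * len(series))            # chronological split, no shuffling
train, test = series[:split], series[split:]

scaler = MinMaxScaler().fit(train)        # fitted on training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)      # test values may legitimately exceed 1.0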
Figure 8
See “do look at your models” Using saliency maps to analyze vision-based deep learning models. Imagine these two maps (in red) were generated for the image shown in the center, for two different deep learning models trained on the kind of tank recognition data mentioned in “do avoid learning spurious correlations.” Darker colors indicate features that are of greater importance to the model, so the model on the left (which predominantly focuses on the components of the tank) is likely to generalize much better than the one on the right (which predominantly focuses on the background of the image).
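A gradient-based saliency map of the kind pictured can be sketched as below (assuming PyTorch; the model and input here are stand-ins rather than a real tank classifier): the gradient of the predicted class score with respect to the input pixels gives a per-pixel importance map.

# Sketch of a simple gradient-based saliency map; model and image are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # stand-in model
model.eval()

image = torch.rand(1, 3, 32, 32, requires_grad=True)  # placeholder input image
score = model(image)[0].max()       # score of the predicted class
score.backward()                    # gradient of the score w.r.t. the pixels

# Saliency: per-pixel importance, taking the maximum magnitude over channels;
# larger values mark pixels the model relies on more heavily.
saliency = image.grad.abs().max(dim=1)[0].squeeze()
print(saliency.shape)               # a (32, 32) importance map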
