Points of Significance: Machine learning: a primer

Danilo Bzdok et al. Nat Methods. 2017 Nov 30;14(12):1119-1120. doi: 10.1038/nmeth.4526.
Abstract

Machine learning extracts general principles from observed examples without explicit instructions.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1
Probing the basis of a psychiatric disorder at multiple levels. Schematic of how psychological, genetic, neurobiological, and epidemiological observations can be used to automatically learn the difference between healthy individuals and affected patients. For each type of measurement (e.g., attention test scores), a learning algorithm is trained on part of the data and then evaluated on the remaining test data from independent individuals to estimate prediction performance (50% accuracy corresponds to random guessing). The statistical uncertainty of the prediction accuracies is shown by 95% confidence intervals obtained by bootstrap resampling of data points with replacement.
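The evaluation scheme in the caption (train on part of the data, test on held-out individuals, bootstrap the test predictions for a 95% confidence interval) can be sketched in a few lines. Below is a minimal, hypothetical Python sketch using scikit-learn; the synthetic data, the choice of logistic regression, the 50/50 split, and the number of bootstrap resamples are illustrative assumptions, not the authors' actual analysis.

```python
# Hypothetical sketch of the Figure 1 evaluation scheme: train/test split,
# prediction accuracy, and a bootstrap 95% confidence interval.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one measurement type (e.g., attention test scores)
# for two groups: healthy individuals vs. affected patients.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Train on part of the data; evaluate on held-out, independent individuals.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
pred = clf.predict(X_test)

# Bootstrap the test set (resampling data points with replacement) to
# quantify the statistical uncertainty of the prediction accuracy.
accs = []
n = len(y_test)
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    accs.append(np.mean(pred[idx] == y_test[idx]))

ci_lo, ci_hi = np.percentile(accs, [2.5, 97.5])
print(f"accuracy = {np.mean(pred == y_test):.2f}, "
      f"95% CI [{ci_lo:.2f}, {ci_hi:.2f}]")
# For two balanced classes, 50% accuracy corresponds to random guessing.
```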
Figure 2
General behaviors of machine-learning algorithms. (a) When algorithm complexity is low, both the error on new data (“prediction error”) and the error on the training data (“training error”) are high. In this high-bias regime, prediction is poor because the algorithm tends to underfit structure in the data. As algorithm complexity increases, both errors drop, but eventually the prediction error rises again as the algorithm enters the high-variance regime, where it starts to overfit. (b) As the training sample size increases, for a fixed level of algorithm complexity, prediction error drops and training error increases. This trend is more pronounced for low-complexity algorithms, such as logistic regression or linear regression, which have a limited capacity to improve with additional data. High-complexity algorithms, such as high-order polynomials, CART, or (deep) neural networks, on the other hand, continue to improve on the test data, but their predictive performance is still limited by sources of noise. In this practical example, the low-complexity case could benefit from a more flexible algorithm and the high-complexity case from more data. The three dashed lines show a hypothetical desired error level.
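Both trends described in the caption can be reproduced on toy data: sweep model complexity at a fixed sample size, then sweep sample size at a fixed complexity, and compare training error with test (prediction) error. The sketch below is a hypothetical illustration; the noisy sine-wave regression problem, the polynomial degrees used as the complexity knob, and the noise level are assumptions for demonstration, not taken from the paper.

```python
# Hypothetical sketch of the two trends in Figure 2 on a 1-D toy problem.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def noisy_sine(n):
    # Assumed toy data: sine wave plus Gaussian noise (the "noise floor").
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, size=n)
    return x, y

X_test, y_test = noisy_sine(500)

# (a) Fixed sample size, increasing complexity: training error keeps
#     falling, while prediction error falls and then rises again once
#     the model starts to overfit (high-variance regime).
X_train, y_train = noisy_sine(30)
for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    tr = mean_squared_error(y_train, model.predict(X_train))
    te = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train={tr:.3f}  test={te:.3f}")

# (b) Fixed complexity, increasing sample size: training error rises
#     toward the noise floor while prediction error falls toward it.
for n in (20, 80, 320):
    X_train, y_train = noisy_sine(n)
    model = make_pipeline(PolynomialFeatures(9), LinearRegression())
    model.fit(X_train, y_train)
    tr = mean_squared_error(y_train, model.predict(X_train))
    te = mean_squared_error(y_test, model.predict(X_test))
    print(f"n={n:3d}  train={tr:.3f}  test={te:.3f}")
```

Printing the train/test errors for each setting is enough to see both regimes; plotting them against degree or sample size would recover the curves shown in the figure.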

