Review
Biomed Eng Online. 2014 Jul 5;13:94.
doi: 10.1186/1475-925X-13-94.

Machine learning, medical diagnosis, and biomedical engineering research - commentary

Kenneth R Foster et al. Biomed Eng Online. 2014.

Abstract

A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, this approach has so far been of limited use in developing clinically validated diagnostic techniques, and the methods are prone to overfitting and other problems that may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators, as well as readers and reviewers of papers, to some potential pitfalls in the development of classifiers, and it suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered part of a much larger process of establishing the clinical validity of the diagnostic technique.

Figures

Figure 1
Apparent accuracy of classifiers (ACC) applied to synthetic training sets of equal numbers of “healthy” and “ill” subjects, with 10 attributes for each subject created using a random number generator. The horizontal axis is the ratio of the number of “ill” subjects to the number of attributes (10 in each case). The increase in accuracy (ACC on the vertical axis) for smaller training sets results from using a too-small training set, coupled with post-hoc theorizing. Since each set had an equal number of “ill” and “healthy” individuals and the attributes were random, the true accuracy of the classifier should be 50%, as expected by chance.
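
The effect described in this caption can be reproduced in a few lines. The following is a minimal sketch, not the authors' procedure: it assumes a least-squares linear classifier (the caption does not say which classifier was used), standard normal random attributes, and scores the classifier on the same subjects it was trained on. The function name `apparent_accuracy` and its parameters are hypothetical.

```python
import numpy as np


def apparent_accuracy(n_per_class, n_attributes=10, n_trials=200, seed=0):
    """Mean training-set accuracy of a linear classifier fit to pure noise."""
    rng = np.random.default_rng(seed)
    n_subjects = 2 * n_per_class
    # Labels: -1 for "healthy", +1 for "ill", in equal numbers.
    labels = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    accuracies = []
    for _ in range(n_trials):
        # Attributes are random numbers that carry no information about the labels.
        attributes = rng.standard_normal((n_subjects, n_attributes))
        # Append a constant column so the classifier can learn an offset.
        design = np.hstack([attributes, np.ones((n_subjects, 1))])
        # Assumed classifier: least-squares fit of the labels, thresholded at zero.
        weights, *_ = np.linalg.lstsq(design, labels, rcond=None)
        predicted = np.where(design @ weights >= 0, 1.0, -1.0)
        # Scored on the training subjects themselves, hence "apparent" accuracy.
        accuracies.append(np.mean(predicted == labels))
    return float(np.mean(accuracies))


if __name__ == "__main__":
    for n_ill in (5, 10, 20, 50, 200, 1000):
        ratio = n_ill / 10  # ratio of "ill" subjects to attributes
        acc = apparent_accuracy(n_ill)
        print(f"ill subjects / attributes = {ratio:6.1f}   ACC = {acc:.2f}")
```

Under these assumptions, the apparent accuracy should be close to 100% when there are fewer subjects than fitted parameters and should fall toward the 50% chance level as the training set grows, even though the attributes contain no diagnostic information at all.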
Figure 2
Sensitivity (SEN) and specificity (SPC) of classifier applied to a validation set of 50 healthy and 50 ill subjects. The training set consisted of the indicated number of individuals with Hashimoto’s disease (horizontal axis) with an equal number of healthy subjects. The test set consisted of different individuals than those used for the training set. Ten attributes were defined for each image.
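
For readers unfamiliar with the two metrics, the sketch below shows how sensitivity and specificity are computed from predictions on a held-out validation set. It uses the standard definitions (sensitivity is the fraction of ill subjects correctly flagged, specificity is the fraction of healthy subjects correctly cleared). The function name and the example predictions are hypothetical and are not data from the paper.

```python
import numpy as np


def sensitivity_specificity(is_ill, predicted_ill):
    """Return (sensitivity, specificity) for boolean truth and prediction arrays."""
    is_ill = np.asarray(is_ill, dtype=bool)
    predicted_ill = np.asarray(predicted_ill, dtype=bool)
    true_pos = np.sum(is_ill & predicted_ill)     # ill, flagged as ill
    false_neg = np.sum(is_ill & ~predicted_ill)   # ill, missed
    true_neg = np.sum(~is_ill & ~predicted_ill)   # healthy, cleared
    false_pos = np.sum(~is_ill & predicted_ill)   # healthy, flagged as ill
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return float(sensitivity), float(specificity)


# Hypothetical validation set: 50 ill subjects followed by 50 healthy subjects.
truth = np.array([True] * 50 + [False] * 50)
# Made-up predictions: 40 of the ill subjects flagged, 45 of the healthy cleared.
prediction = np.array([True] * 40 + [False] * 10 + [True] * 5 + [False] * 45)

sen, spc = sensitivity_specificity(truth, prediction)
print(f"SEN = {sen:.2f}, SPC = {spc:.2f}")
```

The key point matching the caption is that `truth` and `prediction` must come from subjects that were not used to train the classifier; computing the same two numbers on the training set would give the inflated apparent performance discussed for Figure 1.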
