2007 Feb 11;2:59-77.

Applications of machine learning in cancer prediction and prognosis

Joseph A Cruz et al. Cancer Inform.

Abstract

Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

Keywords: Cancer; machine learning; prediction; prognosis; risk.

Figures

Figure 1.
A histogram showing the steady increase in published papers using machine learning methods to predict cancer risk, recurrence and outcome. The data were collected using a variety of keyword searches through PubMed, CiteSeer, Google Scholar, Science Citation Index and other online resources. Each bar represents the cumulative total of papers published over a two-year period. The earliest papers appeared in the early 1990s.
Figure 2.
An example of how a machine learner is trained to recognize images using a training example (a corrupted image of the number “8”) that is labeled, or identified, as the number “8”.
Figure 3.
An example of a simple decision tree that might be used in breast cancer diagnosis and treatment. This is an example of a tree that might be formulated via expert assessment. Similar tree structures can be generated by decision tree learners.
Figure 4.
A simplified illustration of how an SVM might work in distinguishing between basketball players and weightlifters using height/weight support vectors. In this simple case the SVM has identified a hyperplane (actually a line) which maximizes the separation between the two clusters.
Figure 5.
A histogram showing the frequency with which different types of machine learning methods are used to predict different types of cancer. Breast and prostate cancer dominate; however, a good range of cancers from different organs or tissues also appear to be compatible with machine learning prognoses. The “other” cancers include brain, cervical, esophageal, leukemia, head, neck, ocular, osteosarcoma, pleural mesothelioma, thoracic, thyroid, and trophoblastic (uterine) malignancies.
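
The idea in the Figure 4 caption, a line that separates basketball players from weightlifters by height and weight, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. The height/weight values are invented for the example, and the helper names (`standardize`, `train_linear_svm`, `predict`) are hypothetical. The sketch approximates a maximum-margin line by running batch subgradient descent on the L2-regularised hinge loss, which is one common way to fit a linear SVM.

```python
# Illustrative only: heights (cm) and weights (kg) are invented, not taken from the review.
BASKETBALL = [(200, 95), (205, 100), (198, 90), (210, 105), (195, 88)]
WEIGHTLIFTERS = [(170, 105), (165, 95), (175, 110), (168, 100), (172, 120)]


def standardize(points):
    """Return per-feature means, standard deviations, and the scaled points."""
    n = len(points)
    means = [sum(p[j] for p in points) / n for j in range(2)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 for j in range(2)]
    scaled = [[(p[j] - means[j]) / stds[j] for j in range(2)] for p in points]
    return means, stds, scaled


def train_linear_svm(xs, ys, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:  # point is inside the margin or misclassified
                grad_w[0] -= y * x[0] / n
                grad_w[1] -= y * x[1] / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


POINTS = BASKETBALL + WEIGHTLIFTERS
LABELS = [1] * len(BASKETBALL) + [-1] * len(WEIGHTLIFTERS)  # +1 = basketball player
MEANS, STDS, SCALED = standardize(POINTS)
W, B = train_linear_svm(SCALED, LABELS)


def predict(height_cm, weight_kg):
    """Classify a new person by which side of the learned line they fall on."""
    z = [(height_cm - MEANS[0]) / STDS[0], (weight_kg - MEANS[1]) / STDS[1]]
    score = W[0] * z[0] + W[1] * z[1] + B
    return "basketball player" if score >= 0 else "weightlifter"


print(predict(208, 98))
print(predict(166, 108))
```

Features are standardized before training so that height and weight contribute on comparable scales. The points that end up with a margin close to 1 play the role of the support vectors mentioned in the caption.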

