Review
OMICS. 2013 Dec;17(12):595-610.
doi: 10.1089/omi.2013.0017. Epub 2013 Oct 12.

Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology

Anna Louise Swan et al. OMICS. 2013 Dec.

Abstract

Mass spectrometry (MS) is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high-throughput abilities. However, because of the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways: first, directly to mass spectral peaks, and second, to proteins identified by sequence database searching, although the latter requires relative protein quantification. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data, including how it can be used to identify proteins suitable for use as biomarkers of disease and to classify samples into disease or treatment groups, which may be applicable for diagnostics. It also covers the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes.
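As a rough illustration of classifying samples from mass spectral peaks when only a few samples are available, the sketch below runs leave-one-out cross-validation with a nearest-centroid classifier. It is a minimal sketch and not a method taken from the review: the peak intensities, the class labels, and the function names (`predict`, `leave_one_out_accuracy`) are all hypothetical, and nearest-centroid is chosen only because it fits in a few lines.

```python
from statistics import mean

# Hypothetical toy data (not from the review): each sample holds the
# intensities of three mass spectral peaks plus a class label.
samples = [
    ([10.0, 2.0, 5.0], "disease"),
    ([11.0, 1.5, 5.5], "disease"),
    ([9.5, 2.5, 4.5], "disease"),
    ([3.0, 8.0, 5.0], "control"),
    ([2.5, 9.0, 5.2], "control"),
    ([3.5, 7.5, 4.8], "control"),
]


def centroid(rows):
    """Mean intensity of each peak across a group of samples."""
    return [mean(column) for column in zip(*rows)]


def squared_distance(a, b):
    """Squared Euclidean distance between two intensity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train, features):
    """Assign the label whose class centroid lies closest to `features`."""
    by_class = {}
    for feats, label in train:
        by_class.setdefault(label, []).append(feats)
    centroids = {label: centroid(rows) for label, rows in by_class.items()}
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


def leave_one_out_accuracy(data):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, (feats, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, feats) == label:
            correct += 1
    return correct / len(data)


accuracy = leave_one_out_accuracy(samples)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Leave-one-out validation is used here because it keeps as many samples as possible for training in each round, which matters when the sample size is small. Any feature selection would also need to be repeated inside each round to avoid an optimistic estimate.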

Figures

FIG. 1.
An overview of the topics covered in this review, including the general work flow required and the major considerations that are necessary before beginning an investigation combining mass spectrometry and machine learning.
FIG. 2.
Proteomics mass spectrometry data analysis workflow. The workflow diverges into two sections; the first involves peak picking and application of machine learning directly on mass spectral peaks. This is in comparison to the second section, which involves quantification of proteins, either labeled or label-free, followed by machine learning.
FIG. 3.
A simplified Decision Tree that divides data into its three classes, based on two attributes.
FIG. 4.
A graphical representation of Support Vector Machines and the linear division of two classes.
FIG. 5.
A representation of Artificial Neural Networks and the layers involved in the generation of a model.
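The kind of model described in the FIG. 3 caption, a simplified Decision Tree that divides data into three classes using two attributes, can be sketched as two nested threshold tests. The attribute names, thresholds, and class labels below are hypothetical and are not taken from the figure. In practice a learning algorithm would choose the splits from training data.

```python
def classify(attribute_a: float, attribute_b: float) -> str:
    """Walk a hand-written two-level decision tree and return a class label.

    The thresholds (5.0 and 2.0) are assumed values for illustration only.
    """
    # Root node: split on the first attribute.
    if attribute_a <= 5.0:
        return "class 1"
    # Second node: split the remaining samples on the second attribute.
    if attribute_b <= 2.0:
        return "class 2"
    return "class 3"


# Hypothetical samples given as (attribute_a, attribute_b) pairs.
for sample in [(3.0, 9.0), (7.0, 1.0), (7.0, 4.0)]:
    print(sample, "->", classify(*sample))
```

Each path from the root to a leaf reads as a plain rule, for example "attribute A above 5.0 and attribute B at most 2.0 gives class 2". This makes the model straightforward to inspect when looking for candidate biomarkers.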
