Machine learning approaches to lung cancer prediction from mass spectra
- PMID: 12973731
- DOI: 10.1002/pmic.200300523
Abstract
We addressed the problem of discriminating between 24 diseased and 17 healthy specimens on the basis of protein mass spectra. To prepare the data, we performed mass-to-charge ratio (m/z) normalization, baseline elimination, and conversion of absolute peak height measures to height ratios. After preprocessing, the major difficulty was the extremely large number of variables (1676 m/z values) relative to the number of examples (41). Dimensionality reduction was treated as an integral part of the classification process: variable selection was coupled with model construction in a single ten-fold cross-validation loop. We explored different experimental setups involving two peak height representations, two variable selection methods, and six induction algorithms, all on both the original 1676-mass data set and a prescreened 124-mass data set. The highest predictive accuracies (1-2 off-sample misclassifications) were achieved by a multilayer perceptron and Naïve Bayes, with the latter displaying more consistent performance (hence greater reliability) across varying experimental conditions. We attempted to identify the most discriminant peaks (proteins) on the basis of scores assigned by the two variable selection methods and by neural-network-based sensitivity analysis. These three scoring schemes consistently ranked four peaks as the most relevant discriminators: 11683, 1403, 17350 and 66107.
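The central methodological point above is that variable selection is repeated inside every cross-validation fold instead of being run once on all 41 spectra. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The abstract does not name its two selection methods, so the signal-to-noise ranking score used here is an assumption, and `snr_scores`, `fit_gaussian_nb`, `predict_gaussian_nb` and `cv_with_inner_selection` are hypothetical names. Gaussian Naïve Bayes stands in for the Naïve Bayes learner, and the folds are unstratified for brevity.

```python
import math
import random
from statistics import mean, pstdev


def snr_scores(X, y):
    """Hypothetical ranking score (an assumption, not the paper's method):
    |mean(class 1) - mean(class 0)| / (sd(class 1) + sd(class 0)) per variable."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        denom = pstdev(a) + pstdev(b) + 1e-12  # avoid division by zero
        scores.append(abs(mean(a) - mean(b)) / denom)
    return scores


def fit_gaussian_nb(X, y):
    """Gaussian Naive Bayes: per-class log prior, per-variable mean and variance."""
    model = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        cols = list(zip(*rows))
        model[c] = (
            math.log(len(rows) / len(X)),
            [mean(col) for col in cols],
            [pstdev(col) ** 2 + 1e-9 for col in cols],  # variance floor
        )
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_post(c):
        prior, mus, variances = model[c]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances)
        )
    return max(model, key=log_post)


def cv_with_inner_selection(X, y, k_features, n_folds=10, seed=0):
    """Count off-sample errors when selection is redone on each training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        # Rank variables using the training fold only; held-out spectra never
        # influence which m/z values are kept.
        scores = snr_scores(Xtr, ytr)
        keep = sorted(range(len(scores)), key=lambda j: -scores[j])[:k_features]
        model = fit_gaussian_nb([[r[j] for j in keep] for r in Xtr], ytr)
        for i in test_idx:
            pred = predict_gaussian_nb(model, [X[i][j] for j in keep])
            errors += int(pred != y[i])
    return errors
```

Selecting variables on the full data set before splitting would let the test spectra influence which m/z values are retained, which tends to produce optimistically biased accuracy estimates when variables (1676) far outnumber examples (41). Keeping selection inside the loop means each fold's error count reflects the whole pipeline.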
Similar articles
- Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68. PMID: 15788095. Free PMC article.
- Decision tree classification of proteins identified by mass spectrometry of blood serum samples from people with and without lung cancer. Proteomics. 2003 Sep;3(9):1678-9. doi: 10.1002/pmic.200300521. PMID: 12973724.
- Discriminant models for high-throughput proteomics mass spectrometer data. Proteomics. 2003 Sep;3(9):1699-703. doi: 10.1002/pmic.200300518. PMID: 12973728.
- Feature selection and machine learning with mass spectrometry data. Methods Mol Biol. 2010;593:205-29. doi: 10.1007/978-1-60327-194-3_11. PMID: 19957152. Review.
- Machine learning methods for predictive proteomics. Brief Bioinform. 2008 Mar;9(2):119-28. doi: 10.1093/bib/bbn008. Epub 2008 Feb 29. PMID: 18310105. Review.
Cited by
- Prediction model of ocular metastasis from primary liver cancer: Machine learning-based development and interpretation study. Cancer Med. 2023 Oct;12(20):20482-20496. doi: 10.1002/cam4.6540. Epub 2023 Oct 5. PMID: 37795569. Free PMC article.
- A study of aortic dissection screening method based on multiple machine learning models. J Thorac Dis. 2020 Mar;12(3):605-614. doi: 10.21037/jtd.2019.12.119. PMID: 32274126. Free PMC article.
- Characterising phase variations in MALDI-TOF data and correcting them by peak alignment. Cancer Inform. 2005;1(1):32-40. PMID: 19305630. Free PMC article.
- Intelligence Algorithms for Protein Classification by Mass Spectrometry. Biomed Res Int. 2018 Nov 11;2018:2862458. doi: 10.1155/2018/2862458. eCollection 2018. PMID: 30534555. Free PMC article. Review.
- The parameter sensitivity of random forests. BMC Bioinformatics. 2016 Sep 1;17(1):331. doi: 10.1186/s12859-016-1228-x. PMID: 27586051. Free PMC article.