Protocols for disease classification from mass spectrometry data
- PMID: 12973727
- DOI: 10.1002/pmic.200300519
Abstract
We report our results in classifying protein matrix-assisted laser desorption/ionization-time of flight mass spectra obtained from serum samples into diseased and healthy groups. We discuss in detail five of the steps in preprocessing the mass spectral data for biomarker discovery, as well as our criterion for choosing a small set of peaks for classifying the samples. Cross-validation studies with four selected proteins yielded misclassification rates in the 10-15% range for all the classification methods. Three of these proteins or protein fragments are down-regulated and one is up-regulated in lung cancer, the disease under consideration in this data set. When cross-validation studies are performed, care must be taken to ensure that the test set does not influence the choice of the peaks used in the classification. Misclassification rates are expected to be lower when both the training and test sets are used to select the peaks used in classification than when only the training set is used. This expectation was validated for various statistical discrimination methods when thirteen peaks were used in cross-validation studies. One particular classification method, a linear support vector machine, exhibited especially robust performance when the number of peaks was varied from four to thirteen, and when the peaks were selected from the training set alone. Experiments with the samples randomly assigned to the two classes confirmed that misclassification rates were significantly higher in such cases than those observed with the true data. This indicates that our findings are indeed significant. We found closely matching masses in a database for protein expression in lung cancer for three of the four proteins we used to classify lung cancer. Data from additional samples, increased experience with the performance of various preprocessing techniques, and affirmation of the biological roles of the proteins that help in classification will strengthen our conclusions in the future.
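The caution about the test set influencing peak selection can be illustrated with a small sketch. It is not the authors' protocol: the peak-ranking score (a simple t-like statistic), the nearest-centroid classifier (used here in place of the linear support vector machine and other methods named in the abstract), the synthetic data, and all function names are assumptions made for illustration. The sketch compares selecting peaks inside each training fold with selecting them once from all samples, the latter being the leaky procedure the abstract warns against.

```python
import random
from statistics import mean, pstdev


def select_peaks(X, y, k):
    """Rank peaks by a t-like score (assumed criterion) and return the top k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, peaks):
    """Mean intensity of each selected peak, per class."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [mean(r[j] for r in rows) for j in peaks]
    return cents


def predict(cents, row, peaks):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(peaks, cents[lab]))
    return min((0, 1), key=dist)


def cv_error(X, y, k_peaks, n_folds=5, select_inside_fold=True, seed=0):
    """K-fold misclassification rate.

    select_inside_fold=True  : peaks are chosen from the training fold only.
    select_inside_fold=False : peaks are chosen once from all samples, so the
                               held-out samples influence the selection.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    leaky_peaks = None if select_inside_fold else select_peaks(X, y, k_peaks)
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        peaks = select_peaks(X_tr, y_tr, k_peaks) if select_inside_fold else leaky_peaks
        cents = fit_centroids(X_tr, y_tr, peaks)
        wrong += sum(predict(cents, X[i], peaks) != y[i] for i in fold)
    return wrong / len(X)


if __name__ == "__main__":
    # Pure-noise spectra with arbitrary labels: no peak carries real class information.
    rng = random.Random(1)
    y = [0] * 20 + [1] * 20
    X = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in y]
    print("peaks selected inside each fold:", cv_error(X, y, 13, select_inside_fold=True))
    print("peaks selected from all samples:", cv_error(X, y, 13, select_inside_fold=False))
```

On noise data of this kind, one would generally expect the leaky variant to report the lower error rate even though no real signal exists, which is the optimistic bias the abstract describes. The same `cv_error` function could also be rerun on real spectra with shuffled class labels as a rough analogue of the random-assignment experiment.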