A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection
- PMID: 12925511
- DOI: 10.1093/biostatistics/4.3.449
Abstract
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
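The abstract does not spell out the exact peak rule or the shift correction. The following is a minimal sketch under two assumptions. First, a point counts as a peak when its intensity strictly exceeds every other intensity within ±k neighbouring m/z points. Second, the x-axis shift is absorbed by a simple ±tol tolerance window. The function names `peak_indicators` and `shift_tolerant`, the parameters `k` and `tol`, and the toy spectrum are hypothetical illustrations, not the authors' procedure.

```python
from typing import List, Sequence

def peak_indicators(y: Sequence[float], k: int = 2) -> List[int]:
    """Return 1 at m/z index i if intensity y[i] strictly exceeds every other
    intensity within the +/- k neighbouring points (window truncated at the
    ends of the spectrum), else 0. k is a hypothetical tuning parameter."""
    n = len(y)
    flags = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        neighbours = [y[j] for j in range(lo, hi) if j != i]
        flags.append(1 if all(y[i] > v for v in neighbours) else 0)
    return flags

def shift_tolerant(flags: Sequence[int], tol: int = 1) -> List[int]:
    """Set index i to 1 if any peak occurs within +/- tol points of i, so that a
    peak shifted slightly along the x-axis in one spectrum still lines up with
    the same peak in another. tol is a hypothetical tuning parameter."""
    n = len(flags)
    return [1 if any(flags[max(0, i - tol):min(n, i + tol + 1)]) else 0
            for i in range(n)]

if __name__ == "__main__":
    spectrum = [1.0, 3.0, 2.0, 2.0, 5.0, 4.0, 1.0]  # toy intensities, not real data
    peaks = peak_indicators(spectrum, k=1)
    print(peaks)                          # [0, 1, 0, 0, 1, 0, 0]
    print(shift_tolerant(peaks, tol=1))   # [1, 1, 1, 1, 1, 1, 0]
```

A tolerance window is only one way to handle x-axis misalignment. It trades m/z resolution for robustness, because a wider `tol` merges nearby distinct peaks.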
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent of the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
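The abstract names boosting but not the exact variant. The sketch below assumes discrete AdaBoost with single-feature decision stumps on the binary peak indicators. It is applied to one two-class comparison at a time, such as cancer vs. normal, with labels coded +1/−1. Sensitivity and specificity are then computed on a held-out test set, as the abstract describes. The functions `fit_adaboost`, `predict` and `sensitivity_specificity`, the label coding and the toy data are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# A stump is (feature index j, sign s): it predicts s when x[j] == 1, else -s.
Stump = Tuple[int, int]
Model = List[Tuple[float, Stump]]

def stump_predict(stump: Stump, x: Sequence[int]) -> int:
    j, s = stump
    return s if x[j] == 1 else -s

def fit_adaboost(X: Sequence[Sequence[int]], y: Sequence[int],
                 rounds: int = 10) -> Model:
    """Discrete AdaBoost over binary peak indicators.
    y[i] is +1 for one class (e.g. cancer) and -1 for the other (e.g. normal)."""
    n, p = len(X), len(X[0])
    w = [1.0 / n] * n  # start with uniform sample weights
    model: Model = []
    for _ in range(rounds):
        # Pick the stump with the smallest weighted training error.
        best, best_err = (0, 1), float("inf")
        for j in range(p):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, s), xi) != yi)
                if err < best_err:
                    best, best_err = (j, s), err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        if best_err == 0:
            break  # training data already separated perfectly
        # Up-weight misclassified specimens, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model: Model, x: Sequence[int]) -> int:
    """Summary classifier: sign of the alpha-weighted vote (ties go to +1)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == -1)
    tn = sum(1 for t, q in zip(y_true, y_pred) if t == -1 and q == -1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == -1 and q == 1)
    return tp / (tp + fn), tn / (tn + fp)

if __name__ == "__main__":
    # Toy training and test sets (not real SELDI data).
    X_train = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y_train = [1, 1, -1, -1]
    model = fit_adaboost(X_train, y_train, rounds=5)
    X_test, y_test = [[1, 0], [0, 1]], [1, -1]
    y_hat = [predict(model, x) for x in X_test]
    print(sensitivity_specificity(y_test, y_hat))  # (1.0, 1.0)
```

Because each stump looks at a single binary indicator, the features chosen across rounds also form a short list of candidate peaks. That property makes stump-based boosting attractive for biomarker selection. The three-way problem (cancer, benign hyperplasia, normal) would be handled here as separate pairwise fits.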
Similar articles
- Application of serum SELDI proteomic patterns in diagnosis of lung cancer. BMC Cancer. 2005 Jul 20;5:83. doi: 10.1186/1471-2407-5-83. PMID: 16029516.
- Identification of lung cancer patients by serum protein profiling using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Am J Clin Oncol. 2008 Apr;31(2):133-9. doi: 10.1097/COC.0b013e318145b98b. PMID: 18391596.
- [Proteomic analysis of prostate cancer using surface enhanced laser desorption/ionization mass spectrometry]. Zhonghua Yi Xue Za Zhi. 2005 Nov 30;85(45):3172-5. PMID: 16405834. In Chinese.
- Proteomics for the identification of new prostate cancer biomarkers. Urol Oncol. 2006 May-Jun;24(3):231-6. doi: 10.1016/j.urolonc.2005.11.035. PMID: 16678055. Review.
- Protein profiling for cancer biomarker discovery using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and infrared imaging: a review. Anal Chim Acta. 2011 Mar 25;690(1):26-34. doi: 10.1016/j.aca.2011.01.044. PMID: 21414433. Review.
Cited by
- Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS). BMC Bioinformatics. 2005 Jul 15;6 Suppl 2:S5. doi: 10.1186/1471-2105-6-S2-S5. PMID: 16026602.
- Reproducible cancer biomarker discovery in SELDI-TOF MS using different pre-processing algorithms. PLoS One. 2011;6(10):e26294. doi: 10.1371/journal.pone.0026294. PMID: 22022591.
- Integrated multi-level quality control for proteomic profiling studies using mass spectrometry. BMC Bioinformatics. 2008 Dec 4;9:519. doi: 10.1186/1471-2105-9-519. PMID: 19055809.
- On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples. J Healthc Inform Res. 2018 May 22;2(3):305-318. doi: 10.1007/s41666-018-0022-0. PMID: 35415410.
- Proteomic analysis in cancer research: potential application in clinical use. Clin Transl Oncol. 2006 Apr;8(4):250-61. doi: 10.1007/BF02664935. PMID: 16648100. Review.