A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection

Yutaka Yasui¹, Margaret Pepe, Mary Lou Thompson, Bao-Ling Adam, George L Wright Jr, Yinsheng Qu, John D Potter, Marcy Winget, Mark Thornquist, Ziding Feng

PMID: 12925511
DOI: 10.1093/biostatistics/4.3.449

Yutaka Yasui et al. Biostatistics. 2003 Jul;4(3):449-63.

Affiliation

¹ Cancer Prevention Research Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA 98109-1024, USA. yyasui@fhcrc.org

Abstract

With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
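
The pipeline described in the abstract (binary peak indicators, boosting over those indicators, then sensitivity and specificity on independent data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not give the neighborhood size, the exact peak definition, or the boosting variant, so the sketch uses a fixed `half_window`, a simple local-maximum rule, and discrete AdaBoost with one-feature stumps. The function names and the toy data are hypothetical, and the x-axis shift adjustment is omitted.

```python
import math


def peak_indicators(y, half_window):
    """Return 1 where y[i] is the largest value within +/- half_window points.

    Assumption: a "peak" is a local maximum in a fixed-size neighborhood.
    Ties (flat regions) are all marked as peaks in this simple version.
    """
    n = len(y)
    flags = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        flags.append(1 if y[i] == max(y[lo:hi]) else 0)
    return flags


def stump(row, j, sign):
    """Weak classifier on one binary feature: vote `sign` if a peak is present."""
    return sign if row[j] == 1 else -sign


def adaboost_binary(X, labels, rounds):
    """Discrete AdaBoost over binary features; labels must be +1 or -1.

    Returns a list of (alpha, feature_index, sign) weak classifiers.
    """
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Pick the feature and orientation with the lowest weighted error.
        best = None
        for j in range(len(X[0])):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, labels)
                          if stump(row, j, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, sign)
        err, j, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, sign))
        # Up-weight misclassified specimens, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(row, j, sign))
             for wi, row, yi in zip(w, X, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Summary classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(row, j, sign) for alpha, j, sign in model)
    return 1 if score >= 0 else -1


def sensitivity_specificity(model, X, labels):
    """Fraction of +1 cases called +1, and fraction of -1 cases called -1."""
    tp = sum(1 for row, yi in zip(X, labels)
             if yi == 1 and predict(model, row) == 1)
    tn = sum(1 for row, yi in zip(X, labels)
             if yi == -1 and predict(model, row) == -1)
    pos = sum(1 for yi in labels if yi == 1)
    neg = len(labels) - pos
    return tp / pos, tn / neg


# Toy data only (not from the study): four specimens, two peak indicators.
toy_X = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_labels = [1, 1, -1, -1]
toy_model = adaboost_binary(toy_X, toy_labels, rounds=3)
```

In a real analysis the sensitivity and specificity would be computed on a test set held out from training, as the abstract describes; the toy data above are reused only to show the calls.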

Grants and funding

U01-CA86368/CA/NCI NIH HHS/United States
