Significance analysis of spectral count data in label-free shotgun proteomics

Hyungwon Choi¹, Damian Fermin, Alexey I Nesvizhskii

PMID: 18644780
PMCID: PMC2596341
DOI: 10.1074/mcp.M800203-MCP200

Mol Cell Proteomics. 2008 Dec;7(12):2373-85. doi: 10.1074/mcp.M800203-MCP200. Epub 2008 Jul 20.

Affiliation

¹ Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA.

Abstract

Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and post hoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.

Figures

**Fig. 1.**
**Generalized linear mixed model with hierarchical Bayes for the analysis of spectral count data.** The expected counts are normalized by the sequence length of the protein i and the normalizing constant equivalent to the overall abundance of each MS/MS experiment j. In the main text, the sequence length and the normalizing constant are denoted by *L_i* and *N_j*, respectively. c₀ is the base-line abundance, and *b_0i* and *b_1i* are the protein-specific abundance and differential expression parameters for protein i. *Experiment Design Factors* may include any discrete levels by which the expected counts may vary, *e.g.* time points, subcellular localization, etc. A, a subset of the spectral count data matrix without design factors and replicates. B, a subset of the spectral count data matrix with time course and subcellular localization factors. wk, weeks.
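The normalization described in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes a log-linear (Poisson-style) mean in which the sequence length *L_i* and the normalizing constant *N_j* enter as multiplicative offsets; the log link, the 0/1 condition indicator `treated`, and the function names are assumptions made for illustration and are not taken from the caption. The hierarchical Bayes priors and the random-effects structure are not shown.

```python
import math


def expected_count(length, norm_const, c0, b0, b1, treated):
    """Expected spectral count for protein i in experiment j.

    Assumed form (illustrative, not confirmed by the caption):
        log(mu_ij) = log(L_i) + log(N_j) + c0 + b_0i + b_1i * T_j
    where T_j is 1 for the comparison condition and 0 for the reference.
    """
    t = 1.0 if treated else 0.0
    return length * norm_const * math.exp(c0 + b0 + b1 * t)


def fold_change(b1):
    """Fold change implied by the differential expression parameter b_1i."""
    return math.exp(b1)


# Hypothetical parameter values, chosen only to show the arithmetic.
reference = expected_count(500, 1.0, -5.0, 0.3, math.log(2.0), treated=False)
comparison = expected_count(500, 1.0, -5.0, 0.3, math.log(2.0), treated=True)
ratio = comparison / reference  # equals fold_change(log 2), independent of L_i and N_j
```

Under this assumed form, the length and normalization offsets cancel in the ratio between conditions, so *b_1i* alone carries the differential expression signal.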

**Fig. 2.**
**The number of true positive proteins (from the total of 200) identified by QSpec and PLGEM-StN at fixed FDRs in synthetic data sets with known -fold changes and using different numbers of replicates.** A, QSpec, 2-fold change. B, PLGEM-StN, 2-fold change. C, QSpec, 4-fold change. D, PLGEM-StN, 4-fold change. *rep*, replicate(s).

**Fig. 3.**
**The proportion of true positive proteins (sensitivity of identification) identified by QSpec in the synthetic data sets with 2-fold change (A) and 4-fold change (B) across the range of protein abundance.** (x–y) denotes counts ranging from x to y. *Rep*, replicate.
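The quantity plotted in Fig. 3, sensitivity within abundance ranges, can be sketched as follows. This is a generic illustration assuming that the truly changed proteins are known (as in a synthetic data set) and that some method has already flagged a set of proteins as significant; the function name, the input lists, and the bin edges are hypothetical, and the code does not reproduce how QSpec itself selects proteins.

```python
def sensitivity_by_bin(counts, is_true, is_selected, bins):
    """Fraction of truly changed proteins that were selected, per count range.

    counts      : spectral count (abundance) for each protein
    is_true     : True where the protein is a known true positive
    is_selected : True where the method flagged the protein as significant
    bins        : list of inclusive (lo, hi) count ranges
    Returns a dict mapping each (lo, hi) to a sensitivity, or None when the
    range contains no truly changed protein.
    """
    result = {}
    for lo, hi in bins:
        # Indices of truly changed proteins whose count falls in this range.
        in_bin = [k for k, c in enumerate(counts) if lo <= c <= hi and is_true[k]]
        if not in_bin:
            result[(lo, hi)] = None
            continue
        hits = sum(1 for k in in_bin if is_selected[k])
        result[(lo, hi)] = hits / len(in_bin)
    return result


# Toy input, invented for illustration only.
example = sensitivity_by_bin(
    counts=[1, 3, 8, 12],
    is_true=[True, True, True, False],
    is_selected=[False, True, True, True],
    bins=[(0, 4), (5, 10), (11, 20)],
)
```

Here the lowest range recovers one of its two truly changed proteins, the middle range recovers its single one, and the highest range contains none, so it is reported as `None` rather than as zero.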

**Fig. 4.**
**Venn diagram of the selected proteins from QSpec with all 1508 proteins and PLGEM-StN with the subset of 511 proteins (27).** *Tables A* and *B* correspond to the significantly enriched gene ontology terms in the protein list identified by QSpec and PLGEM-StN, respectively.

**Fig. 5.**
**Selected proteins and functional annotation in the mouse mutant model data set.** A, clustered time course graphs by time points and organelles. Time points (T1, T2, and T3) correspond to weeks 8, 16, and 24, respectively. B, heat map of differential expression in the nine categories by time point and organelle. *Yellow* indicates overexpression in the PLN R9C mutant relative to the wild type, and *blue* indicates underexpression. Gene ontology terms with FDR-adjusted p value less than 0.05 are reported. dw, down; w, weeks.

References

1. Domon, B., and Aebersold, R. (2006) Mass spectrometry and protein analysis. Science 312, 212–217
2. Nesvizhskii, A. I., Vitek, O., and Aebersold, R. (2007) Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods 4, 787–797
3. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999
4. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H., Pandey, A., and Mann, M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386
5. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154–1169
