Mol Cell Proteomics. 2008 Dec;7(12):2373-85.
doi: 10.1074/mcp.M800203-MCP200. Epub 2008 Jul 20.

Significance analysis of spectral count data in label-free shotgun proteomics

Hyungwon Choi et al. Mol Cell Proteomics. 2008 Dec.

Abstract

Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. The development of data analysis methods, however, has lagged behind. Currently, most studies using spectral counts rely on simple data transforms and post hoc corrections of conventional signal-to-noise ratio statistics. These adjustments can neither handle the bias toward high-abundance proteins nor address the drawbacks of a limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression, with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that test each protein individually for differential expression. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.

Figures

Fig. 1.
Generalized linear mixed model with hierarchical Bayes for the analysis of spectral count data. The expected counts are normalized by the sequence length of protein i and by a normalizing constant that corresponds to the overall abundance of each MS/MS experiment j. In the main text, the sequence length and the normalizing constant are denoted by $L_i$ and $N_j$, respectively. $c_0$ is the baseline abundance, and $b_{0i}$ and $b_{1i}$ are the protein-specific abundance and differential expression parameters for protein i. Experimental design factors may include any discrete levels across which the expected counts may vary, e.g. time points or subcellular localization. A, a subset of the spectral count data matrix without design factors or replicates. B, a subset of the spectral count data matrix with time course and subcellular localization factors. wk, weeks.
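The caption describes the model only in words. One way to write it down, assuming a Poisson likelihood with a log link and a two-group comparison coded by a hypothetical indicator $x_j$ ($x_j = 0$ for control runs, $x_j = 1$ for case runs), is sketched below; the exact likelihood, priors, and design-factor terms used by QSpec are not stated in this caption.

```latex
Y_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \log L_i + \log N_j + c_0 + b_{0i} + b_{1i}\, x_j
```

Under this assumed form, $L_i$ and $N_j$ enter as fixed offsets, so $\exp(b_{1i})$ is the fold change of protein i between the two groups after normalization. Additional design factors, such as time point or subcellular localization, would add further terms to the linear predictor, and a hierarchical Bayes treatment would place shared prior distributions on the protein-specific parameters $b_{0i}$ and $b_{1i}$.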
Fig. 2.
The number of true positive proteins (out of a total of 200) identified by QSpec and PLGEM-StN at fixed FDRs in synthetic data sets with known fold changes and different numbers of replicates. A, QSpec, 2-fold change. B, PLGEM-StN, 2-fold change. C, QSpec, 4-fold change. D, PLGEM-StN, 4-fold change. rep, replicate(s).
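Fig. 2 counts true positives at fixed FDR thresholds on synthetic data where the truth is known. The caption does not say how QSpec estimates the FDR. A minimal sketch, assuming a commonly used Bayesian estimate (the mean of 1 - p over the selected proteins, where p is each protein's posterior probability of differential expression), shows how such counts could be tallied. The function names and inputs are hypothetical.

```python
from typing import Sequence


def bayesian_fdr(posterior: Sequence[float]) -> list[float]:
    """Estimated FDR when selecting the top-k proteins by posterior probability.

    Entry k-1 is the mean of (1 - p) over the k proteins with the highest
    posterior probability p of differential expression (assumed input).
    """
    ranked = sorted(posterior, reverse=True)
    fdrs, running = [], 0.0
    for k, p in enumerate(ranked, start=1):
        running += 1.0 - p
        fdrs.append(running / k)
    return fdrs


def true_positives_at_fdr(
    posterior: Sequence[float], is_true_de: Sequence[bool], threshold: float
) -> int:
    """Count truly differential proteins in the largest list with estimated FDR <= threshold."""
    # Protein indices ordered from most to least likely to be differential.
    order = sorted(range(len(posterior)), key=lambda i: posterior[i], reverse=True)
    fdrs = bayesian_fdr(posterior)
    # The running mean is nondecreasing, so take the largest k within the threshold.
    k = max((k for k, f in enumerate(fdrs, start=1) if f <= threshold), default=0)
    return sum(1 for i in order[:k] if is_true_de[i])
```

For example, with posterior probabilities [0.99, 0.95, 0.90, 0.40, 0.10] and truth labels [True, True, False, True, False], a 5% FDR threshold admits the top two proteins, both of which are true positives.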
Fig. 3.
The proportion of true positive proteins (sensitivity of identification) identified by QSpec in the synthetic data sets with 2-fold change (A) and 4-fold change (B) across the range of protein abundance. (x–y) denotes counts ranging from x to y. Rep, replicate.
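Fig. 3 stratifies sensitivity by protein abundance. A minimal sketch of that tally is shown below, assuming each protein is summarized by a single total spectral count and that bins are inclusive (low, high) ranges; the caption does not define exactly which count is binned, and the function name and inputs are hypothetical.

```python
from collections import defaultdict


def sensitivity_by_count_bin(counts, is_true_de, selected, bins):
    """Fraction of truly differential proteins that were selected, per count bin.

    counts: one total spectral count per protein (assumed summary)
    is_true_de: known truth from the synthetic design
    selected: whether the method called the protein differential
    bins: list of inclusive (low, high) ranges, mirroring the "(x-y)" labels
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for c, truth, sel in zip(counts, is_true_de, selected):
        if not truth:
            continue  # sensitivity only concerns truly differential proteins
        for low, high in bins:
            if low <= c <= high:
                totals[(low, high)] += 1
                hits[(low, high)] += int(sel)
                break
    # Bins with no truly differential proteins are omitted rather than divided by zero.
    return {b: hits[b] / totals[b] for b in bins if totals[b] > 0}
```

Reporting sensitivity per bin, rather than one overall number, makes any loss of power for low-count proteins visible.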
Fig. 4.
Venn diagram of the proteins selected by QSpec (applied to all 1508 proteins) and by PLGEM-StN (applied to the subset of 511 proteins) (27). Tables A and B list the significantly enriched gene ontology terms among the proteins identified by QSpec and PLGEM-StN, respectively.
Fig. 5.
Selected proteins and functional annotation in the mouse mutant model data set. A, clustered time course graphs by time point and organelle. Time points T1, T2, and T3 correspond to weeks 8, 16, and 24, respectively. B, heat map of differential expression in the nine categories by time point and organelle. Yellow indicates overexpression in the PLN R9C mutant relative to the wild type, and blue indicates underexpression. Gene ontology terms with an FDR-adjusted p value below 0.05 are reported. dw, down; w, weeks.
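The caption reports gene ontology terms with an FDR-adjusted p value below 0.05 but does not name the adjustment procedure. A minimal sketch assuming the Benjamini-Hochberg step-up procedure, a standard choice for enrichment tables of this kind, is shown below; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (step-up), returned in input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p values [0.01, 0.04, 0.03, 0.20], the adjusted values are approximately [0.04, 0.053, 0.053, 0.20], so only the first term would pass a 0.05 cutoff.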

References

    1. Domon, B., and Aebersold, R. (2006) Mass spectrometry and protein analysis. Science 312, 212–217.
    2. Nesvizhskii, A. I., Vitek, O., and Aebersold, R. (2007) Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods 4, 787–797.
    3. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999.
    4. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H., Pandey, A., and Mann, M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386.
    5. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154–1169.
