Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Mar;18(3):283-93.
doi: 10.1089/cmb.2010.0261.

Proteome coverage prediction for integrated proteomics datasets

Affiliations

Proteome coverage prediction for integrated proteomics datasets

Manfred Claassen et al. J Comput Biol. 2011 Mar.

Abstract

Comprehensive characterization of a proteome defines a fundamental goal in proteomics. In order to maximize proteome coverage for a complex protein mixture, i.e., to identify as many proteins as possible, various different fractionation experiments are typically performed and the individual fractions are subjected to mass spectrometric analysis. The resulting data are integrated into large and heterogeneous datasets. Proteome coverage prediction refers to the task of extrapolating the number of protein discoveries by future measurements conditioned on a sequence of already performed measurements. Proteome coverage prediction at an early stage enables experimentalists to design and plan efficient proteomics studies. To date, there does not exist any method that reliably predicts proteome coverage from integrated datasets. We present a generalized hierarchical Pitman-Yor process model that explicitly captures the redundancy within integrated datasets. The accuracy of our approach for proteome coverage prediction is assessed by applying it to an integrated proteomics dataset for the bacterium L. interrogans. The proposed procedure outperforms ad hoc extrapolation methods and prediction methods designed for non-integrated datasets. Furthermore, the maximally achievable proteome coverage is estimated for the experimental setup underlying the L. interrogans dataset. We discuss the implications of our results for determining rational stop criteria and their influence on the design of efficient and reliable proteomics studies.

PubMed Disclaimer

Publication types

LinkOut - more resources