IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering

Ze-Qiang Ma¹, Surendra Dasari, Matthew C Chambers, Michael D Litton, Scott M Sobecki, Lisa J Zimmerman, Patrick J Halvey, Birgit Schilling, Penelope M Drake, Bradford W Gibson, David L Tabb

PMID: 19522537
PMCID: PMC2810655
DOI: 10.1021/pr900360j

Ze-Qiang Ma et al. J Proteome Res. 2009 Aug;8(8):3872-81. doi: 10.1021/pr900360j.

Affiliation

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8340, USA.

Abstract

Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. A number of database searching algorithms have been developed to assign peptide sequences to tandem mass spectra. Assembling the peptide identifications to proteins, however, is a challenging issue because many peptides are shared among multiple proteins. IDPicker is an open-source protein assembly tool that derives a minimum protein list from peptide identifications filtered to a specified False Discovery Rate. Here, we update IDPicker to increase confident peptide identifications by combining multiple scores produced by database search tools. By segregating peptide identifications for thresholding using both the precursor charge state and the number of tryptic termini, IDPicker retrieves more peptides for protein assembly. The new version is more robust against false positive proteins, especially in searches using multispecies databases, by requiring additional novel peptides in the parsimony process. IDPicker has been designed for incorporation in many identification workflows by the addition of a graphical user interface and the ability to read identifications from the pepXML format. These advances position IDPicker for high peptide discrimination and reliable protein assembly in large-scale proteomics studies. The source code and binaries for the latest version of IDPicker are available from http://fenchurch.mc.vanderbilt.edu/ .

Figures

**Figure 1**
Robust protein assembly for high sequence homology database searches. In this diagram, seven peptides observed in human serum are associated with the ceruloplasmin sequence from three different species. Most protein assembly tools would include all three proteins because each is associated with at least two peptides, with at least one peptide being unique to each protein sequence. IDPicker, however, is able to screen out the mouse and rat sequences by requiring proteins to explain more than one new peptide for inclusion in the final list. The two peptides starting with “MYYS” differ at the fifth amino acid; this difference probably arises because the serum used in this study was pooled, so it carries the variant sequences of a population of blood donors. The two peptides starting with “MFTT”, on the other hand, are isobaric; the differing sequences “DQ” and “EN” are exactly the same mass. Of all the y ions generated by the two sequences starting with “MFTT”, only y4 could distinguish the peptides.
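
The isobaric claim in this caption can be checked by hand: "DQ" and "EN" have the same summed elemental composition, so their masses must be identical. The following is a minimal sketch of that check; the function names are illustrative and are not part of IDPicker.

```python
from collections import Counter

# Elemental composition of each residue (free amino acid minus H2O).
RESIDUE_FORMULA = {
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},  # aspartic acid
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},  # glutamic acid
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},  # asparagine
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},  # glutamine
}

# Monoisotopic masses of the elements, in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def composition(seq):
    """Sum the elemental compositions of the residues in seq."""
    total = Counter()
    for aa in seq:
        total.update(RESIDUE_FORMULA[aa])
    return total


def residue_mass(seq):
    """Monoisotopic mass of the residues in seq (no terminal groups)."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition(seq).items())


if __name__ == "__main__":
    for pair in ("DQ", "EN"):
        print(pair, dict(composition(pair)), round(residue_mass(pair), 4))
```

Because both pairs reduce to C9H13N3O5, no mass analyzer can separate the two peptides by precursor mass alone, whatever its resolution; only a fragment ion that splits the pair can tell them apart.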

**Figure 2**
A screenshot of the IDPicker GUI report. Three samples from cancer subjects and three samples from control subjects were arranged in a tree hierarchy to reflect their biological meaning. Each sample has three replicate LC-MS/MS experiments that were grouped together. The final protein identification report arranges the protein, peptide, and spectral identifications in the above-described hierarchy. The numbers of identifications at each node are reported by summarizing the identifications of its child nodes. For example, the above report starts with the “root” level of the hierarchy, designated by the “/” label, which summarizes all identifications present in the analysis. Following the root node, the numbers of identifications for the next lower levels of the hierarchy (cancer and control groups) are summarized, followed by each sample and individual technical replicate. The report also contains a navigation frame (shown on the left side) that allows the user to browse the protein identifications using different indices. Users can also manually validate the spectral matches using a built-in spectrum viewer. For example, the bottom window highlights the fragment ion matches of a tandem mass spectrum that was mapped to the peptide “IAQWQSFQLEGGLK”.

**Figure 3**
Combining multiple scores from a search engine improves peptide identification rate. Tandem mass spectra from three different samples were matched to the IPI human protein database (version 3.47) using the MyriMatch, Sequest, and Mascot search engines (see Materials and Methods for additional details). Peptide identifications from all search engines were loaded into IDPicker. For each search engine, IDPicker was configured to use either its primary score or a combination of its scores to identify peptides at an FDR ≤5%. Panels A–I show the percent overlap between valid peptide identifications when IDPicker was using either a single score or multiple scores from the respective search engines. Combining multiple scores from a search engine yielded more peptide identifications from all samples. Few peptides were identified only when using the primary score of a search engine and not the score combination.
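
To illustrate why a combination of scores can admit more peptides than a primary score at the same FDR, here is a generic sketch. It is not IDPicker's actual procedure, which this page does not describe. It assumes two hypothetical scores, `s1` and `s2`, already on comparable scales, a simple grid search over their relative weights, and the decoy-based estimate FDR = 2 × decoys / (targets + decoys); all of these are assumptions made for the example.

```python
def count_passing_targets(psms, score_of, max_fdr=0.05):
    """Count target PSMs above the most permissive cutoff with FDR <= max_fdr."""
    ranked = sorted(psms, key=score_of, reverse=True)
    n_target = n_decoy = best = 0
    for p in ranked:
        if p["decoy"]:
            n_decoy += 1
        else:
            n_target += 1
        # Assumed estimator: FDR = 2 * decoys / (targets + decoys).
        if 2.0 * n_decoy / (n_target + n_decoy) <= max_fdr:
            best = n_target
    return best


def best_weighting(psms, steps=4, max_fdr=0.05):
    """Grid-search the weight w in w*s1 + (1-w)*s2 that passes the most targets."""
    best_w, best_n = None, -1
    for i in range(steps + 1):
        w = i / steps
        n = count_passing_targets(
            psms, lambda p, w=w: w * p["s1"] + (1 - w) * p["s2"], max_fdr
        )
        if n > best_n:
            best_w, best_n = w, n
    return best_w, best_n


if __name__ == "__main__":
    # Toy data: one true match scores poorly on s1 but well on s2.
    toy = [
        {"s1": 5, "s2": 1, "decoy": False},
        {"s1": 4, "s2": 1, "decoy": False},
        {"s1": 1, "s2": 5, "decoy": False},
        {"s1": 3, "s2": 0, "decoy": True},
        {"s1": 2, "s2": 0, "decoy": True},
    ]
    print("primary score only:", count_passing_targets(toy, lambda p: p["s1"]))
    print("best (weight, count):", best_weighting(toy))
```

In the toy data, the third target ranks below both decoys on `s1` alone, so only a weighting that gives `s2` some influence recovers it.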

**Figure 4**
Comparison of IDPicker and PeptideProphet score combination methods. Tandem mass spectra from three different samples were matched to the IPI human protein database (version 3.47) using the Sequest search engine (see Materials and Methods). The search results were separately processed by IDPicker and PeptideProphet. Both algorithms were configured to filter PSMs using a 5% FDR threshold. The total number of confident PSMs identified by both algorithms in each replicate of all three data sets is shown above. IDPicker produced more confident PSMs than PeptideProphet in some data sets and vice versa, but the algorithms performed similarly in all data sets, with a maximum difference of 5.7%. The simple nonparametric score combination method implemented in IDPicker performs as well as the complex probabilistic frameworks implemented in PeptideProphet, but the IDPicker score combination method can be more easily extended to combine multiple search scores from new search engines.

**Figure 5**
Partitioning peptides based on charge state (Z) and number of tryptic termini (NTT) improves peptide identification. “DLD1 LTQ” and “Serum Orbi” samples were matched to the IPI human protein database (version 3.47) using three search strategies: fully tryptic, semitryptic, and unconstrained. Peptide identifications from each search were loaded into IDPicker and partitioned into separate classes using the four methods shown in the figure. The average numbers of peptide identifications that have an FDR ≤ 5% when using a particular partition method are computed using reverse sequences present in the database and plotted for the “DLD1 LTQ” (A) and “Serum Orbi” (B) data sets separately. The error bars in A and B represent the standard deviations from the replicates. Separating peptide identifications based on NTT and Z state improved the number of identified peptides in semitryptic and unconstrained searches. Improvement of peptide identification rate in the fully tryptic search (NTT = 2) is due to Z state only.
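
The partitioning idea in this caption can be sketched as follows: group peptide-spectrum matches by (Z, NTT), then find a separate score cutoff for each class instead of one pooled cutoff. This is a minimal illustration, not IDPicker's code. The dictionary keys (`score`, `decoy`, `charge`, `ntt`) are hypothetical, and the estimator FDR = 2 × reverse / (forward + reverse) is an assumption for the example.

```python
from collections import defaultdict


def estimate_fdr(n_target, n_decoy):
    """Assumed decoy-based estimate: 2 * reverse / (forward + reverse)."""
    total = n_target + n_decoy
    return 0.0 if total == 0 else 2.0 * n_decoy / total


def passing_targets(psms, max_fdr=0.05):
    """Target PSMs above the most permissive cutoff with FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    n_target = n_decoy = 0
    best_cut = 0
    for i, p in enumerate(ranked, start=1):
        if p["decoy"]:
            n_decoy += 1
        else:
            n_target += 1
        if estimate_fdr(n_target, n_decoy) <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p["decoy"]]


def passing_targets_by_class(psms, max_fdr=0.05):
    """Threshold each (charge, NTT) class separately, then pool the survivors."""
    classes = defaultdict(list)
    for p in psms:
        classes[(p["charge"], p["ntt"])].append(p)
    accepted = []
    for members in classes.values():
        accepted.extend(passing_targets(members, max_fdr))
    return accepted
```

A pooled cutoff is set by whichever class produces the highest-scoring decoys, so classes whose score distributions sit lower lose correct matches; per-class cutoffs avoid that penalty.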

**Figure 6**
Reduction of orthologous protein identifications in a multispecies database search. Two different human samples (“DLD1 LTQ” and “Serum Orbi”) were matched to the Swiss-Prot multispecies database (version 56.2) using MyriMatch. Protein groups (containing indistinguishable proteins) were assembled from peptide identifications using IDPicker. Three different settings were used for the “Minimum additional peptides per protein group” filter in the assembly process. At each setting, the total numbers of human and nonhuman protein groups were enumerated and plotted for both “DLD1 LTQ” (A) and “Serum Orbi” (B) samples. Setting the filter to 2 dramatically reduced the number of nonhuman (orthologous) protein identifications from a multispecies database search without significantly affecting the number of human (paralogous) protein identifications.
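
The effect of the "Minimum additional peptides per protein group" filter can be illustrated with a simplified greedy parsimony sketch. This is an illustration under stated assumptions, not IDPicker's implementation: it ignores protein grouping, uses a plain greedy set cover, and the protein names and peptide labels below are hypothetical stand-ins for the three-species situation in Figure 1.

```python
def assemble_proteins(protein_to_peptides, min_additional=2):
    """Greedily pick proteins that each explain >= min_additional new peptides."""
    explained = set()
    selected = []
    remaining = {prot: set(peps) for prot, peps in protein_to_peptides.items()}
    while remaining:
        # Choose the protein explaining the most unexplained peptides;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda prot: len(remaining[prot] - explained))
        gain = len(remaining[best] - explained)
        if gain < min_additional:
            break  # no remaining protein adds enough new evidence
        selected.append(best)
        explained |= remaining.pop(best)
    return selected


if __name__ == "__main__":
    # Hypothetical mapping: seven peptides, each ortholog has one unique peptide.
    hits = {
        "CERU_HUMAN": {"p1", "p2", "p3", "p4", "p5"},
        "CERU_MOUSE": {"p1", "p6"},
        "CERU_RAT": {"p2", "p7"},
    }
    print(assemble_proteins(hits, min_additional=1))
    print(assemble_proteins(hits, min_additional=2))
```

With a threshold of 1, every ortholog survives on the strength of a single unique peptide; with a threshold of 2, only the protein that accounts for most of the evidence is kept.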
