J Proteome Res. 2009 Aug;8(8):3872-81.
doi: 10.1021/pr900360j.

IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering

Ze-Qiang Ma et al. J Proteome Res. 2009 Aug.

Abstract

Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. A number of database searching algorithms have been developed to assign peptide sequences to tandem mass spectra. Assembling peptide identifications into proteins, however, is challenging because many peptides are shared among multiple proteins. IDPicker is an open-source protein assembly tool that derives a minimum protein list from peptide identifications filtered to a specified false discovery rate (FDR). Here, we update IDPicker to increase the number of confident peptide identifications by combining multiple scores produced by database search tools. By segregating peptide identifications for thresholding by both precursor charge state and number of tryptic termini, IDPicker retrieves more peptides for protein assembly. The new version is also more robust against false-positive protein identifications, especially in searches of multispecies databases, because its parsimony process requires each protein to contribute additional novel peptides. A new graphical user interface and the ability to read identifications in the pepXML format allow IDPicker to be incorporated into many identification workflows. These advances position IDPicker for high peptide discrimination and reliable protein assembly in large-scale proteomics studies. The source code and binaries for the latest version of IDPicker are available from http://fenchurch.mc.vanderbilt.edu/ .
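The FDR filtering step mentioned in the abstract can be illustrated with a minimal target-decoy sketch. It assumes that decoy matches come from reversed database sequences (as in the Figure 5 caption), that higher scores are better, and that FDR is estimated with the common concatenated-database formula 2D/(T + D); IDPicker's own estimator and tie handling may differ. The `PSM` class, `filter_at_fdr`, and the example peptides are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match; a higher score is assumed to be better."""
    peptide: str
    score: float
    is_decoy: bool  # True if the match is to a reversed (decoy) sequence


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Concatenated target-decoy estimate FDR = 2D / (T + D) (an assumed estimator)."""
    total = n_target + n_decoy
    return 0.0 if total == 0 else 2.0 * n_decoy / total


def filter_at_fdr(psms: list[PSM], max_fdr: float) -> list[PSM]:
    """Keep the target PSMs in the longest score-ranked prefix whose FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    n_target = n_decoy = 0
    cutoff = 0  # length of the longest prefix that satisfies the threshold
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if estimate_fdr(n_target, n_decoy) <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


# Hypothetical PSMs, for illustration only.
psms = [PSM("PEPTIDEA", 10.0, False), PSM("PEPTIDEB", 9.0, False),
        PSM("PEPTIDEC", 8.0, False), PSM("PEPTIDED", 7.0, False),
        PSM("EDITPEPX", 6.0, True), PSM("PEPTIDEE", 5.0, False),
        PSM("EDITPEPY", 4.0, True)]
print([p.peptide for p in filter_at_fdr(psms, max_fdr=0.05)])
```

With these hypothetical PSMs, the first decoy appears at rank 5, so only the four top-ranked targets survive a 5% threshold.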


Figures

Figure 1
Robust protein assembly for database searches with high sequence homology. In this diagram, seven peptides observed in human serum are associated with the ceruloplasmin sequences from three different species. Most protein assembly tools would include all three proteins because each is associated with at least two peptides, at least one of which is unique to that protein sequence. IDPicker, however, screens out the mouse and rat sequences by requiring a protein to explain more than one new peptide for inclusion in the final list. The two peptides starting with “MYYS” differ at the fifth amino acid; this difference probably arises because the serum used in this study was pooled, reflecting sequence variants across a population of blood donors. The two peptides starting with “MFTT”, on the other hand, are isobaric: the differing subsequences “DQ” and “EN” have exactly the same mass. Of all the y ions generated by the two “MFTT” sequences, only y4 could distinguish the peptides.
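The isobaric claim for “DQ” and “EN” can be checked by summing elemental compositions. This sketch uses standard monoisotopic atomic masses and residue formulas (free amino acid minus water); the helper names are illustrative only.

```python
from collections import Counter

# Monoisotopic atomic masses (Da).
ATOM_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental compositions of amino acid residues (free amino acid minus H2O).
RESIDUE_COMPOSITION = {
    "D": Counter(C=4, H=5, N=1, O=3),  # aspartic acid
    "E": Counter(C=5, H=7, N=1, O=3),  # glutamic acid
    "N": Counter(C=4, H=6, N=2, O=2),  # asparagine
    "Q": Counter(C=5, H=8, N=2, O=2),  # glutamine
}


def composition(seq: str) -> Counter:
    """Total elemental composition of a run of residues."""
    total = Counter()
    for residue in seq:
        total.update(RESIDUE_COMPOSITION[residue])  # adds element counts
    return total


def mono_mass(seq: str) -> float:
    """Monoisotopic residue mass (no terminal water) of a run of residues."""
    return sum(ATOM_MASS[element] * n for element, n in composition(seq).items())


for dipeptide in ("DQ", "EN"):
    print(dipeptide, dict(composition(dipeptide)), round(mono_mass(dipeptide), 5))
```

Both dipeptides work out to C9H13N3O5 (about 243.0855 Da), so the precursor mass alone cannot tell the two “MFTT” peptides apart.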
Figure 2
A screenshot of the IDPicker GUI report. Three samples from cancer subjects and three samples from control subjects were arranged in a tree hierarchy to reflect their biological meaning. Each sample has three replicate LC-MS/MS experiments, which were grouped together. The final protein identification report arranges the protein, peptide, and spectral identifications in this hierarchy. The number of identifications at each node is reported by summarizing the identifications of its child nodes. For example, the report starts at the root of the hierarchy, labeled “/”, which summarizes all identifications present in the analysis. Below the root, identifications are summarized for the next level (the cancer and control groups), then for each sample, and finally for each individual technical replicate. The report also contains a navigation frame (left) that lets the user browse the protein identifications through different indices. Users can also manually validate spectral matches with a built-in spectrum viewer; for example, the bottom window highlights the fragment ion matches of a tandem mass spectrum mapped to the peptide “IAQWQSFQLEGGLK”.
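The per-node counts can be sketched as a recursive roll-up over the hierarchy. This assumes that “summarizing” a node means counting the distinct identifications in its subtree (a union, so a peptide seen in several replicates is counted once) rather than summing the child counts; the `Node` class, node names, and all peptides other than “IAQWQSFQLEGGLK” are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the report hierarchy (root, group, sample, or replicate)."""
    name: str
    peptides: set[str] = field(default_factory=set)  # identifications made at a leaf
    children: list["Node"] = field(default_factory=list)


def rollup(node: Node) -> set[str]:
    """Distinct peptides identified anywhere in this node's subtree."""
    found = set(node.peptides)
    for child in node.children:
        found |= rollup(child)
    return found


def report(node: Node, depth: int = 0) -> list[str]:
    """Indented summary lines, one per node, each counting distinct peptides."""
    lines = [f"{'  ' * depth}{node.name}: peptides={len(rollup(node))}"]
    for child in node.children:
        lines.extend(report(child, depth + 1))
    return lines


# Hypothetical hierarchy: "/" -> groups -> replicates (sample level omitted for brevity).
tree = Node("/", children=[
    Node("cancer", children=[
        Node("cancer1-rep1", {"IAQWQSFQLEGGLK", "PEPTIDEA"}),
        Node("cancer1-rep2", {"PEPTIDEA", "PEPTIDEB"}),
    ]),
    Node("control", children=[Node("control1-rep1", {"PEPTIDEA"})]),
])
print("\n".join(report(tree)))
```

Counting a union rather than a sum keeps the root total (three peptides here) from being inflated by peptides that recur across replicates.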
Figure 3
Combining multiple scores from a search engine improves the peptide identification rate. Tandem mass spectra from three different samples were matched to the IPI human protein database (version 3.47) using the MyriMatch, Sequest, and Mascot search engines (see Materials and Methods for additional details). Peptide identifications from all search engines were loaded into IDPicker. For each search engine, IDPicker was configured to use either its primary score or a combination of its scores to identify peptides at an FDR ≤ 5%. Panels A–I show the percent overlap between the valid peptide identifications obtained with a single score and with multiple scores from each search engine. Combining multiple scores yielded more peptide identifications from all samples, and few peptides were identified only with the primary score and not with the score combination.
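One simple way to combine several scores, sketched here, is to standardize each score, take a weighted sum, and grid-search the weights for the combination that accepts the most target PSMs at the FDR threshold. This is an assumed illustration of score combination in general, not the procedure published for IDPicker 2.0; the FDR estimator 2D/(T + D), the weight grid, and the two-score toy data are all hypothetical.

```python
import itertools
import statistics


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Concatenated target-decoy estimate FDR = 2D / (T + D) (assumed)."""
    total = n_target + n_decoy
    return 0.0 if total == 0 else 2.0 * n_decoy / total


def targets_at_fdr(scored: list[tuple[float, bool]], max_fdr: float) -> int:
    """Number of target PSMs accepted at max_fdr; items are (score, is_decoy)."""
    accepted = n_target = n_decoy = 0
    for _, is_decoy in sorted(scored, key=lambda x: x[0], reverse=True):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if estimate_fdr(n_target, n_decoy) <= max_fdr:
            accepted = n_target
    return accepted


def standardize(values: list[float]) -> list[float]:
    """Z-score one score column so that differently scaled scores can be summed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0
    return [(v - mean) / sd for v in values]


def combine_scores(psms, max_fdr=0.05, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid-search weights for a weighted sum of standardized scores.

    psms: list of (scores_tuple, is_decoy). Returns (best_weights, targets_accepted).
    """
    n_scores = len(psms[0][0])
    columns = [standardize([scores[k] for scores, _ in psms]) for k in range(n_scores)]
    decoy_flags = [is_decoy for _, is_decoy in psms]
    best_weights, best_count = None, -1
    for weights in itertools.product(grid, repeat=n_scores):
        if not any(weights):
            continue  # skip the all-zero weighting
        combined = [sum(w * col[i] for w, col in zip(weights, columns))
                    for i in range(len(psms))]
        count = targets_at_fdr(list(zip(combined, decoy_flags)), max_fdr)
        if count > best_count:
            best_weights, best_count = weights, count
    return best_weights, best_count


# Hypothetical two-score PSMs ((score1, score2), is_decoy), for illustration only.
psms = [((5, 0), False), ((4, 1), False), ((1, 5), False), ((0, 6), False),
        ((3, 0), True), ((0, 3), True)]
print(targets_at_fdr([(s[0], d) for s, d in psms], 0.05))  # first score alone
print(combine_scores(psms, 0.05))
```

In the toy data, each score alone accepts only two targets before a decoy intervenes, whereas an equal-weight combination ranks all four targets above both decoys.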
Figure 4
Comparison of the IDPicker and PeptideProphet score combination methods. Tandem mass spectra from three different samples were matched to the IPI human protein database (version 3.47) using the Sequest search engine (see Materials and Methods). The search results were processed separately by IDPicker and PeptideProphet. Both algorithms were configured to filter PSMs at a 5% FDR threshold. The total number of confident PSMs identified by each algorithm in each replicate of all three data sets is shown. IDPicker produced more confident PSMs than PeptideProphet in some data sets and fewer in others, but the two algorithms performed similarly across all data sets, with a maximum difference of 5.7%. The simple nonparametric score combination method implemented in IDPicker thus performs comparably to the more complex probabilistic framework implemented in PeptideProphet, and it can be more easily extended to combine multiple scores from new search engines.
Figure 5
Partitioning peptides by charge state (Z) and number of tryptic termini (NTT) improves peptide identification. The “DLD1 LTQ” and “Serum Orbi” samples were matched to the IPI human protein database (version 3.47) using three search strategies: fully tryptic, semitryptic, and unconstrained. Peptide identifications from each search were loaded into IDPicker and partitioned into separate classes using the four methods shown in the figure. FDRs were estimated from the reversed sequences present in the database, and the average numbers of peptide identifications with an FDR ≤ 5% under each partition method are plotted separately for the “DLD1 LTQ” (A) and “Serum Orbi” (B) data sets. Error bars in A and B represent standard deviations across replicates. Separating peptide identifications by NTT and Z improved the number of identified peptides in semitryptic and unconstrained searches. In the fully tryptic search, where every peptide has NTT = 2, the improvement comes from charge-state partitioning alone.
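The partitioning idea can be sketched by grouping PSMs on (Z, NTT) and choosing a separate score threshold in each class before pooling the survivors. The sketch assumes higher-is-better scores, reversed-sequence decoys, and the FDR estimate 2D/(T + D); `filter_partitioned` and the example PSMs are hypothetical, and IDPicker's handling of sparsely populated classes is not modeled.

```python
from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    peptide: str
    charge: int     # precursor charge state (Z)
    ntt: int        # number of tryptic termini: 0, 1, or 2
    score: float    # higher is assumed to be better
    is_decoy: bool  # True for a reversed-sequence match


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Concatenated target-decoy estimate FDR = 2D / (T + D) (assumed)."""
    total = n_target + n_decoy
    return 0.0 if total == 0 else 2.0 * n_decoy / total


def accepted_targets(psms: list[PSM], max_fdr: float) -> list[PSM]:
    """Target PSMs in the longest score-ranked prefix with estimated FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    n_target = n_decoy = cutoff = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if estimate_fdr(n_target, n_decoy) <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def filter_partitioned(psms: list[PSM], max_fdr: float) -> list[PSM]:
    """Threshold each (charge, NTT) class separately, then pool the survivors."""
    classes: dict[tuple[int, int], list[PSM]] = defaultdict(list)
    for psm in psms:
        classes[(psm.charge, psm.ntt)].append(psm)
    accepted = []
    for members in classes.values():
        accepted.extend(accepted_targets(members, max_fdr))
    return accepted


# Hypothetical PSMs: the (Z=3, NTT=1) class scores lower overall than (Z=2, NTT=2).
psms = [PSM("PEPA", 2, 2, 10.0, False), PSM("PEPB", 2, 2, 9.0, False),
        PSM("PEPC", 2, 2, 8.0, False), PSM("DECOY1", 2, 2, 4.0, True),
        PSM("PEPD", 3, 1, 3.0, False), PSM("PEPE", 3, 1, 2.5, False),
        PSM("DECOY2", 3, 1, 1.0, True)]
print(len(accepted_targets(psms, 0.05)), len(filter_partitioned(psms, 0.05)))
```

Pooled thresholding accepts only three targets here because a high-scoring decoy from one class outranks the weaker class's true matches, while per-class thresholds accept all five.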
Figure 6
Reduction of orthologous protein identifications in a multispecies database search. Two different human samples (“DLD1 LTQ” and “Serum Orbi”) were matched to the Swiss-Prot multispecies database (version 56.2) using MyriMatch. Protein groups (containing indistinguishable proteins) were assembled from peptide identifications using IDPicker. Three different settings were used for the “Minimum additional peptides per protein group” filter in the assembly process. At each setting, the total numbers of human and nonhuman protein groups were enumerated and plotted for both “DLD1 LTQ” (A) and “Serum Orbi” (B) samples. Setting the filter to 2 dramatically reduced the number of nonhuman (orthologous) protein identifications from a multispecies database search without significantly affecting the number of human (paralogous) protein identifications.
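The “minimum additional peptides per protein group” filter can be sketched as a greedy parsimony loop. The sketch assumes that proteins with identical peptide sets form one indistinguishable group and that groups are added in order of how many unexplained peptides they contribute (a greedy set-cover approximation); IDPicker's actual assembly algorithm may differ. The protein names and peptide labels are hypothetical, loosely modeled on Figure 1.

```python
from collections import defaultdict


def assemble(protein_peptides: dict[str, set[str]], min_additional: int = 2) -> list[list[str]]:
    """Greedy parsimony sketch with a minimum-additional-peptides filter.

    Proteins with identical peptide sets are merged into one indistinguishable
    group; groups are then chosen one at a time by how many not-yet-explained
    peptides they add, stopping when the best group adds fewer than min_additional.
    """
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)

    explained: set[str] = set()
    selected: list[list[str]] = []
    remaining = dict(groups)
    while remaining:
        peptides, proteins = max(remaining.items(), key=lambda kv: len(kv[0] - explained))
        if len(peptides - explained) < min_additional:
            break  # no remaining group explains enough new peptides
        selected.append(sorted(proteins))
        explained |= peptides
        del remaining[peptides]
    return selected


# Hypothetical peptide sets (p1..p7 are placeholder labels).
proteins = {
    "ceruloplasmin_human": {"p1", "p2", "p3", "p4", "p5"},
    "ceruloplasmin_mouse": {"p1", "p6"},
    "ceruloplasmin_rat": {"p2", "p7"},
}
print(assemble(proteins, min_additional=1))
print(assemble(proteins, min_additional=2))
```

With `min_additional=1`, the mouse and rat entries each contribute one unexplained peptide and are kept; raising the setting to 2 leaves only the human protein, mirroring the behavior described for Figure 1.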

