Comparative Study
Proc Natl Acad Sci U S A. 2012 Jan 10;109(2):407-12.
doi: 10.1073/pnas.1108399108. Epub 2011 Dec 22.

Database independent proteomics analysis of the ostrich and human proteome

A F Maarten Altelaar et al. Proc Natl Acad Sci U S A.

Abstract

Mass spectrometry (MS)-based proteome analysis relies heavily on the presence of complete protein databases. Such a strategy is extremely powerful, albeit not adequate in the analysis of unpredicted postgenome events, such as posttranslational modifications, which exponentially increase the search space. Therefore, it is of interest to explore "database-free" approaches. Here, we sampled the ostrich and human proteomes with a method facilitating de novo sequencing, utilizing the protease Lys-N in combination with electron transfer dissociation. By implementing several validation steps, including the combined use of collision-induced dissociation/electron transfer dissociation data and a cross-validation with conventional database search strategies, we identified approximately 2,500 unique de novo peptide sequences from the ostrich sample with over 900 peptides generating full backbone sequence coverage. This dataset allowed the appropriate positioning of ostrich in the evolutionary tree. The described database-free sequencing approach is generically applicable and has great potential in important proteomics applications such as in the analysis of variable parts of endogenous antibodies or proteins modified by a plethora of complex posttranslational modifications.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic overview of the de novo pipeline. After Lys-N protein digestion and SCX enrichment of peptides containing a single N-terminal lysine, a nanoliter flow liquid chromatography separation and a mass spectrometric analysis are performed. Each peptide is sequenced using the fragmentation techniques ETD and CID. The resulting spectra are subsequently read into the de novo algorithm, which performs noise filtering and de novo interpretation. From approximately 27,000 ETD spectra, this process resulted in more than 5.5 million possible sequence solutions for 11,183 spectra. The algorithm output consists of a library of all possible peptide solutions alongside the coordinates of the parent spectrum. To retrieve the best match between the ETD fragment spectrum and its reported de novo sequences, the de novo sequence library, alongside a decoy library of equivalent size, is uploaded into the identification engine Mascot, which then performs its own matching process on the ETD data. To diminish false positive identifications further, the paired CID data are also matched by Mascot against the de novo solutions. The resulting matches are then filtered such that a Mascot result for an ETD spectrum is accepted only if it originates from a de novo solution for that same spectrum. The paired CID scan must also match the same solution(s). If multiple solutions match both the ETD and CID scans, then the two scores for the same solution are combined and the highest-ranking result is taken forward. In this way, we can exploit the complementarity of the two fragmentation techniques. The entire process resulted in 8,890 de novo peptide solutions that represent an agreement between Mascot and the de novo algorithm as well as an agreement between ETD and CID. Collapsing the data further led to 2,744 unique nonredundant peptide sequences.
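The filtering step described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Match` record, the `best_paired_solution` function, and the `pair` mapping are hypothetical names, and the caption does not say how the ETD and CID scores are combined, so a simple sum is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One search-engine hit (hypothetical record layout)."""
    spectrum_id: str  # spectrum the search engine matched
    origin_id: str    # spectrum whose de novo run proposed the sequence
    sequence: str     # candidate peptide sequence
    score: float      # search-engine score for this match


def best_paired_solution(etd_matches, cid_matches, pair):
    """Return {ETD spectrum id: (sequence, combined score)}.

    `pair` maps each ETD spectrum id to the id of its paired CID spectrum.
    """
    # Index the best CID score per (CID spectrum, sequence).
    cid_scores = {}
    for m in cid_matches:
        key = (m.spectrum_id, m.sequence)
        cid_scores[key] = max(m.score, cid_scores.get(key, float("-inf")))

    best = {}
    for m in etd_matches:
        # Accept only sequences proposed by de novo for this same spectrum.
        if m.origin_id != m.spectrum_id:
            continue
        # The paired CID scan must match the same sequence.
        cid_score = cid_scores.get((pair.get(m.spectrum_id), m.sequence))
        if cid_score is None:
            continue
        combined = m.score + cid_score  # ASSUMPTION: scores are summed
        current = best.get(m.spectrum_id)
        if current is None or combined > current[1]:
            best[m.spectrum_id] = (m.sequence, combined)
    return best
```

A spectrum with no agreeing CID match, or whose only hits come from another spectrum's de novo solutions, simply drops out of the result.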
Fig. 2.
Performance characteristics of the Lys-N ETD de novo sequencing strategy. (A) Peptide identification results from a Mascot search of the Lys-N peptides derived from ostrich against the IPI chicken and the combined UniProt databases. The database search against IPI chicken resulted in the identification of 2,867 forward peptides (p < 0.05) and 959 decoy peptides (using the Mascot decoy strategy), leading to an FDR of 33.5%. Forcing the IPI chicken identification procedure to reach an FDR < 1% required stringent Mascot identification criteria (p value < 0.0001 and score > 44), reducing the number of unique peptide identifications 15-fold to 180. The database search against the UniProt database identified 2,384 forward peptides (p < 0.05) and 963 decoy peptides, resulting again in an FDR of 40%. Adjusting the p value to < 0.0001 resulted in an FDR of 10.1% at a Mascot score cutoff of > 55, and the identification of only 44 unique peptides. (B) Combined CID/ETD Mascot score plotted against the number of unique peptides identified (decoy hits depicted in green) for a single SCX fraction of Lys-N digested human HEK293 cells using the de novo sequencing identification strategy and (C) using a conventional IPI-human-based database search strategy. The results from the IPI human database showed increased false positive hits at lower Mascot ion scores (ion score < 30), whereas this number was significantly lower in the de novo strategy. (D) Comparison between the de novo workflow, as applied to the ostrich data, and a conventional database search against IPI human for a human HEK293 sample. The de novo workflow resulted in 1,097 paired CID/ETD queries generating the identification of 1,029 unique peptide sequences from this single SCX fraction. The common database search generated 1,492 unique peptides with an FDR of 1%. Both strategies had 745 CID/ETD pairwise queries in common, of which 183 peptide sequences agreed fully, representing a minimum success rate of 25%.
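The FDR values quoted in this caption are consistent with a simple ratio of decoy hits to forward hits. The sketch below assumes that definition (the caption does not state the formula explicitly), and `decoy_fdr` is a hypothetical helper name. It recomputes the ratios from the counts given above.

```python
def decoy_fdr(forward_hits: int, decoy_hits: int) -> float:
    """Estimate the false discovery rate as decoy hits / forward hits.

    ASSUMPTION: this ratio is inferred from the counts in the caption;
    it is not a formula stated by the authors.
    """
    if forward_hits <= 0:
        raise ValueError("forward_hits must be positive")
    return decoy_hits / forward_hits


if __name__ == "__main__":
    # Counts taken from the caption (p < 0.05 searches).
    print(f"IPI chicken: {decoy_fdr(2867, 959):.2%}")
    print(f"UniProt:     {decoy_fdr(2384, 963):.2%}")
    # Share of common CID/ETD queries whose sequences agreed fully.
    print(f"Agreement:   {183 / 745:.2%}")
```

The computed values match the caption's 33.5%, 40%, and 25% to within rounding.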
Fig. 3.
Phylogenetic analysis of the peptide dataset generated by the de novo approach. (A) Ensembl Compara-based alignment of concatenated de novo identified peptides (peptides are separated by X) with the sequence of four selected species at different evolutionary distance, i.e., chicken, zebra finch, lizard, and human. Because the de novo approach cannot distinguish between isoleucine and leucine, I is changed to L. All ostrich-derived de novo peptides that mapped uniquely to a protein in one or more of these species were selected, and of these peptides, the subset that mapped to a protein with exactly one orthologous sequence in the other species was used for the alignment. (B) Maximum likelihood tree of the concatenated multiple sequence alignments of the selected peptides with their orthologs. There was high bootstrap support for the distinct avian branch (67%), as well as for the distinct differentiation within the avian branch (85%). (C) Negative control. Maximum likelihood tree of the concatenated multiple sequence alignments of the selected peptides with their orthologs after randomizing the order of the residues in the peptides of our species of interest. There was high bootstrap support (85%) for incorrectly placing ostrich with lizard. (D) ETD MS/MS spectra of two different forms of the ostrich phosphopeptide KGILAADESTGSIA clearly revealed the site localization capabilities of this approach with a clear distinction between the isobaric phosphopeptides KGILAADESTGpSIA and KGILAADEpSTGSIA. (E) ETD MS/MS spectra of two lysine acetylated peptides from the protein ostrich L-lactate dehydrogenase A chain, whose human homologue is known to be heavily decorated with lysine acetylations and (F) sequence alignment of these peptides from ostrich with several other species showing the acetylated lysines (in red), the high conservation of the lysine found to be acetylated in our study, and a novel point mutation (in green).
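The peptide preparation described in panel A (I changed to L, peptides joined with X) can be illustrated with a short sketch. The function names are hypothetical, and the second peptide in the usage example is made up for illustration. Only the I-to-L substitution and the X separator come from the caption.

```python
def normalize_il(peptide: str) -> str:
    """Replace isoleucine (I) with leucine (L).

    Leucine and isoleucine have the same mass, so sequencing from
    fragment masses alone cannot tell them apart.
    """
    return peptide.replace("I", "L")


def concatenate_peptides(peptides, separator="X"):
    """Join normalized peptides into one string, separated by X."""
    return separator.join(normalize_il(p) for p in peptides)


if __name__ == "__main__":
    # First peptide is from the caption; "KAIL" is a made-up example.
    print(concatenate_peptides(["KGILAADESTGSIA", "KAIL"]))
```

For a like-for-like comparison, the same I-to-L substitution would presumably also need to be applied to the reference sequences before alignment. That step is an assumption and is not shown here.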
Fig. 4.
Two examples of ETD and CID fragment spectra originating from the human HEK293 sample where the de novo solution scored significantly higher than the database result. The illustrated examples of high quality ETD and CID spectra produce de novo sequences that are neither present in the IPI human database nor show significant similarity with sequences from other species, as confirmed by BLAST searching.
Fig. 5.
Comparative assessment of three ETD fragment spectra originating from the human HEK293 sample and their synthetic de novo constructed peptide sequences. Synthetic peptides of the de novo predicted peptide sequences, which are absent from the IPI database, were constructed and analyzed by ETD; the resulting fragment spectra are largely indistinguishable from the HEK293 experimental peptide spectra.
