J Proteome Res. 2012 Aug 3;11(8):4191-200.
doi: 10.1021/pr300312h. Epub 2012 Jul 2.

Constrained de novo sequencing of conotoxins

Swapnil Bhatia et al. J Proteome Res. 2012.

Abstract

De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.

Figures

Figure 1. Search space size
The four curves show peptides with: no constraint, a 4C constraint (must contain 4 cysteines), a simple motif constraint, and a more detailed motif constraint from ProSite, C - C - [S H Y N] - x(0, 1) - [P R G] - [R P A T V] - C - [A R M F T N H G] - x(0, 4) - [Q W H D G E N F Y V P] - [R I V Y L G S D W] - C. Here x(0, 4) means a sequence of 0 to 4 residues of any type, and [P R G] means one residue chosen from the set { P, R, G }.
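To make the two kinds of constraint concrete, the following minimal sketch translates the ProSite-style motif quoted above into a Python regular expression and counts how many sequences of a given length contain at least four cysteines. The function names, the test string `CCSPRCAQRC`, and the choice to count by peptide length with a 20-letter alphabet are assumptions made for illustration; the figure itself is not reproduced here, so this is not a reconstruction of its curves.

```python
import re
from math import comb

# Motif as quoted in the caption, with the any-residue wildcard written as x.
MOTIF = ("C - C - [S H Y N] - x(0, 1) - [P R G] - [R P A T V] - C - "
         "[A R M F T N H G] - x(0, 4) - [Q W H D G E N F Y V P] - "
         "[R I V Y L G S D W] - C")


def motif_to_regex(motif):
    """Translate the small ProSite-style subset used above into a regex."""
    parts = []
    for element in motif.replace("×", "x").split(" - "):
        element = element.replace(" ", "")
        gap = re.fullmatch(r"x\((\d+),(\d+)\)", element)
        if gap:
            # x(m,n): between m and n residues of any type
            parts.append("[A-Z]{%s,%s}" % gap.groups())
        else:
            # a literal residue such as C, or a set such as [PRG]
            parts.append(element)
    return "".join(parts)


def count_min_cys(length, min_cys, alphabet=20):
    """Count sequences of the given length with at least min_cys cysteines."""
    return sum(comb(length, k) * (alphabet - 1) ** (length - k)
               for k in range(min_cys, length + 1))


pattern = re.compile(motif_to_regex(MOTIF))
print(bool(pattern.search("CCSPRCAQRC")))  # hypothetical sequence that fits the motif
print(bool(pattern.search("GCPCW")))       # too short to contain the motif
print(count_min_cys(12, 0), count_min_cys(12, 4))
```

Counting by length rather than by mass keeps the arithmetic in closed form; a count at fixed precursor mass would need a dynamic program over residue masses instead.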
Figure 2. Algorithm for constrained de novo sequencing
Panel (a) shows in black a hypothetical MS/MS spectrum of a peptide with singly charged precursor mass 679 Da. A standard step in de novo sequencing complements each observed peak and adds the artificial green peaks. If the original peak represents a y-ion, the complement peak represents a b-ion. Panel (b) shows a directed graph in which nodes represent peaks from the MS/MS spectrum, and arcs represent either one or two amino acid residues. A path of arcs from the leftmost to the rightmost node defines one or more candidate peptides. The constrained graph in panel (c) builds in the requirement that the candidate contain at least two cysteines: an acceptable path in graph (b) must also complete a left-to-right path in the constraint graph, where X denotes any amino acid residue and X \ C denotes any residue except cysteine. For example, the partial path GC corresponds to node 218 in (b) and 1Cys in (c), and can be completed (red) to give candidate GCPCW. Panel (d) shows that candidates satisfying the constraints (red) constitute only a small fraction of all the best candidates (red and black), so it is advantageous to generate and score only the constraint-satisfying peptides. The final step in de novo sequencing (e) scores the generated candidates using detailed spectrum features that cannot be easily incorporated into the graph algorithm. In this hypothetical example, the candidate GCCPW did not score well, because the position of proline is not consistent with the lack of a peak at 302 for the y-ion PW+ and the strong peaks at 218, 258, and 462 for GC+, PC+ (an internal fragment), and PCW+.
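The interplay of the spectrum graph in panel (b) and the constraint graph in panel (c) can be sketched as a search over pairs (peak node, constraint state). The code below is a toy illustration under stated assumptions, not the authors' implementation: it uses nominal integer masses, a reduced seven-residue alphabet, cysteine taken as carbamidomethylated (160 Da, which makes GCPCW come out at 679), and a hypothetical list of node masses on the b-ion scale. The names `arc_labels`, `complement`, and `constrained_candidates` are invented for this example.

```python
from functools import lru_cache
from itertools import product

# Nominal residue masses (Da) for a reduced, hypothetical alphabet.
# C = 160 assumes carbamidomethylated cysteine (103 + 57).
RESIDUES = {"G": 57, "A": 71, "S": 87, "P": 97, "N": 114, "C": 160, "W": 186}
PROTON, WATER = 1, 18


def complement(peak, precursor):
    """Complement of a singly charged peak: b + y = precursor + 1 (nominal)."""
    return precursor + PROTON - peak


def arc_labels(delta):
    """All one- or two-residue strings whose nominal mass equals delta."""
    labels = [r for r, m in RESIDUES.items() if m == delta]
    labels += [a + b for a, b in product(RESIDUES, repeat=2)
               if RESIDUES[a] + RESIDUES[b] == delta]
    return labels


def constrained_candidates(peaks, precursor, min_cys):
    """Peptides spelled by source-to-sink paths with at least min_cys C."""
    source, sink = PROTON, precursor - WATER
    nodes = sorted(set(peaks) | {source, sink})
    arcs = {u: [(v, lab) for v in nodes if v > u for lab in arc_labels(v - u)]
            for u in nodes}

    def step(state, label):
        # Constraint automaton: cysteines seen so far, capped at min_cys.
        return min(min_cys, state + label.count("C"))

    @lru_cache(maxsize=None)
    def viable(node, state):
        # Can (node, state) still reach the sink in the accepting state?
        if node == sink:
            return state == min_cys
        return any(viable(v, step(state, lab)) for v, lab in arcs[node])

    results = set()

    def extend(node, state, prefix):
        if node == sink:
            results.add(prefix)
            return
        for v, lab in arcs[node]:
            nxt = step(state, lab)
            if viable(v, nxt):  # prune branches that cannot satisfy the constraint
                extend(v, nxt, prefix + lab)

    if viable(source, 0):
        extend(source, 0, "")
    return sorted(results)


peaks = [202, 218, 315, 378, 475]  # hypothetical node masses, b-ion scale
unconstrained = constrained_candidates(peaks, 679, min_cys=0)
constrained = constrained_candidates(peaks, 679, min_cys=2)
print(len(unconstrained), len(constrained))
print(constrained)
```

In this toy graph the two-cysteine requirement removes paths such as NSSWW before they are ever spelled out, which is the point of building the constraint into the graph search rather than filtering candidates afterwards. The helper `complement(462, 679)` returns 218, matching the pairing of the PCW+ and GC+ peaks in the caption.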
Figure 3. CID spectrum of a C. textile toxin sequenced de novo
This toxin belongs to the M superfamily and is two mutations away from the closest database sequence.
Figure 4. CID spectrum of a C. stercusmuscarum toxin sequenced de novo
This toxin belongs to the M superfamily and was sequenced by a combination of spectra, including ones shown in the Supplemental Information. The order of the three initial residues is uncertain, but Byonic’s scorer prefers APA over AAP and PAA in order to explain the lack of cleavage at b2 / y22. In the cleavage diagram, a green stroke indicates b- and y-ions observed primarily doubly charged.
