Anal Chem. 2016 Apr 5;88(7):3990-7.
doi: 10.1021/acs.analchem.6b00261. Epub 2016 Mar 14.

UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry

Scott A Robotham et al. Anal Chem. 2016.

Abstract

De novo peptide sequencing by mass spectrometry represents an important strategy for characterizing novel peptides and proteins, in which a peptide's amino acid sequence is inferred directly from the precursor peptide mass and tandem mass spectrum (MS/MS or MS(3)) fragment ions, without comparison to a reference proteome. This method is ideal for organisms or samples lacking a complete or well-annotated reference sequence set. One of the major barriers to de novo spectral interpretation arises from confusion of N- and C-terminal ion series due to the symmetry between b and y ion pairs created by collisional activation methods (or c, z ions for electron-based activation methods). This is known as the "antisymmetric path problem" and leads to inverted amino acid subsequences within a de novo reconstruction. Here, we combine several key strategies for de novo peptide sequencing into a single high-throughput pipeline: high-efficiency carbamylation blocks lysine side chains, and subsequent tryptic digestion and N-terminal peptide derivatization with the ultraviolet chromophore AMCA yield peptides susceptible to 351 nm ultraviolet photodissociation (UVPD). UVPD-MS/MS of the AMCA-modified peptides then predominantly produces y ions in the MS/MS spectra, specifically addressing the antisymmetric path problem. Finally, the program UVnovo applies a random forest algorithm to automatically learn from and then interpret UVPD mass spectra, passing results to a hidden Markov model for de novo sequence prediction and scoring. We show this combined strategy provides high-performance de novo peptide sequencing, enabling the de novo sequencing of thousands of peptides from an Escherichia coli lysate at high confidence.
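The b/y symmetry behind the antisymmetric path problem can be illustrated with a short calculation. The sketch below is not part of UVnovo; it assumes an unmodified peptide (no AMCA tag or carbamylation), singly charged fragments, and standard monoisotopic residue masses, and the function names are hypothetical. It shows that every b ion has a complementary y ion whose m/z sums to the same constant, which is why a peak cannot be assigned to the N- or C-terminal series from its mass alone.

```python
# Monoisotopic residue masses (Da) for the residues used in this example only.
RESIDUE_MASS = {
    "E": 129.04259, "V": 99.06841, "G": 57.02146,
    "F": 147.06841, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b-ion m/z values (N-terminal prefixes b1..b(n-1))."""
    total, out = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """Singly charged y-ion m/z values (C-terminal suffixes y1..y(n-1))."""
    total, out = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


peptide = "EVEGFGEVFR"  # sequence taken from the Figure 6 caption
neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
b, y = b_ions(peptide), y_ions(peptide)

# b_i is complementary to y_(n-i): each pair sums to M + 2 * proton mass.
pair_sums = [bi + yj for bi, yj in zip(b, reversed(y))]
for i, s in enumerate(pair_sums, start=1):
    print(f"b{i} + y{len(peptide) - i} = {s:.4f}")
```

Because each pair sums to the same value, a spectrum containing both series admits mirrored interpretations. A spectrum dominated by a single series, as the abstract reports for UVPD of AMCA-modified peptides, avoids this ambiguity.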


Figures

Figure 1
a) Workflow for carbamylation/AMCA modification, b) carbamylation reaction.
Figure 2
UVnovo workflow for de novo sequencing. Spectra are divided into training and test sets. A random forest, trained on known spectra, transforms an unknown spectrum into a simplified representation of peptide cleavage site probabilities. At each position in this ‘simplified spectrum’, a hidden Markov model (HMM) refines the probability, also incorporating amino acid frequencies and requiring valid mass transitions. The best valid path through the HMM yields the de novo sequence prediction, and the individual fragmentation site probabilities provide a means to score each sequence.
Figure 3
UVPD (3 mJ per pulse, 15 pulses) mass spectrum of Elongation factor G peptide V[AMCA]YSGVVNSGDTVLNSVK[carbamyl]AAR (2+) from a trypsin-digested E. coli lysate. The precursor is labeled with an asterisk.
Figure 4
UVnovo de novo results for the E. coli lysate test set. A correct sequence matches the SEQUEST PSM exactly with no gaps. UVnovo scores each sequence reconstruction and ranks it relative to others from the same spectrum. a) Number of correct sequences versus peptide length for the top-ranked de novo result and for the top three de novo results. b) Fraction of correct sequences versus de novo rank. c, d) Filtering of low-scoring de novo predictions improves sequence-level precision. 5062 of the original 7911 spectra remain, and over 75% of those removed had no correct match.
Figure 5
UVnovo performance for the E. coli lysate de novo reconstructions. a) Amino acid error versus peptide length for top-ranked de novo sequences from the filtered set of higher-confidence predictions. Most sequences are correct with no insertions or deletions. Incorrect sequences tend to diverge from SEQUEST PSMs by only 2 residues (a single fragmentation site misprediction). Histograms show fractional counts in each dimension. b,c) Amino acid precision-recall for the complete and filtered de novo results. AAs are pooled and sorted by residue-level score from (blue) the top-ranked de novo predictions for each spectrum or (dashed red) the best match among the top 3 predictions for each spectrum.
Figure 6
Co-eluting E. coli peptides are independently identified between UVnovo and SEQUEST. a) UVnovo and SEQUEST both assign the sequence EVEGFGEVFR. b) Spectrum is acquired 49 seconds after (a). Here, UVnovo assigns PVNIDIQTIR, conflicting with the SEQUEST identification, EVEGFGEVFR. Both sequences are present within the E. coli reference database.
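
The Figure 2 caption describes finding the best valid path through cleavage site probabilities, with every step required to be a valid amino acid mass transition. The sketch below is a heavily simplified stand-in for that idea, not the UVnovo hidden Markov model: it assumes integer nominal residue masses, a reduced alphabet (no I or Q, which collide with L and K at nominal mass), a hypothetical log-odds score against an assumed prior, and no amino acid frequency term. The function `best_path` and the parameters `prior` and `background` are illustrative names only.

```python
import math

# Nominal (integer) residue masses; a reduced alphabet chosen for illustration.
RESIDUES = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "F": 147, "R": 156,
}


def best_path(site_prob, total_mass, prior=0.05, background=0.01):
    """Return (score, sequence) for the highest-scoring chain of residues.

    site_prob maps a cumulative mass offset to a cleavage site probability
    (for example, the output of some upstream classifier). Offsets missing
    from site_prob get the background probability. Each visited offset
    contributes log(p / prior), so likely sites are rewarded and unlikely
    ones are penalized. Returns None if total_mass is unreachable.
    """
    best = {0: (0.0, "")}  # mass offset -> (best score, best sequence)
    for mass in range(1, total_mass + 1):
        candidates = []
        for aa, delta in RESIDUES.items():
            prev = best.get(mass - delta)
            if prev is None:
                continue  # no valid mass transition ends at this offset
            if mass == total_mass:
                step = 0.0  # the peptide terminus is not a cleavage site
            else:
                step = math.log(site_prob.get(mass, background) / prior)
            candidates.append((prev[0] + step, prev[1] + aa))
        if candidates:
            best[mass] = max(candidates)
    return best.get(total_mass)


# Toy "simplified spectrum": four confident cleavage sites, total mass 468.
toy_sites = {57: 0.9, 128: 0.9, 227: 0.9, 340: 0.9}
score, sequence = best_path(toy_sites, 468)
print(sequence, round(score, 3))  # sequence is GAVLK
```

In this toy case the path through all four sites wins because any alternative, such as reading the first 128 Da step as a single K, skips a confident site or visits an unsupported one. The per-site terms also suggest how residue-level scores of the kind mentioned in the captions could be derived, although the actual UVnovo scoring is not specified here.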
