J Proteome Res. 2010 Apr 5;9(4):1716-26.
doi: 10.1021/pr900850m.

TagRecon: high-throughput mutation identification through sequence tagging

Surendra Dasari et al. J Proteome Res.

Abstract

Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state of the art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair deficient, yielded more mutant peptides than the mismatch repair proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.

Figures

Figure 1
This flowchart illustrates the computational pipeline used to identify peptides from LC-MS/MS experiments. MyriMatch is a database search engine. TagRecon is a mutation-tolerant search engine that reconciles partial sequence tags generated by DirecTag against a protein database. IDPicker is a parsimonious protein assembler that filters peptide identifications using a target FDR.
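The final filtering step in the pipeline can be illustrated with a minimal sketch of target-decoy FDR filtering. This is not IDPicker's implementation: the function name `filter_by_fdr`, the `(score, is_decoy)` input layout, and the estimate FDR = 2 × decoys / (targets + decoys) are assumptions chosen for illustration.

```python
from typing import List, Tuple

# A peptide-spectrum match is modeled as (score, is_decoy); higher scores are better.
# This layout is hypothetical and not taken from IDPicker.
PSM = Tuple[float, bool]


def filter_by_fdr(psms: List[PSM], target_fdr: float) -> List[PSM]:
    """Return the largest top-scoring set whose estimated FDR is <= target_fdr.

    Assumes a target-decoy search and the estimate
    FDR = 2 * decoys / (targets + decoys).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = 0
    best_cut = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        # i is the number of matches (targets + decoys) accepted so far.
        if 2 * decoys / i <= target_fdr:
            best_cut = i
    return ranked[:best_cut]


example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
accepted = filter_by_fdr(example, target_fdr=0.02)
```

With the example list, the cut falls just above the first decoy, so only the three best-scoring target matches are kept at a 2% threshold.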
Figure 2
The flowchart illustrates how TagRecon reconciles mass differences between database peptides and spectrum sequence tags as single amino acid mutations.
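The idea of explaining a mass difference as a single amino acid substitution can be sketched as follows. This is a simplified illustration, not TagRecon's code: the function `explain_delta`, the `region` argument (the stretch of the database peptide where the mismatch is assumed to lie), and the 0.01 Da default tolerance are all hypothetical. The residue masses are standard monoisotopic values.

```python
from typing import List, Tuple

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def explain_delta(region: str, delta: float, tol: float = 0.01) -> List[Tuple[int, str, str]]:
    """List single substitutions in `region` whose mass change matches `delta`.

    `delta` is observed mass minus expected mass. Each hit is
    (position in region, original residue, replacement residue).
    """
    hits = []
    for pos, orig in enumerate(region):
        for new, mass in RESIDUE_MASS.items():
            if new == orig:
                continue
            if abs((mass - RESIDUE_MASS[orig]) - delta) <= tol:
                hits.append((pos, orig, new))
    return hits


# A +14.01565 Da shift on the region "GA" is consistent with Gly -> Ala at position 0.
candidates = explain_delta("GA", 14.01565)
```

Several substitutions can share the same mass change (for example, Val to Leu and Val to Ile), so a sketch like this returns every candidate and leaves disambiguation to later scoring.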
Figure 3
These images compare the identification performance of MyriMatch, InsPecT, and TagRecon on multiple replicates of yeast from LTQ and Orbitrap instruments. Each row in the graph indicates a specific algorithm and configuration for the search of multiple replicates. “FT” reports the use of a fully tryptic search as opposed to an “ST” or semitryptic search. “PM” indicates that precursor mass filtering was used to select candidate peptides for comparison to the spectrum rather than “FM” or flanking mass filtering. TagRecon outperformed MyriMatch on the LTQ replicates but fell behind by a small margin on Orbitrap data. On both instruments, however, TagRecon achieved a larger number of identifications than did InsPecT at the same 2% FDR.
Figure 4
These images compare the mutation identification performance of TagRecon, InsPecT, Paragon, and X! Tandem when using multiple replicates of yeast samples analyzed on LTQ and LTQ-Orbi instruments. MS/MS were matched to sequence databases containing simulated mutations. All databases contained the same number of simulated mutations. “SGD” indicates that the mutations were contained in a larger SGD ORF database. “SGD-SS” indicates that the same mutations were restricted to a smaller subset of the SGD database. “TP” indicates that the software reported a mutation present in the simulated mutation database. “FP” indicates otherwise. In both samples, TagRecon identified more true positive mutations than any other software at 2% FDR. All search engines improved performance when looking for mutations present in a smaller subset database. Accurate precursor masses from LTQ-Orbi improved true positive mutation identification.
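The TP/FP definition in this caption can be expressed as a short sketch. The mutation record `(protein, position, original, replacement)`, the function name `score_mutation_calls`, and the example identifiers are hypothetical; they are not taken from the paper's evaluation scripts.

```python
from typing import Iterable, Set, Tuple

# Hypothetical mutation record: (protein id, residue position, original, replacement).
Mutation = Tuple[str, int, str, str]


def score_mutation_calls(reported: Iterable[Mutation], simulated: Set[Mutation]) -> Tuple[int, int]:
    """Count distinct reported mutations as TP (present in the simulated set) or FP (absent)."""
    reported_set = set(reported)
    true_pos = reported_set & simulated
    false_pos = reported_set - simulated
    return len(true_pos), len(false_pos)


# Illustrative identifiers only.
simulated_db = {("YAL001C", 12, "G", "A"), ("YAL002W", 40, "V", "L")}
calls = [("YAL001C", 12, "G", "A"), ("YAL003W", 7, "S", "T")]
tp, fp = score_mutation_calls(calls, simulated_db)
```

Here one call matches a simulated mutation and one does not, giving one true positive and one false positive.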
Figure 5
These images compare the mutant peptide overlap between TagRecon, InsPecT, and Paragon when analyzing data for an individual sample of the colon tissue data set. Mutant peptides were attested following the stringent attestation guidelines outlined in Materials and Methods. The numbers in parentheses represent the spectral counts. TagRecon recognized larger numbers of mutations that passed attestation criteria and provided hints for many other possible mutations. No single search engine can identify all mutant peptides present in a sample.
