TagRecon: high-throughput mutation identification through sequence tagging

Surendra Dasari¹, Matthew C Chambers, Robbert J Slebos, Lisa J Zimmerman, Amy-Joan L Ham, David L Tabb

PMID: 20131910
PMCID: PMC2859315
DOI: 10.1021/pr900850m

Surendra Dasari et al. J Proteome Res. 2010 Apr 5;9(4):1716-26.

Affiliation

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8340, USA.

Abstract

Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair deficient, yielded more mutant peptides than the mismatch repair proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.

Figures

**Figure 1**
This flowchart illustrates the computational pipeline used to identify peptides from LC-MS/MS experiments. MyriMatch is a database search engine. TagRecon is a mutation-tolerant search engine, which reconciles partial sequence tags generated by DirecTag against a protein database. IDPicker is a parsimonious protein assembler, which filters peptide identifications using a target FDR.
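The FDR filtering step in this pipeline can be illustrated with a minimal sketch. It assumes a target-decoy search and estimates FDR as decoys divided by targets above a score cutoff, which is one common estimator; the caption does not state which estimator IDPicker uses. The function name `filter_by_fdr` and the `(score, is_decoy)` tuple layout are hypothetical and chosen only for illustration.

```python
def filter_by_fdr(psms, target_fdr=0.02):
    """Keep target matches above the most permissive cutoff that meets target_fdr.

    psms: list of (score, is_decoy) tuples, where a higher score is better.
    FDR is estimated as decoys / targets among matches at or above the cutoff
    (an assumed estimator, not necessarily the one IDPicker applies).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked matches to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets <= target_fdr:
            best_cut = i
    # Report only target matches; decoys exist solely to estimate the error rate.
    return [p for p in ranked[:best_cut] if not p[1]]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False)]
    print(filter_by_fdr(example, target_fdr=0.5))
```

The cutoff is chosen as the deepest rank that still satisfies the threshold, so a single early decoy does not discard later matches when the running estimate recovers.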

**Figure 2**
The flowchart illustrates how TagRecon reconciles mass differences between database peptides and spectrum sequence tags as single amino acid mutations.
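The core idea of explaining a mass difference as a single amino acid substitution can be sketched as a lookup over residue mass differences. This is a simplified illustration, not TagRecon's implementation: it ignores sequence tags and flanking masses, and the function name `candidate_substitutions` and the default tolerance are assumptions. The residue masses are standard monoisotopic values in daltons.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(db_residues, mass_delta, tolerance=0.01):
    """List single substitutions in db_residues that explain mass_delta.

    mass_delta is the observed mass minus the database sequence mass.
    Returns (position, original, replacement) triples whose residue mass
    difference matches mass_delta within the given tolerance (daltons).
    """
    hits = []
    for pos, orig in enumerate(db_residues):
        for repl, mass in RESIDUE_MASS.items():
            if repl == orig:
                continue
            if abs((mass - RESIDUE_MASS[orig]) - mass_delta) <= tolerance:
                hits.append((pos, orig, repl))
    return hits


if __name__ == "__main__":
    # A shift of about +14.016 Da on "GA" is consistent with Gly -> Ala.
    print(candidate_substitutions("GA", 14.01565))
```

Because leucine and isoleucine share the same mass, any substitution to one is reported for both, and several distinct substitutions can share a mass shift, so a delta alone rarely pins down a unique mutation.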

**Figure 3**
These images compare the identification performance of MyriMatch, InsPecT, and TagRecon on multiple replicates of yeast from LTQ and Orbitrap instruments. Each row in the graph indicates a specific algorithm and configuration for the search of multiple replicates. “FT” reports the use of a fully tryptic search as opposed to an “ST” or semitryptic search. “PM” indicates that precursor mass filtering was used to select candidate peptides for comparison to the spectrum rather than “FM” or flanking mass filtering. TagRecon outperformed MyriMatch on the LTQ replicates but fell behind by a small margin on Orbitrap data. On both instruments, however, TagRecon achieved a larger number of identifications than did InsPecT at the same 2% FDR.

**Figure 4**
These images compare the mutation identification performance of TagRecon, InsPecT, Paragon, and X! Tandem when using multiple replicates of yeast samples analyzed on LTQ and LTQ-Orbi instruments. MS/MS were matched to sequence databases containing simulated mutations. All databases contained the same numbers of simulated mutations. “SGD” indicates that mutations were contained in a larger SGD ORF database. “SGD-SS” indicates that the same mutations were restricted to a smaller subset of the SGD database. “TP” indicates that the software reported a mutation present in the simulated mutation database. “FP” indicates otherwise. In both samples, TagRecon identified more true positive mutations than any other software at 2% FDR. All search engines improved performance when looking for mutations present in a smaller subset database. Accurate precursor masses from LTQ-Orbi improved true positive mutation identification.

**Figure 5**
These images compare the mutant peptide overlap between TagRecon, InsPecT, and Paragon when analyzing data for an individual sample of the colon tissue data set. Mutant peptides were attested following the stringent attestation guidelines outlined in Materials and Methods. The numbers in parentheses represent the spectral counts. TagRecon recognized larger numbers of mutations that passed attestation criteria and provided hints for many other possible mutations. No single search engine can identify all mutant peptides present in a sample.
