BMC Bioinformatics. 2011 Apr 22;12:115.
doi: 10.1186/1471-2105-12-115.

The proteogenomic mapping tool

William S Sanders et al. BMC Bioinformatics. 2011.

Abstract

Background: High-throughput mass spectrometry (MS) proteomics data is increasingly being used to complement traditional structural genome annotation methods. To keep pace with the high speed of experimental data generation and to aid in structural genome annotation, experimentally observed peptides need to be mapped back to their source genome location quickly and exactly. Previously, the tools to do this have been limited to custom scripts designed by individual research groups to analyze their own data. These scripts are generally not widely available and do not scale well with large eukaryotic genomes.

Results: The Proteogenomic Mapping Tool includes a Java implementation of the Aho-Corasick string searching algorithm that takes standardized file types as input and rapidly searches experimentally observed peptides against a given genome translated in all six reading frames for exact matches. The Java implementation allows the application to scale well with larger eukaryotic genomes while providing cross-platform functionality.
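
The search described in the Results can be illustrated with a minimal sketch: translate the genome in all six reading frames, then scan each translation once with an Aho-Corasick automaton built from the peptides. This is written in Python for brevity and is not the tool's Java code. The function names, the hard-coded standard genetic code (NCBI translation table 1), and the 0-based plus-strand coordinates are all assumptions made for illustration.

```python
from collections import deque

# Assumption: standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA from its first base; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(genome):
    """Yield (strand, frame_offset, protein) for all six reading frames."""
    reverse = genome.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", genome), ("-", reverse)):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def build_automaton(peptides):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for pep in peptides:
        state = 0
        for ch in pep:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pep)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Yield (start_index, peptide) for every exact occurrence in text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pep in out[state]:
            yield i - len(pep) + 1, pep


def map_peptides(genome, peptides):
    """Return (peptide, strand, frame_offset, start) with 0-based + strand starts."""
    automaton = build_automaton(peptides)
    hits = []
    for strand, offset, protein in six_frame_translations(genome):
        for aa_start, pep in search(protein, automaton):
            nt_start = offset + 3 * aa_start  # position on the scanned strand
            if strand == "-":  # convert to + strand coordinates
                nt_start = len(genome) - (nt_start + 3 * len(pep))
            hits.append((pep, strand, offset, nt_start))
    return hits
```

For example, `map_peptides("CCATGGCCAAATAA", ["MAK", "LA"])` finds `MAK` on the plus strand at position 2 and `LA` on the minus strand covering plus-strand positions 4 to 9. Because the automaton is built once and each translated frame is scanned in a single pass, the cost of the scan grows with genome length rather than with the number of peptides.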

Conclusions: The Proteogenomic Mapping Tool provides a standalone application for mapping peptides back to their source genome on a number of operating system platforms with standard desktop computer hardware and executes very rapidly for a variety of datasets. Allowing the selection of different genetic codes for different organisms allows researchers to easily customize the tool to their own research interests and is recommended for anyone working to structurally annotate genomes using MS derived proteomics data.

Figures

Figure 1
Proteogenomic Mapping Pipeline Windows GUI. The proteogenomic mapping pipeline requires three files from the user and offers several options. a. First, the user must provide a FASTA formatted file specifying the proteins for which to search. b. The user also supplies a FASTA formatted file specifying the genome in which to search for the peptides. The file can contain the entire genome as one large entry or multiple entries containing only selected features of interest. For example, the file may contain all exons for an organism. c. The user then selects an output file. Two files will be created. The file selected by the user will contain detailed information about the mapping. An additional FASTA file, with ".fasta" appended to the name of the file selected by the user, contains the ePST sequence in FASTA format. d. The user can choose to ignore splice sites or to use canonical splice sites when searching upstream for the start of the ePST sequence and downstream for the stop of the sequence. e. A genetic code table file, which specifies the mapping from codons to amino acids as well as start and stop codons, must also be provided. f. Because the code table file can contain multiple mappings, the desired mapping must be selected.
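
The FASTA inputs and the naming of the second output file in this caption can be sketched as follows. This is a minimal Python illustration rather than the tool's own code. `read_fasta` and `epst_fasta_path` are hypothetical helper names, and the layout of the genetic code table file is not described here, so it is left out.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def epst_fasta_path(output_path):
    """Companion ePST file name: the chosen output name with '.fasta' appended."""
    return output_path + ".fasta"
```

Reading records lazily means a genome supplied as many entries, such as one entry per exon, is handled the same way as a single large entry.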
Figure 2
Prokaryotic ePST Generation Process. a. Map the peptide to the translated genome. b. Extend the mapped peptide in the 3' direction to an in-frame stop codon. c. Extend the mapped peptide in the 5' direction to an in-frame stop codon. d. From this 5' in-frame stop codon, proceed in a 3' direction to identify an in-frame start codon. e. Final ePST. f. Generate translated ePST sequence.
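
Steps b through e of the prokaryotic process can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's implementation. It handles the plus strand only, uses 0-based half-open coordinates, and excludes the 3' stop codon from the returned interval. It also falls back to the peptide start when no start codon is found. The hard-coded start and stop codon sets are assumptions too, since the tool reads them from the genetic code table file.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumed; normally taken from the code table
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed; normally taken from the code table


def prokaryotic_epst(genome, pep_start, pep_len_nt):
    """Return (start, end) of the ePST on the + strand, 0-based and half-open."""
    # b. Walk 3' from the end of the peptide to an in-frame stop codon.
    end = pep_start + pep_len_nt
    while end + 3 <= len(genome) and genome[end:end + 3] not in STOP_CODONS:
        end += 3
    # c. Walk 5' from the start of the peptide to an in-frame stop codon.
    upstream = pep_start
    while upstream - 3 >= 0 and genome[upstream - 3:upstream] not in STOP_CODONS:
        upstream -= 3
    # d. From just after that stop, walk 3' to the first in-frame start codon.
    start = upstream
    while start < pep_start and genome[start:start + 3] not in START_CODONS:
        start += 3
    # e. The ePST spans genome[start:end]; step f would translate this slice.
    return start, end
```

With the genome `TAACCCATGGCCAAAGGGTGACC` and a peptide mapped to `AAA` at position 12, the call `prokaryotic_epst(genome, 12, 3)` returns `(6, 18)`, which is the slice `ATGGCCAAAGGG` between the upstream `ATG` and the downstream `TGA`.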
Figure 3
Eukaryotic ePST Generation Process. a. Options 1 & 2: Map the peptide to the translated genome. b. Option 1: Extend the mapped peptide in the 3' direction to an in-frame stop codon or splice site boundary. Option 2: Extend the mapped peptide in the 3' direction by the number of codons selected by the user. c. Option 1: Extend the mapped peptide in the 5' direction to an in-frame stop codon, start codon, or splice site boundary. Option 2: Extend the mapped peptide in the 5' direction by the number of codons selected by the user. d. Final ePST. e. Generate translated ePST sequence.
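
The two eukaryotic options can be sketched in Python as below. This is a simplified illustration, not the tool's code. It handles the plus strand only, with 0-based half-open coordinates. Option 1 is shown with splice sites ignored, because the way splice site boundaries are detected is not described here. The codon sets and the choice to include the start codon while excluding stop codons are further assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed; normally taken from the code table
START_CODONS = {"ATG"}               # assumed; normally taken from the code table


def extend_option1(genome, pep_start, pep_len_nt):
    """Option 1, splice sites ignored: extend to in-frame stop or start codons."""
    # 3' direction: stop at the first in-frame stop codon (excluded).
    end = pep_start + pep_len_nt
    while end + 3 <= len(genome) and genome[end:end + 3] not in STOP_CODONS:
        end += 3
    # 5' direction: stop at the first in-frame stop codon (excluded)
    # or start codon (included), whichever comes first.
    start = pep_start
    while start - 3 >= 0:
        codon = genome[start - 3:start]
        if codon in STOP_CODONS:
            break
        start -= 3
        if codon in START_CODONS:
            break
    return start, end


def extend_option2(genome_len, pep_start, pep_len_nt, n_codons):
    """Option 2: extend by up to n_codons whole codons on each side."""
    pep_end = pep_start + pep_len_nt
    up = min(n_codons, pep_start // 3)                # codons available 5'
    down = min(n_codons, (genome_len - pep_end) // 3) # codons available 3'
    return pep_start - 3 * up, pep_end + 3 * down
```

Option 2 clamps the extension to whole codons that fit inside the sequence, so a peptide near either end of a FASTA entry yields a shorter ePST instead of an out-of-range interval.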
