Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions
- PMID: 23448259
- PMCID: PMC3607840
- DOI: 10.1186/1471-2164-14-141
Abstract
Background: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome.
Results: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
Conclusions: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
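The comparison step described in the Results can be illustrated with a minimal sketch. The abstract does not give the scoring function, tie-breaking rule, or data layout, so the `PSM` record, the "higher score is better" convention, and the function names below are all hypothetical, chosen only to show the idea of keeping the best-matching peptide per spectrum across the protein, transcript, and whole genome searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match (hypothetical record layout)."""
    spectrum_id: str
    peptide: str
    score: float  # assumption: a higher score means a better match
    search: str   # "protein", "transcript", or "genome"


def best_match_per_spectrum(psms):
    """Keep the single highest-scoring PSM for each spectrum.

    On a tie, the PSM seen first is kept (an arbitrary choice made
    for this sketch, not a rule taken from the paper).
    """
    best = {}
    for psm in psms:
        current = best.get(psm.spectrum_id)
        if current is None or psm.score > current.score:
            best[psm.spectrum_id] = psm
    return best


def genome_only_peptides(psms):
    """Peptides that win a spectrum via the genome search and were
    never reported by the protein or transcript searches."""
    annotated = {p.peptide for p in psms if p.search != "genome"}
    best = best_match_per_spectrum(psms)
    return {
        p.peptide
        for p in best.values()
        if p.search == "genome" and p.peptide not in annotated
    }
```

In this sketch, a peptide counts as found solely via the whole genome search only if it is the best match for at least one spectrum and does not appear in either annotated-set search; a real pipeline would also apply the 1% false discovery rate filter before this comparison.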
Similar articles
- GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012 Sep;22(9):1760-74. doi: 10.1101/gr.135350.111. PMID: 22955987
- GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1(Suppl 1):S4.1-9. doi: 10.1186/gb-2006-7-s1-s4. Epub 2006 Aug 7. PMID: 16925838
- ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Res. 2010 Jan;38(Database issue):D620-5. doi: 10.1093/nar/gkp961. Epub 2009 Nov 17. PMID: 19920125
- Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes. Adv Exp Med Biol. 2016;926:1-10. doi: 10.1007/978-3-319-42316-6_1. PMID: 27686802. Review.
- EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol. 2006;7 Suppl 1(Suppl 1):S2.1-31. doi: 10.1186/gb-2006-7-s1-s2. Epub 2006 Aug 7. PMID: 16925836. Review.
Cited by
- Translatomics: The Global View of Translation. Int J Mol Sci. 2019 Jan 8;20(1):212. doi: 10.3390/ijms20010212. PMID: 30626072. Review.
- Proteome sequencing goes deep. Curr Opin Chem Biol. 2015 Feb;24:11-7. doi: 10.1016/j.cbpa.2014.10.017. Epub 2014 Nov 8. PMID: 25461719. Review.
- Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis. Nucleic Acids Res. 2017 Mar 17;45(5):2629-2643. doi: 10.1093/nar/gkx006. PMID: 28100699
- Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif). 2016 Jun 12;9(1):521-45. doi: 10.1146/annurev-anchem-071015-041722. Epub 2016 Mar 30. PMID: 27049631. Review.
- ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching. J Proteome Res. 2018 Nov 2;17(11):3681-3692. doi: 10.1021/acs.jproteome.8b00295. Epub 2018 Oct 19. PMID: 30295032