Proteogenomic mapping as a complementary method to perform genome annotation
- PMID: 14730672
- DOI: 10.1002/pmic.200300511
Abstract
The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) in these genomes (i.e., gene prediction) is an important task and is most often performed computationally, based on features in the nucleic acid sequence. Drawing on recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on evidence from expressed proteins. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bounded by traditional genetic signals to generate a "proteogenomic map". From proteomic data alone, we generated an ORF model for M. pneumoniae strain FH that correlates highly with models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We also detected several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence suggesting that certain predicted ORFs are spurious. Some of these differences may result from the strain analyzed, but they demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation and a prerequisite for proteome quantitation and in vivo interaction measures.
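The mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' search strategy, which the abstract does not detail. It assumes a six-frame translation of the genome, exact string matching of each peptide, translation table 4 (in which Mycoplasma reads TGA as tryptophan rather than stop), and ATG-encoded methionine as the only start signal. All function and field names (`map_peptides`, `extend_hit`, and so on) are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order, then patched to translation table 4
# (Mycoplasma/Spiroplasma), where TGA encodes Trp instead of stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
CODON_TABLE["TGA"] = "W"


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    # Unknown codons (e.g. containing N) become "X"; a trailing partial codon is dropped.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def extend_hit(protein, hit_start, hit_len):
    """Extend a peptide hit to the flanking stops, preferring an upstream Met."""
    left = protein.rfind("*", 0, hit_start) + 1      # residue after upstream stop
    right = protein.find("*", hit_start + hit_len)   # downstream stop
    if right == -1:
        right = len(protein)
    met = protein.find("M", left, hit_start + 1)     # most upstream Met, if any
    if met != -1:
        left = met
    return left, right


def map_peptides(genome, peptides):
    """Return one record per peptide hit, with the ORF it extends into."""
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for pep in peptides:
                pos = protein.find(pep)
                while pos != -1:
                    left, right = extend_hit(protein, pos, len(pep))
                    start, end = frame + 3 * left, frame + 3 * right
                    if strand == "-":  # convert to forward-strand coordinates
                        start, end = len(seq) - end, len(seq) - start
                    hits.append({"peptide": pep, "strand": strand, "frame": frame,
                                 "start": start, "end": end,  # 0-based, end-exclusive
                                 "orf": protein[left:right]})
                    pos = protein.find(pep, pos + 1)
    return hits
```

Calling `map_peptides(genome_sequence, ["AWK"])` on a toy sequence returns a list of dictionaries, one per hit, each giving strand, frame, forward-strand coordinates (excluding the stop codon), and the extended ORF translation. Real data would also need tolerance for sequence variation between strains and alternative start codons, which this sketch omits.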