Using proteomics to mine genome sequences
- PMID: 15253419
- DOI: 10.1021/pr034056e
Abstract
We present a method for mining annotated or unannotated genome sequences with proteomic data to identify open reading frames. The region of a genome that codes for a protein is located using MALDI-TOF mass spectrometry data from proteins and their peptides. The raw genome sequence, or any unassembled contigs of an organism, is cleaved in silico into equal-sized, overlapping fragments, and each fragment is translated in all six reading frames to give a series of virtual proteins. Each virtual protein is then subjected to a theoretical enzymatic digestion. Standard proteomic sample preparation methods are used to separate, array, and digest the proteins of interest into peptides. The masses of the resulting peptides are measured by mass spectrometry and compared with the theoretical peptide masses of the virtual proteins. The region of the genome coding for a particular protein is identified when a large number of peptides from the protein match peptides from a virtual protein. The method makes no assumptions about the location of a protein in a particular gene sequence or about the positions or types of start and stop codons. To illustrate the approach, all 773 Pseudomonas aeruginosa proteins contained in SWISS-PROT were used to test the method theoretically and to optimize its parameters. Increasing the size of the virtual proteins improves the overall ability to detect the coding region, at the cost of reduced sensitivity for smaller proteins. Increasing the minimum number of matching peptides, lowering the mass error tolerance, or increasing the signal-to-noise ratio of the simulated mass spectrum also improves the ability to detect coding regions. The method is further demonstrated on experimental data from Mycobacterium tuberculosis and is also shown to work with eukaryotic organisms (e.g., Homo sapiens).
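The pipeline described in the abstract (overlapping fragments, six-frame translation, theoretical digestion, mass matching) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the enzyme, so trypsin (cleavage after K or R, not before P) is assumed; stop codons are assumed to simply end a peptide; masses are monoisotopic [M+H]+ values; and every function name and default parameter (fragment size, step, tolerance, minimum hits) is hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def overlapping_fragments(genome, size, step):
    """Yield (start, fragment) pairs of equal size; a tail shorter than
    `size` is ignored in this sketch."""
    for start in range(0, max(len(genome) - size, 0) + 1, step):
        yield start, genome[start:start + size]

def six_frame_translations(dna):
    """Yield six virtual proteins: frames 0-2 forward, 3-5 reverse strand."""
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            yield "".join(CODON_TABLE.get(c, "X") for c in codons)

def tryptic_peptides(protein, min_len=5):
    """Assumed enzyme: trypsin. Stop codons ('*') are assumed to end a peptide."""
    peptides = []
    for segment in protein.split("*"):
        for pep in re.split(r"(?<=[KR])(?!P)", segment):
            if len(pep) >= min_len and "X" not in pep:
                peptides.append(pep)
    return peptides

def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER + PROTON

def count_matches(observed, theoretical, tol_ppm):
    """Number of observed masses within tol_ppm of any theoretical mass."""
    return sum(
        any(abs(o - t) <= o * tol_ppm * 1e-6 for t in theoretical)
        for o in observed
    )

def scan_genome(genome, observed, size=600, step=300, tol_ppm=50, min_hits=3):
    """Return (hits, fragment_start, frame) tuples, best-scoring first."""
    hits = []
    for start, fragment in overlapping_fragments(genome, size, step):
        for frame, protein in enumerate(six_frame_translations(fragment)):
            masses = [mh_mass(p) for p in tryptic_peptides(protein)]
            n = count_matches(observed, masses, tol_ppm)
            if n >= min_hits:
                hits.append((n, start, frame))
    return sorted(hits, reverse=True)
```

Calling `scan_genome(genome, observed_masses)` with a DNA string and a list of measured peptide masses returns candidate coding regions ranked by the number of matching peptides. The `size`, `tol_ppm`, and `min_hits` arguments correspond to the parameters the abstract reports as affecting detection (virtual protein size, mass error tolerance, minimum number of matching peptides); the simple hit count used here is a stand-in for whatever scoring the paper actually applies.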