An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome
- PMID: 20126623
- PMCID: PMC2812506
- DOI: 10.1371/journal.pone.0008949
Abstract
Background: Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, which limits their usefulness as a gene discovery tool.
Results: Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides [False Discovery Rate (FDR) < 5%] in an MS dataset derived from two human breast epithelial cell lines. A subset of these were then successfully validated by a different MS technique. Two of these correspond to novel isoforms of Heterogeneous Ribonuclear Proteins, while the rest correspond to novel loci.
Conclusions: MS technology can be used for ab initio gene discovery in human data, which, since it is based on different underlying assumptions, identifies protein-coding genes not found by other techniques. As MS technology continues to evolve, such approaches will become increasingly powerful.
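The abstract does not say how the in silico translation of the genome was produced. As an illustration only, the sketch below shows a common way to build such a search space: a six-frame translation using the standard genetic code. The function names (`reverse_complement`, `translate`, `six_frame_translation`) and the toy input sequence are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate both strands in all three reading frames.

    Keys are '+1', '+2', '+3' for the forward strand and
    '-1', '-2', '-3' for the reverse-complement strand.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not real genomic data.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a real search, each translated frame would typically be split at stop codons and digested in silico before spectra are matched against it; those steps, and the annotation filters and FDR estimation mentioned in the abstract, are not shown here.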