Distinguishing protein-coding and noncoding genes in the human genome
- PMID: 18040051
- PMCID: PMC2148306
- DOI: 10.1073/pnas.0709013104
Abstract
Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.
Conflict of interest statement
The authors declare no conflict of interest.