Upstream open reading frames may contain hundreds of novel human exons
- PMID: 39565752
- PMCID: PMC11578521
- DOI: 10.1371/journal.pcbi.1012543
Abstract
Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which might create an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found a large majority (87.8%) to be conserved in all four as expected. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein-coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. We determined that 541 out of 2,199 upstream ORFs have strong evidence that they can form protein-coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.
Copyright: © 2024 Ji, Salzberg. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interest exists.
Update of
- Upstream open reading frames may contain hundreds of novel human exons. bioRxiv [Preprint]. 2024 Apr 1:2024.03.22.586333. doi: 10.1101/2024.03.22.586333. Update in: PLoS Comput Biol. 2024 Nov 20;20(11):e1012543. doi: 10.1371/journal.pcbi.1012543. PMID: 38562894. Free PMC article. Updated. Preprint.
Similar articles
- ["Matreshka" Genes with Alternative Reading Frames]. Genetika. 2016 Feb;52(2):146-63. PMID: 27215029. Review. Russian.
- The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays. 2019 Nov;41(11):e1900066. doi: 10.1002/bies.201900066. Epub 2019 Sep 23. PMID: 31544971. Review.
- Translation of the downstream ORF from bicistronic mRNAs by human cells: Impact of codon usage and splicing in the upstream ORF. Protein Sci. 2025 Feb;34(2):e70036. doi: 10.1002/pro.70036. PMID: 39840808. Free PMC article.
- Characterizing the splice map of Turkey Hemorrhagic Enteritis Virus. Virol J. 2024 Aug 6;21(1):175. doi: 10.1186/s12985-024-02449-0. PMID: 39107824. Free PMC article.
Cited by
- Unlocking the secrets of the immunopeptidome: MHC molecules, ncRNA peptides, and vesicles in immune response. Front Immunol. 2025 Jan 29;16:1540431. doi: 10.3389/fimmu.2025.1540431. eCollection 2025. PMID: 39944685. Free PMC article. Review.