Review. Trends Genet. 2025 Feb;41(2):107-118.
doi: 10.1016/j.tig.2024.12.001. Epub 2025 Jan 2.

Finding functional microproteins

Sikandar Azam et al. Trends Genet. 2025 Feb.

Abstract

Genome-wide translational profiling has uncovered the synthesis in human cells of thousands of microproteins, a class of proteins traditionally overlooked in functional studies. Although an increasing number of these microproteins have been found to play critical roles in cellular processes, the functional relevance of the majority remains poorly understood. Studying these low-abundance, often unstable proteins is further complicated by the challenge of disentangling their functions from the noncoding roles of the associated DNA, RNA, and the act of translation. This review highlights recent advances in functional genomics that have led to the discovery of >1000 human microproteins required for optimal cell proliferation. Ongoing technological innovations will continue to clarify the roles and mechanisms of microproteins in both normal physiology and disease, potentially opening new avenues for therapeutic exploration.

Keywords: CRISPR screen; functional microprotein; lncRNA; noncanonical ORF; short ORF.

Conflict of interest statement

Declaration of interests: X.W. is a member of the Scientific Advisory Board for Epitor Therapeutics.

Figures

Figure 1. Size distribution of canonical human proteins.
Histogram of the lengths (in amino acids, aa) of the 19,352 human proteins in the NCBI/EMBL-EBI MANE Select set (v1.3). Note that the x-axis is on a log scale.
Figure 2. Most microproteins are encoded by noncanonical ORFs (ncORFs).
(A) ncORFs in canonical mRNAs: uORF – upstream ORF contained entirely in the 5’ UTR; uoORF/ouORF – upstream overlapping ORF that begins in the 5’ UTR but ends within the canonical ORF in a different reading frame; intORF – ORF within the canonical ORF in a different reading frame; doORF – downstream overlapping ORF that begins in the canonical ORF but ends in the 3’ UTR in a different reading frame; dORF – downstream ORF contained entirely in the 3’ UTR. (B) Many lncRNAs contain a short ORF (sORF or smORF). (C) Some circular RNAs (circRNAs) contain an ORF (circORF/cORF) that can be translated in a cap-independent manner. The numbers on the right indicate the number of ORFs supported by Ribo-seq data in a previous study [21], except for circRNAs [9].
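The positional categories in panel (A) can be illustrated with a minimal Python sketch. It is not the authors' annotation method. The function name `classify_ncorf` and the coordinate convention (0-based, half-open transcript coordinates, stop codon included in each ORF) are assumptions made for illustration only, and in-frame ORFs that overlap the canonical ORF are returned as "other" because the caption does not name them.

```python
def classify_ncorf(orf_start, orf_end, cds_start, cds_end):
    """Classify an ORF by its position relative to the canonical ORF (CDS).

    All coordinates are transcript positions, 0-based and half-open
    (hypothetical convention chosen for this sketch).
    """
    # ORFs that do not touch the canonical ORF: frame is irrelevant.
    if orf_end <= cds_start:
        return "uORF"   # entirely within the 5' UTR
    if orf_start >= cds_end:
        return "dORF"   # entirely within the 3' UTR

    # Overlapping categories in the caption require a different reading frame.
    if (orf_start - cds_start) % 3 == 0:
        return "other"  # in frame with the canonical ORF; not covered by panel (A)

    if orf_start < cds_start and orf_end <= cds_end:
        return "uoORF"  # starts in the 5' UTR, ends inside the canonical ORF
    if orf_start >= cds_start and orf_end <= cds_end:
        return "intORF" # nested inside the canonical ORF
    if orf_start >= cds_start and orf_end > cds_end:
        return "doORF"  # starts inside the canonical ORF, ends in the 3' UTR
    return "other"


# Example with a hypothetical canonical ORF spanning positions 100-400.
for orf in [(10, 40), (80, 131), (101, 161), (350, 431), (450, 480)]:
    print(orf, classify_ncorf(*orf, cds_start=100, cds_end=400))
```

A real annotation pipeline would also need to handle splicing, multiple transcript isoforms, and non-AUG start codons, none of which this sketch attempts.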
Figure 3. Potential coding and noncoding functions of short ORFs.
(A) lncRNA: At the DNA level, lncRNA loci can function as enhancers, regulating the transcription of neighboring genes. CRISPR-mediated targeting of the short ORFs within lncRNAs could disrupt this enhancer activity. At the RNA level, most lncRNAs exert their functions without being translated. However, translation of short ORFs within lncRNAs may activate nonsense-mediated decay (NMD), leading to the degradation of the lncRNA. Consequently, disrupting the translation of these short ORFs could alter the lncRNA’s stability and affect its RNA-mediated noncoding functions. Additionally, the microproteins produced from these lncRNA-encoded ORFs may also have functional roles as proteins. (B) uORF: At the DNA level, uORFs, due to their proximity to the promoter, may influence the transcription of the host mRNA gene when disrupted by CRISPR. At the RNA level, uORF sequences can regulate the stability and localization of the host mRNA. Translation of uORFs frequently inhibits the translation of the main ORF and can also trigger NMD, leading to the degradation of the host mRNA.
Figure 4. Pairwise overlap of microprotein hits across published CRISPR screens.
The heatmap displays the pairwise overlap of microprotein hits between studies, with each row and column representing a specific study. The heatmap values indicate the number of microprotein hits in one study (row) that overlap with hits in another (column). Overlap between two different ORFs is defined as sharing at least one base pair in the genome. Note that Schlesinger et al. 2024 [16] is excluded due to the absence of genomic coordinate data. Study IDs correspond to those listed in Table 1.
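The overlap rule in the Figure 4 caption (two ORFs overlap if they share at least one base pair in the genome) can be sketched as follows. This is a simplified illustration, not the code used for the figure: the function names, the `(chrom, start, end)` tuple layout with 0-based half-open coordinates, and the treatment of each ORF as a single unspliced interval are all assumptions. Strand is ignored because the caption defines overlap only in terms of shared genomic base pairs.

```python
from itertools import product


def orfs_overlap(a, b):
    """Return True if two ORFs share at least one genomic base pair.

    Each ORF is a (chrom, start, end) tuple, 0-based and half-open
    (hypothetical layout; spliced ORFs would need per-exon intervals).
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pairwise_overlap_counts(hits_by_study):
    """Count, for each (row, column) pair of studies, how many hits in the
    row study overlap at least one hit in the column study."""
    counts = {}
    for row, col in product(hits_by_study, repeat=2):
        counts[(row, col)] = sum(
            any(orfs_overlap(hit, other) for other in hits_by_study[col])
            for hit in hits_by_study[row]
        )
    return counts


# Toy input with made-up coordinates, for illustration only.
hits = {
    "study_A": [("chr1", 100, 200), ("chr2", 50, 80)],
    "study_B": [("chr1", 150, 160), ("chr1", 190, 300)],
}
for pair, n in pairwise_overlap_counts(hits).items():
    print(pair, n)
```

Because the count is taken per row-study hit, the resulting matrix need not be symmetric: one long ORF in one study can overlap several short ORFs in another.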

References

    1. Harrison PM et al. (2002) A question of size: the eukaryotic proteome and the problems in defining it. Nucleic Acids Res 30, 1083–1090. doi: 10.1093/nar/30.5.1083
    2. Djebali S et al. (2012) Landscape of transcription in human cells. Nature 489, 101–108. doi: 10.1038/nature11233
    3. Ingolia NT et al. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802. doi: 10.1016/j.cell.2011.10.002
    4. Ingolia NT et al. (2014) Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep 8, 1365–1379. doi: 10.1016/j.celrep.2014.07.045
    5. Ji Z et al. (2015) Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins. Elife 4, e08890. doi: 10.7554/eLife.08890
