Review
iScience. 2024 Jan 20;27(2):108972.
doi: 10.1016/j.isci.2024.108972. eCollection 2024 Feb 16.

No country for old methods: New tools for studying microproteins

Fabiola Valdivia-Francia et al. iScience.

Erratum in

Abstract

Microproteins encoded by small open reading frames (sORFs) have emerged as a fascinating frontier in genomics. Although sORFs were traditionally overlooked due to their small size, recent technological advancements such as ribosome profiling, mass spectrometry-based strategies and advanced computational approaches have led to the annotation of more than 7000 sORFs in the human genome. Despite this progress, only a small fraction of these microproteins has been characterized, and an important challenge in the field lies in identifying functionally relevant microproteins and understanding their roles in different cellular contexts. In this review, we explore the recent advancements in sORF research, focusing on the new methodologies and computational approaches that have facilitated their identification and functional characterization. Leveraging these new tools holds great promise for dissecting the diverse cellular roles of microproteins and will ultimately pave the way for understanding their role in the pathogenesis of diseases and identifying new therapeutic targets.

Keywords: Biological sciences; Biotechnology; Genetics; Molecular biology.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Graphical abstract
Figure 1
The classification of sORFs. Schematic representation of different open reading frames (ORFs) and their genomic location. A large fraction of the mammalian genome is composed of small open reading frames (sORFs) in the untranslated regions (red). Canonical ORFs, conventionally more than 100 amino acids long, are depicted at the top, with exons delimited by a known start and stop codon and flanked by the 5′ untranslated region (UTR) and the 3′UTR. The mammalian genome encodes transcribed and potentially functional sORFs between 10 and 100 amino acids, which can be classified according to their genomic location. Upstream open reading frames (uORFs) are found in the 5′UTR of conventional ORFs, while downstream ORFs (dORFs) are found in the 3′UTR of conventional ORFs. In some cases, alternative ORFs arise from alternative initiation start sites within canonical ORFs and lead to shorter isoforms of a known ORF. sORFs can also be found in intronic regions of canonical ORFs and in intergenic regions between two canonical ORFs; these are known as intronic and intergenic sORFs, respectively. Finally, long non-coding ORFs are an important source of sORFs.
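The location-based classes in the Figure 1 caption can be illustrated with a minimal sketch. It is not taken from the review. It assumes transcript-space coordinates (0-based, half-open), an ORF span that includes the stop codon, and the 10 to 100 amino acid range quoted in the caption. The names `Transcript` and `classify_sorf`, the label strings, and the "overlapping" class are hypothetical. Intronic and intergenic sORFs need genomic coordinates and are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model; CDS bounds are 0-based, half-open."""
    cds_start: Optional[int]  # None when no canonical CDS is annotated
    cds_end: Optional[int]


def classify_sorf(orf_start: int, orf_end: int, tx: Transcript) -> str:
    """Label an ORF by its position relative to the canonical CDS."""
    span = orf_end - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    length_aa = span // 3 - 1  # assumption: the span includes the stop codon
    if not 10 <= length_aa <= 100:
        return "not_sORF"
    if tx.cds_start is None or tx.cds_end is None:
        return "noncoding_transcript_sORF"
    if orf_end <= tx.cds_start:
        return "uORF"  # lies entirely in the 5'UTR
    if orf_start >= tx.cds_end:
        return "dORF"  # lies entirely in the 3'UTR
    if orf_start >= tx.cds_start:
        return "altORF"  # initiates inside the canonical CDS
    return "overlapping"  # starts in the 5'UTR and runs into the CDS


tx = Transcript(cds_start=300, cds_end=1200)
print(classify_sorf(30, 93, tx))      # 20 aa ORF upstream of the CDS
print(classify_sorf(1300, 1363, tx))  # 20 aa ORF downstream of the CDS
```

A production annotation pipeline would also check reading frame and splice structure, which this sketch ignores.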
Figure 2
Identification of sORFs. Schematic workflow of the different methods used for the identification of small ORFs. Samples from diverse sources (human biopsies, mouse cells and cultured cells) can be processed using ribosome profiling (Ribo-Seq), mass spectrometry (MS) and/or computational approaches. Ribo-Seq captures snapshots of ribosome-protected fragments that are purified and sequenced. Small ORFs showing 3-nucleotide periodicity are most likely to be translated into microproteins. Microproteins can be extracted, digested, fractionated and enriched by size selection followed by proteomics. Data are searched against custom databases containing the potential sORFs. Computational approaches to determine sORFs rely on predictions based on the conservation between species, codon bias and coding potential, and on transcriptomic and proteomic data analysis. The different algorithms can predict the presence of sORFs by detecting similarity to known proteins or domains, nucleotide composition or codon substitution, or by using machine learning approaches.
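The 3-nucleotide periodicity mentioned in the Figure 2 caption can be made concrete with a small sketch. It assumes that each ribosome-protected fragment has already been reduced to a single P-site position in transcript coordinates; that step is not shown. The function name `frame_fractions` is hypothetical, and this is not the scoring used by any particular Ribo-Seq tool.

```python
from collections import Counter
from typing import Iterable, Tuple


def frame_fractions(psite_positions: Iterable[int],
                    orf_start: int) -> Tuple[float, ...]:
    """Fraction of footprints in reading frames 0, 1 and 2 of the ORF."""
    counts = Counter((pos - orf_start) % 3 for pos in psite_positions)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


# Toy data: four footprints in frame 0 and one in frame 1.
fractions = frame_fractions([100, 103, 106, 109, 101], orf_start=100)
print(fractions)
```

A strong excess of footprints in frame 0 is consistent with active translation of the candidate ORF, whereas roughly equal fractions across the three frames are not.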
Figure 3
Targeting sORFs using CRISPR. Schematic representation of the CRISPR screening workflow. Top panel: For pooled CRISPR screens, the sgRNA library is transduced into Cas9-expressing cells in vitro. Cells are harvested at the end of the experiment (e.g., following a certain number of passages or treatment) and submitted to sequencing. The enrichment and depletion of the sgRNAs are then used to infer gene function. Middle panel: Arrayed CRISPR screens are carried out in separate wells, with one sgRNA delivered per well. In an arrayed screen, the phenotype can be linked directly to the sgRNA to determine gene function. Lower panel: Single-cell CRISPR screens in vitro and in vivo. Similar to pooled CRISPR screens, cells are transduced with a pooled library. Single cells are then subjected to single-cell RNA-seq to obtain the transcriptomic readout coupled to cell-type specific sgRNA representation. In an in vivo single-cell CRISPR screen, the sgRNA library is delivered, for example, directly into mouse embryos or adult mice. At a later time point, the organ of interest is collected, and cells are isolated for single-cell RNA-seq, which can determine proliferative changes and the transcriptomic consequences of the sgRNA in different cell types.
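For the pooled screen in the top panel, enrichment and depletion of sgRNAs are commonly summarized as a log2 fold change between normalized read counts at the start and the end of the experiment. The sketch below illustrates that idea only. The reads-per-million normalization, the pseudocount of 1, the function name `log2_fold_changes` and the guide names are assumptions, not the review's method.

```python
import math
from typing import Dict


def log2_fold_changes(start_counts: Dict[str, int],
                      end_counts: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-sgRNA log2(end / start) after reads-per-million normalization."""
    start_total = sum(start_counts.values())
    end_total = sum(end_counts.values())
    if start_total == 0 or end_total == 0:
        raise ValueError("both samples need at least one read")
    lfc = {}
    for guide, start in start_counts.items():
        start_rpm = start / start_total * 1e6 + pseudocount
        end_rpm = end_counts.get(guide, 0) / end_total * 1e6 + pseudocount
        lfc[guide] = math.log2(end_rpm / start_rpm)
    return lfc


# Toy counts for two hypothetical guides.
lfc = log2_fold_changes({"sg_A": 100, "sg_B": 100},
                        {"sg_A": 50, "sg_B": 150})
for guide, value in lfc.items():
    print(guide, round(value, 2))
```

A negative value marks a depleted sgRNA and a positive value an enriched one. Real analyses usually aggregate several sgRNAs per target and test for significance, which is beyond this sketch.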
