Cells. 2024 Dec 18;13(24):2090.
doi: 10.3390/cells13242090.

Pervasiveness of Microprotein Function Amongst Drosophila Small Open Reading Frames (SMORFS)

Ana Isabel Platero et al. Cells. 2024.

Abstract

Small Open Reading Frames (smORFs) of less than 100 codons remain mostly uncharacterised. About a thousand smORFs per genome encode peptides and microproteins about 70-80 aa long, often containing recognisable protein structures and markers of translation, and these are referred to as short Coding Sequences (sCDSs). The characterisation of individual sCDSs has provided examples of smORFs' function and conservation, but we cannot infer the functionality of all other metazoan smORFs from these. sCDS function has been characterised at a genome-wide scale in yeast and bacteria, showing that hundreds can produce a phenotype, but attempts in metazoans have been less successful. Either most sCDSs are not functional, or classic experimental techniques do not work with smORFs due to their shortness. Here, we combine extensive proteomics with bioinformatics and genetics in order to detect and corroborate sCDS function in Drosophila. Our studies nearly double the number of sCDSs with detected peptides and microproteins and an experimentally corroborated function. Finally, we observe a correlation between proven sCDS protein function and bioinformatic markers such as conservation and GC content. Our results support the view that sCDS peptides and microproteins act as membrane-related regulators of canonical proteins, regulators whose functions are best understood at the cellular level, and whose mutants produce little, if any, overt morphological phenotypes.

Keywords: Drosophila melanogaster; autophagy regulation; embryogenesis; functional genomics; microproteins; proteomics; ribosome profiling; sCDS (short coding sequences); smORFs (small open reading frames).

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Types of smORFs. Types of ORFs mentioned in this work, and their relevant characteristics.
Figure 2
Proteomic detection of sCDS peptides. (A) Number of sCDS peptides and microproteins detected in this study (Couso lab) compared with previous data from Brunner and Casas-Vila [35,49]. (B) sCDSs detected by either PunchP or gel fractionation compared with those detected as translated by Ribo-seq [36]. (C) Proteomic hits amongst detected sCDSs, showing the number of times sCDSs were detected either in different experiments or by different peptides. (D) Different Ribo-seq RPKM levels [36] and length (aa) between sCDSs detected by proteomics (Couso lab) and those that remained undetected.
Figure 3
GC content of sCDSs and other ORFs. (A) Size distribution of the GC ratio for different ORF classes. Averages are shown on top and the average canonical value (0.54) is shown for reference in each class. Only 5% of lncORFs score higher than this value, whereas almost half of sCDSs do. (B) The GC ratio (see methods) increases in apparent correlation with ORF type and length until sCDSs reach values similar to those of canonical ORFs and higher than those of pseudogenes. Large circles joined by a line represent the averages for each smORF class, pseudogenes, and canonical protein coding genes. See Supplementary File S2 for the data.
Figure 4
Patterns of sCDS expression. (A–L) Patterns of sCDS expression revealed by in situ hybridisation in Drosophila embryos, anterior to the left and dorsal up; the embryonic stage (st) is indicated in the bottom left corner of each panel. Most patterns appear first in the developing mesodermal (m) and endodermal (e) tissues, and then in their derivatives such as the gut (gt) and salivary glands (s). CG17278 and CG17343 are expressed in ectodermal tissues such as sensory organ precursors and the CNS, respectively. Most other patterns were either ubiquitous or faint (see Supplementary File S3 for further details).
Figure 5
Functional characterisation of CG34250. (A) Flybase Drosophila genome browser displaying the CG34250 locus and R12.2- and Df(3)f1614-qjt-generated deletions (blue and purple lines, respectively). (B) Amino acid sequence alignment of CG34250 smORF family members, showing sequence similarities (blue); the MS-detected peptide is highlighted with a black bar. (C) Neighbour-Joining phylogenetic tree using Constraint-Based Multiple Alignment (COBALT) showing the evolutionary distances of CG34250 homologues. (D,E) Panels show the secondary structure predictions of signal peptide and transmembrane domains in CG34250 peptides obtained via the Phobius program from Drosophila melanogaster (D) and Ceratitis capitata (E), showing a conserved single transmembrane topology (bars). (F) 3D predicted structure of Drosophila CG34250 peptide using the Alphafold program, displaying a helical structure. (G–G’’’) The transfection and expression of the CG34250 peptide tagged with Venus reveal initial reticular expression in the cytoplasm (G’, green), possibly in the ER. Lysotracker stains the lysosomes (G’’, red) and DAPI stains the nuclear DNA (G’’’, blue). (H–H’’’) The CG34250 peptide tagged with Venus (H’) accumulates at lysosomes (H’’, red); DAPI stains the nuclei (H’’’). (J) Flybase genome browser showing CG34250 RNA expression levels at all stages studied (red arrow), including adult mature females (black arrow). (I) RT-PCR amplification of CG34250 transcript fragment (195 bp) from messenger RNA extracted from unfertilized eggs.
Figure 6
Functional characterisation of CG12384/Drosophila DAP. (A) Amino acid sequence alignment of death-associated protein 1 family members, with similarities shown in blue. (B) Neighbour-Joining phylogenetic tree using Constraint-Based Multiple Alignment (COBALT), showing that CG12384 peptides belong to a highly conserved smORF family with members present in Deuterostomes (Chordata) and Protostomes (Arthropoda, Mollusca). (C) The suggested role of DAP1 in mouse TOR pathway (Koren et al., 2010) [56]. Under normal conditions, mTORC1 is proposed to repress DAP1 function (mediated by phosphorylation). (D) Under nutrient restriction, the repression of mTORC1 allows DAP1 to repress excessive autophagy (autophagy is an initial beneficial cellular response but, if maintained, produces apoptosis). (E–H) The increase in lysosomes in the Drosophila fat body cells of starved larvae is much more intense in CG12384 mutants (F) than in the wild-type (H); note that only a minimal increase in lysosomes occurs during fed conditions in both CG12384 mutant (E) and wild-type fat body cells (G). (I) Quantification of lysotracker fluorescence as shown in (E–H), showing the drastic increase in lysosomal activity in CG12384 mutants during starvation conditions (****: p < 0.0001, **: p = 0.0024, ns: non-significant, as obtained by standard two-tailed unpaired t-test). (J,K) S2 cells grown in serum-free medium show an upregulation of the autophagosome marker Ref(2)P, but this increase is prevented in cells overexpressing a CG12384-Flag peptide. Note also that Flag and Ref(2)P seem not to overlap. The antagonism between Ref(2)P and CG12384-Flag is compatible with the proposed role for DAP1 in the upstream regulation of the autophagy response downstream of TOR.
