Review
Trends Cell Biol. 2017 Sep;27(9):685-696.
doi: 10.1016/j.tcb.2017.04.006. Epub 2017 May 18.

Mining for Micropeptides


Catherine A Makarewich et al. Trends Cell Biol. 2017 Sep.

Abstract

Advances in computational biology and large-scale transcriptome analyses have revealed that a much larger portion of the genome is transcribed than was previously recognized, resulting in the production of a diverse population of RNA molecules with both protein-coding and noncoding potential. Emerging evidence indicates that several RNA molecules have been mis-annotated as noncoding and in fact harbor short open reading frames (sORFs) that encode functional peptides and that have evaded detection until now due to their small size. sORF-encoded peptides (SEPs), or micropeptides, have been shown to have important roles in fundamental biological processes and in the maintenance of cellular homeostasis. These small proteins can act independently, for example as ligands or signaling molecules, or they can exert their biological functions by engaging with and modulating larger regulatory proteins. Given their small size, micropeptides may be uniquely suited to fine-tune complex biological systems.

Keywords: bioactive peptide; micropeptide; ncRNA; short open reading frame.


Figures

Figure 1. Tools and Methods for the Identification of Micropeptides
(A) Regardless of their coding potential, all mRNA transcripts contain multiple different open reading frames (ORFs) of varying lengths (grey arrows). Typically, the longest ORF within the transcript codes for the functional protein product (red arrow), and this is most readily seen in large protein coding genes (middle). However, in the case of very small proteins like micropeptides (right), finding the correct ORF is extremely challenging because the longest one is frequently not the actual coding region, and the coding ORF gets lost in the noise of other spurious non-coding ORFs. (B) Computational tools, such as PhyloCSF, have been developed to help identify potential coding genes based on the evolutionary conservation of their nucleotide sequence. The mouse Upperhand (Uph)-Hand2 locus (left) is a perfect example of a region of the genome that contains a conserved protein coding gene (Hand2) and a non-coding transcript (Uph). As depicted, Hand2 scores positively on PhyloCSF (red color, upward deflection) specifically in the region that codes for the functional Hand2 protein (exons 1 and 2, E1 and E2). Conversely, Uph scores negatively throughout its sequence, as illustrated by the negative (blue) score. PhyloCSF has been used to identify several novel micropeptides including dwarf open reading frame (DWORF, right), whose strong sequence conservation can be seen prominently in exon 2 (E2). (C) Experimental methods such as ribosome profiling have also been developed that aid in the identification of novel protein coding genes. In this technique, active translation is halted by the addition of translation inhibitors and samples are treated with nucleases to generate ribosome protected fragments (RPFs), or footprints, that are protected from digestion by the presence of the ribosome. These footprints are then recovered, sequenced, and mapped to the genome to reveal their origin.
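The ORF-finding problem described in panel (A) can be illustrated with a minimal Python sketch. It is not a method from the review: the function name find_orfs, the min_codons threshold, and the toy transcript are all hypothetical, and the scan covers only the forward strand, reporting the first ATG after each stop codon. The sketch shows that a single transcript yields several candidate ORFs and that ranking them by length does not by itself say which one is translated.

```python
# Illustrative sketch only; names, threshold, and sequence are hypothetical.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG, end is exclusive and includes the
    stop codon. min_codons counts the sense codons before the stop.
    Only the first ATG after each stop is reported (nested starts are ignored).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs

# Toy transcript with one short ORF in frame 0 and a longer one in frame 1.
transcript = "ATGAAATAGCATGCCCGGGAAATTTTGA"
candidates = find_orfs(transcript)

# Ranking by length picks a candidate, but length alone is not evidence
# of translation; conservation or ribosome footprints are still needed.
longest = max(candidates, key=lambda orf: orf[1] - orf[0])
print(candidates, longest)
```

In a real transcript the candidate list grows quickly, which is why the caption describes conservation scoring and ribosome profiling as additional filters.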
Figure 2. Micropeptide Processing and their Biological Functions
(A) Unlike classical examples of neuropeptides and peptide hormones that are synthesized as much larger proteins and later proteolytically processed to generate their mature active peptide product (left), micropeptides are translated directly from their precursor mRNAs as functional molecules (right). (B) Micropeptides have been shown to work as key regulators of many fundamental biological processes and can act independently or exert their effects by engaging with and modulating much larger regulatory proteins.
Figure 3. Methods for Verifying Micropeptide Coding Potential
(A) CRISPR/Cas9-mediated gene editing can be used to knock in an epitope tag at the endogenous locus of a putative micropeptide, in-frame with the predicted sORF, to test for coding potential. The Cas9 endonuclease (yellow) is targeted to a specific location on the genome via a single guide RNA (sgRNA, green) which is immediately adjacent to a protospacer adjacent motif (PAM) site. Upon recognition of the appropriate site, Cas9 will then unwind the DNA duplex and create a DNA double strand break. This double strand break can either be repaired by non-homologous end joining (NHEJ) or by homology-directed repair (HDR). To utilize HDR for editing, a donor template with homology to the targeted locus must be provided and this must contain the sequence of the epitope tag you wish to knock in (shown here as FLAG, red). Expression of your epitope tag can then be verified by western blot or immunostaining. (B) The coding potential of a sORF can also be assessed by in vitro translation. The full-length cDNA of your peptide of interest must be cloned into a plasmid containing a phage polymerase promoter (shown here as T7, Sp6 or T3) and cell-free protein synthesis is performed in the presence of 35S-methionine, which will radioactively label your micropeptide (35S-methionine is depicted as red circles in the polypeptide chain). These protein products are then subjected to gel electrophoresis and autoradiography and then analyzed to determine if a product of the predicted molecular weight is produced. As a control, a frame-shift mutant of your coding sequence should be cloned and this should not yield a 35S-labeled protein product.
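The reading-frame logic behind the in-frame epitope tag and the frame-shift control can be sketched in silico. The Python example below is an assumption-laden illustration, not the experimental protocol: the toy sORF, the inserted nucleotide, and the helper translate are hypothetical, and the FLAG coding sequence used is one common encoding of the DYKDDDDK peptide. It shows that the tag is read out only when it stays in frame with the start codon.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq):
    """Translate from position 0 until the first stop codon or end of sequence."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":          # stop codon ends translation
            break
        peptide.append(aa)
    return "".join(peptide)

# Hypothetical toy sORF (encodes M-A-L-K) followed by an in-frame FLAG tag.
SORF = "ATGGCTCTGAAA"
FLAG = "GACTACAAAGACGATGACGACAAG"   # one encoding of DYKDDDDK
tagged = SORF + FLAG + "TAA"

# Frame-shift control: a single inserted nucleotide after the start codon.
shifted = tagged[:3] + "C" + tagged[3:]

print(translate(tagged))    # in-frame product carries the FLAG epitope
print(translate(shifted))   # frame-shifted product does not
```

The same reasoning applies to the 35S-methionine control in panel (B): a frame-shift mutant no longer encodes the predicted peptide, so a labeled band of the predicted molecular weight should be absent.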
