PLoS Genet. 2015 Nov 4;11(11):e1005641.
doi: 10.1371/journal.pgen.1005641. eCollection 2015 Nov.

Leaderless Transcripts and Small Proteins Are Common Features of the Mycobacterial Translational Landscape


Scarlet S Shell et al. PLoS Genet. 2015.

Abstract

RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5' untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5' end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5' ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine-Dalgarno site in the 5' UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression.
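
The leaderless criterion stated in the abstract (an ATG or GTG at the mRNA 5' end) can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the function names, the choice of stop codons, and the assumption that the input sequence begins exactly at the mapped transcription start site are all hypothetical choices made for illustration.

```python
# Hypothetical helper names; the input is assumed to start at the TSS (+1).
LEADERLESS_START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons


def is_leaderless_candidate(transcript_seq: str) -> bool:
    """Return True if the first triplet of the transcript is ATG or GTG."""
    triplet = transcript_seq[:3].upper().replace("U", "T")
    return triplet in LEADERLESS_START_CODONS


def leaderless_orf(transcript_seq: str):
    """Return the ORF that starts at the 5' triplet and runs to the first
    in-frame stop codon (inclusive), or None if there is no leaderless
    start codon or no in-frame stop codon in the sequence."""
    seq = transcript_seq.upper().replace("U", "T")
    if seq[:3] not in LEADERLESS_START_CODONS:
        return None
    # Walk codon by codon from the second codon onward.
    for i in range(3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[:i + 3]
    return None
```

Short ORFs returned by such a scan would only be candidates; the abstract combines the sequence criterion with ribosome occupancy data before inferring small proteins.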


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Leaderless and leadered genes produce distinct RNA-seq and ribosome profiling 5’ boundaries.
(A) The transcription start site (TSS) and translation initiation site are the same in leaderless genes. No 5’ UTR and no Shine-Dalgarno (SD) sequence imply that an assembled 70S ribosome engages the 5’ terminal initiation codon directly, followed by elongation to translate the ORF. Individual sequence reads from RNA-seq (green) and ribosome profiling (Ribo-seq, orange) analyses were mapped to the genome, and the abundance of the individual reads is indicated by the height of the peaks. In leaderless translation RNA-seq and ribosome profiling have coincident 5’ boundaries. The 5’ triplet is nearly always ATG or GTG and, in the typical example shown, corresponds to the predicted N-terminus of the annotated ORF. (B) Traditional gene structures generate nested ribosome profiling profiles, with a 5’ UTR that includes an SD ribosome-binding site upstream of the initiating methionine codon (ATG). The 30S and 50S ribosomal subunits assemble at the SD to form a complete 70S ribosome that begins translation at the adjacent AUG with an N-terminal formylated methionine (fM) amino acid residue. RNA-seq reads (green) indicate positive-strand transcription at this locus, and upstream of an annotated ORF. Mapped ribosome profiling reads (orange) begin downstream of the onset of RNA-seq reads, and ~17–35 nt upstream of the initiation codon of the annotated ORF. In both examples, JCVI correctly predicted the respective ORFs (black).
Fig 2. Leaderless gene architectures bring promoters and ORFs together.
(A) Logo plot of TSS and proximal promoter region of traditional leadered genes. A purine (A or G) is favored at the +1 nucleotide, and an AT rich -10 element appears upstream. The 5’ UTR downstream of the transcription start site shows no sequence constraints or enrichment. (B) A Logo plot of the 5’ UTR from the translation initiation codon shows a Shine-Dalgarno-like AGGAGG sequence enrichment, centered 9–10 nt upstream (positions 10–11). From the initiation codon, the coding sequence downstream shows the wobble bias of the G-C rich mycobacterial genome. (C) The proximal promoter regions of leaderless genes have a -10 sequence of similar composition and spacing to that of leadered genes (compare to 2A). The TSS is also the first nucleotide of the translation initiation codon. There is no evidence of Shine-Dalgarno sequence enrichment upstream. The ORF initiated by leaderless codons shows the same wobble bias as seen in leadered ORFs.
Fig 3. Directed β-galactosidase reporters show that leaderless ATG and GTG triplets initiate robust translation of lacZ in M. smegmatis (A) but not in E. coli (B).
Candidate codons were tested in leadered and leaderless contexts for their activity in initiating translation of the adjacent lacZ gene. Putative positive (ATG) and negative (ATC) controls provided reference activities. The level of β-galactosidase expression in M. smegmatis from leaderless transcripts (upper right green bars beginning with an ATG or GTG codon) was similar to that from leadered transcripts (upper left black bars). CTG and TTG candidate codons were ineffective initiators at the leaderless position. Differences in β-galactosidase activities in M. smegmatis leaderless constructs were not due to effects of the +1 nucleotide (indicated by black background) on transcript abundance, as corresponding leadered constructs were not comparably affected.
Fig 4. A translational reporter system identified leaderless and leadered initiation codon preferences.
(A) Libraries of leader sequences were generated using two overlapping oligonucleotides, each with a single randomized codon positioned either at the leaderless position (+1) or the leadered position (+30), in-frame with the zeocin-resistance (zeo^r) gene. Self-primed heterodimers were inserted between the promoter and the zeo^r gene and transformed into E. coli. The library was electroporated into M. smegmatis. Hygromycin selection allowed maintenance of the complete library, while zeocin selection required translation initiation at either one of the randomized codon sites. Following selection in zeocin, plasmids were recovered and the leader regions amplified for Ion-Torrent sequencing. Deep sequencing of amplicon libraries allowed the unbiased identification and estimation of relative efficiency of initiation codons. (B) A Shine-Dalgarno site was omitted to facilitate direct comparison between leaderless and leadered architectures. Read counts were compiled for each of the 64 possible codons at the leaderless position (columns) and leadered position (rows). Heat map indicates read counts of each combinatorial leaderless/leadered codon pair, from 10^0 (blue) through 10^4 (yellow). Only ATG or GTG at the leaderless position were capable of initiating translation of zeo^r. At the leadered codon position, no enrichment indicated that translation initiation did not occur at any of the possible codons. A further reduction of the expected stop codons suggested that they prevented read through of leaderless ribosomes into the zeo^r ORF. (C) A Shine-Dalgarno sequence enables efficient use of diverse leadered initiation codons. A consensus Shine-Dalgarno (SD) element was placed upstream of the randomized leadered codon position. Zeocin-resistant pools showed a complex pattern of active translation initiation codons at both the leaderless and leadered positions. The presence of a Shine-Dalgarno supported translation initiation activity of ATG and GTG triplets in the leadered position, as well as the less common TTG and ATT triplets.
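
The read-count compilation described for Fig 4B (one count per leaderless/leadered codon pair) could be sketched in Python as follows. This is an illustrative sketch only, not the authors' code: it assumes each amplicon read has already been trimmed so that it begins at the +1 nucleotide, which places the randomized codons at 0-based offsets 0 and 29, and the function names are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def tally_codon_pairs(reads, leaderless_pos=0, leadered_pos=29):
    """Count (leaderless codon, leadered codon) pairs across reads.

    Offsets are 0-based and assume each read starts at the +1 nucleotide
    (an assumption made for this sketch). Reads that are too short, or
    whose codons contain ambiguous bases, are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        if len(read) < leadered_pos + 3:
            continue
        leaderless = read[leaderless_pos:leaderless_pos + 3]
        leadered = read[leadered_pos:leadered_pos + 3]
        if set(leaderless + leadered) <= VALID_BASES:
            counts[(leaderless, leadered)] += 1
    return counts


def marginal_counts(pair_counts, axis):
    """Sum pair counts over one position: axis=0 keeps the leaderless
    codon, axis=1 keeps the leadered codon."""
    totals = Counter()
    for pair, n in pair_counts.items():
        totals[pair[axis]] += n
    return totals
```

Comparing such counts between the hygromycin-maintained library and the zeocin-selected pool would be one way to estimate enrichment, although the normalization actually used is not described in the caption.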
Fig 5. Definition of cis elements that support translation initiation in mycobacteria.
(A) Zeo-seq viability reporter libraries were generated to determine the sequence context preferences for a SD upstream of a leadered initiation codon. Randomized nucleotides were positioned upstream of a leadered initiation codon, and zeocin selection enriched for Shine-Dalgarno-like sequences, indicating that mycobacteria adhere to this canonical translation criterion. (B) Leaderless translation initiation exhibits no sequence preference in the adjacent mRNA. A block of 6 nt was randomized immediately downstream of a leaderless initiation zeocin reporter construct. Sequences in the recovered pools of zeocin-resistant M. smegmatis were not enriched in composition or motifs in this region. The absence of any detectable enrichment in the randomized region for the leaderless pool indicates that there are no nucleotide preferences for efficient leaderless initiation in mycobacteria downstream of the RTG codon.
Fig 6. Small protein ORFs are frequently coupled to the ORF downstream.
(A) M. tuberculosis leaderless transcripts initiate unannotated small protein ORFs that terminate at the start of the annotated gene downstream more often than expected. All small protein ORF stop codons within 100 nucleotides of an annotated gene start are shown relative to that start codon (0 = coupled RTGA overlap). Three structural classes are identified: uORFs (the small ORF terminates upstream of the annotated start), coupled ORFs (linked by an RTGA tetramer), and overlapping ORFs. The y-axis shows the fraction of small ORFs that terminate a specified distance (x-axis) from the annotated start codon of the downstream gene. (B) One example of a coupled small protein in M. tuberculosis and M. smegmatis, upstream of orthologous genes. The primary sequence of the encoded small protein is not conserved, but the leaderless initiation and coupled linkage is maintained.
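
The three structural classes named in Fig 6A can be expressed as a small coordinate-based classifier. The sketch below rests on stated assumptions rather than on the authors' method: coordinates are 0-based indices on the coding strand, the function names are hypothetical, and any stop codon that only partly overlaps the annotated start codon (other than the RTGA arrangement) is lumped into the overlapping class.

```python
def is_rtga(tetramer: str) -> bool:
    """True for ATGA or GTGA, where the start codon (RTG) and the stop
    codon (TGA) share two nucleotides."""
    return tetramer.upper().replace("U", "T") in {"ATGA", "GTGA"}


def classify_small_orf(stop_codon_start: int, annotated_start: int) -> str:
    """Classify a small ORF relative to the downstream annotated gene.

    stop_codon_start: index of the first nucleotide of the small ORF's
        stop codon.
    annotated_start: index of the first nucleotide of the annotated
        start codon of the downstream gene.
    """
    if stop_codon_start + 3 <= annotated_start:
        # The stop codon ends before the annotated start codon begins.
        return "uORF"
    if stop_codon_start == annotated_start + 1:
        # RTGA arrangement: the stop codon begins one nucleotide into
        # the annotated start codon.
        return "coupled"
    # Simplifying assumption: everything else counts as overlapping.
    return "overlapping"
```

For example, with an annotated start at index 20, a small ORF whose stop codon begins at index 21 would be classed as coupled, corresponding to position 0 on the x-axis of the figure.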
Fig 7. Examples of conserved small proteins encoded by leaderless mRNAs in mycobacteria.
Small ORFs were identified upstream of annotated orthologous genes (A) cysA2/Msmeg_5788, (B) Rv0485/Msmeg_0932, and (C) nirA/Msmeg_4527, in the M. tuberculosis and M. smegmatis genomes. Schematic representation of loci in M. tuberculosis (above) or M. smegmatis (below) that encode small proteins (yellow) upstream of genes conserved between these species. The deduced amino acid sequence of each small protein is shown, with the conserved amino acids in gray shaded boxes. The genes downstream in black block arrows are putative members of the same mRNA, and the gene designated by the gray arrow upstream is transcriptionally independent, but shown for context. The amino acid identity with the protein encoded by the corresponding M. tuberculosis gene is indicated below the respective M. smegmatis gene. Below are screen shots of RNA-seq and ribosome profiling profiles in M. smegmatis, and annotated gene predictions from two different annotation algorithms JCVI (black) and PATRIC (blue). Small proteins encoded throughout genomes are poorly annotated; note here that pipeline annotation algorithms predicted none of the small proteins, and in some cases predicted longer proteins on the opposite strand for which we see no transcriptional or translational evidence.
