Revealing the small proteome of Haloferax volcanii by combining ribosome profiling and small-protein optimized mass spectrometry

Lydia Hadjeras et al. Microlife. 2023 Jan 16:4:uqad001.
doi: 10.1093/femsml/uqad001. eCollection 2023.

Abstract

In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins ≤70 aa) are only now entering the limelight. The absence of a complete small protein catalogue in most prokaryotes precludes our understanding of how these molecules affect physiology. So far, archaeal genomes have not been analyzed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach, integrating experimental data from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq), to generate a high-confidence inventory of small proteins in the model archaeon Haloferax volcanii. We demonstrate by MS and Ribo-seq that 67% of the 317 annotated small open reading frames (sORFs) are translated under standard growth conditions. Furthermore, annotation-independent analysis of Ribo-seq data showed ribosomal engagement for 47 novel sORFs in intergenic regions. A total of seven of these were also detected by proteomics, in addition to an eighth novel small protein solely identified by MS. We also provide independent experimental evidence in vivo for the translation of 12 sORFs (annotated and novel) using epitope tagging and western blotting, underlining the validity of our identification scheme. Several novel sORFs are conserved in Haloferax species and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously appreciated, and that combining MS with Ribo-seq is a powerful approach for the discovery of novel small protein coding genes in archaea.
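The counts quoted in the abstract can be cross-checked against the totals given in the Figure 6 caption (212 translated annotated sORFs, 48 novel sORFs). The following is a minimal sketch of that arithmetic only; the constant and function names are illustrative and are not taken from the paper.

```python
# Counts quoted from the abstract and the Figure 6 caption of this page.
ANNOTATED_SORFS = 317        # annotated sORFs (abstract)
TRANSLATED_ANNOTATED = 212   # translated annotated sORFs (Figure 6 caption)
NOVEL_RIBOSEQ = 47           # novel sORFs with ribosomal engagement (abstract)
NOVEL_RIBOSEQ_AND_MS = 7     # of those, also detected by proteomics (abstract)
NOVEL_MS_ONLY = 1            # the eighth novel protein, identified by MS alone


def summarize_counts() -> dict:
    """Derive the headline figures from the reported counts."""
    return {
        # Share of annotated sORFs found translated, rounded to a whole percent.
        "percent_translated": round(100 * TRANSLATED_ANNOTATED / ANNOTATED_SORFS),
        # Novel sORFs overall: Ribo-seq hits plus the MS-only protein.
        "novel_total": NOVEL_RIBOSEQ + NOVEL_MS_ONLY,
        # Novel small proteins with MS support.
        "novel_with_ms": NOVEL_RIBOSEQ_AND_MS + NOVEL_MS_ONLY,
        # Translated small proteome: annotated plus novel.
        "translated_total": TRANSLATED_ANNOTATED + NOVEL_RIBOSEQ + NOVEL_MS_ONLY,
    }


if __name__ == "__main__":
    for name, value in summarize_counts().items():
        print(f"{name}: {value}")
```

Under these counts, 212 of 317 rounds to the 67% stated in the abstract, and 47 plus 1 gives the 48 novel sORFs referred to in the Figure 6 caption.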

Keywords: Haloferax volcanii; Ribo-seq; archaea; mass spectrometry; proteomics; ribosome profiling; sORF; small protein; small proteome; sprotein.

Conflict of interest statement

None declared.

Figures

Figure 1.
MS-based proteomic detection of small proteins in H. volcanii. (A). We adjusted a standard proteomics workflow to optimize the detection of small proteins. Small proteins were enriched on a solid-phase column and either digested with Lys-C instead of trypsin or measured directly by LC-MS/MS (liquid chromatography-tandem mass spectrometry). Small proteins were detected using both a classical database search aimed at detecting annotated small proteins and a proteogenomics search strategy to reveal novel candidates. For MS analysis, Haloferax cells grown to exponential and stationary phase were analyzed. Higher-energy collisional dissociation (HCD) was used to fragment the peptides. (B). Venn diagram showing the overlap of detected, published, and annotated sORFs. The number of annotated sORFs (outer circle) and of small proteins identified by our adapted small protein MS (purple) and in previous datasets [ArcPP (rose) and Jevtić et al. (mustard) (Jevtić et al., Schulze et al. 2020)] is shown. (C). The number of proteins (left axis) in different protein length bins is compared between annotated proteins (red), small protein adapted MS (dark grey; data from this study), and standard MS [light grey; exemplified by the Jevtić et al. dataset (Jevtić et al. 2019)]. Proteome coverage (right axis) achieved across length bins by this small protein adapted MS (blue), as opposed to nonadjusted MS approaches [lilac; exemplified by the Jevtić et al. dataset (Jevtić et al. 2019)], is shown.
Figure 2.
Ribosome profiling distinguishes between H. volcanii coding and noncoding transcripts. (A). The setup of Ribo-seq to map the translatome of H. volcanii. Translating ribosomes (polysomes) were first captured on mRNAs by fast chilling and subsequently digested to monosomes by either MNase or RNase I treatment. Footprints of approximately 30 nt, protected from digestion and copurifying with ribosomes, were then subjected to cDNA library preparation and deep sequencing. A second library was generated from total RNA for standard RNA-seq. (B). Scatter plot showing global translation efficiencies (TE) computed from all H. volcanii Ribo-seq datasets for all annotated coding sequences (CDS; 4107), five selected abundant noncoding RNAs (ncRNAs; RNase P RNA, SRP RNA, and three CRISPR RNAs), annotated sORFs, and the annotated sORFs that were detected as translated (after filtering and visual inspection) by Ribo-seq (205 sORFs). The blue lines indicate the mean TE for each gene class for all replicates of MNase and RNase I libraries. (C). Coverage for the RNase P RNA gene (HVO_1802R) is mostly restricted to the RNA-seq library (black track), confirming that the RNase P RNA is not translated. Ribo-seq coverage is shown in blue (obtained with RNase I). (D). A leaderless sORF (HVO_0196, uncharacterized protein, 55 aa) detected by MS was also identified as translated based on Ribo-seq data (coverage shown for the Ribo-seq library obtained with RNase I digest). (E). Comparison of RNA trimming by MNase and RNase I. Read coverage for two leadered genes (HVO_1080 and HVO_1072) in the RNA-seq library (black track) and in Ribo-seq libraries obtained with MNase (blue track) and RNase I (green track). The genomic position is indicated at the bottom for the genes shown in panels (C), (D), and (E), alongside a schematic representation of the genomic region (relevant genes in black). Arrows indicate the transcription start sites [TSS based on Babski et al. (2016)].
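The caption for panel (B) refers to translation efficiency (TE) without stating how it was computed. A common convention, assumed here and not confirmed by this page, is the ratio of normalized Ribo-seq signal to normalized RNA-seq signal for the same gene. The sketch below uses an RPKM-style normalization; the function names and all read counts are hypothetical, and the authors' actual pipeline may differ.

```python
from typing import Optional


def rpkm(reads: int, length_nt: int, library_size: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return reads * 1e9 / (length_nt * library_size)


def translation_efficiency(
    ribo_reads: int,
    rna_reads: int,
    length_nt: int,
    ribo_library_size: int,
    rna_library_size: int,
) -> Optional[float]:
    """Assumed definition: TE = Ribo-seq RPKM / RNA-seq RPKM.

    Returns None when the gene has no RNA-seq reads, because the
    ratio is undefined in that case.
    """
    if rna_reads == 0:
        return None
    ribo_density = rpkm(ribo_reads, length_nt, ribo_library_size)
    rna_density = rpkm(rna_reads, length_nt, rna_library_size)
    return ribo_density / rna_density


if __name__ == "__main__":
    # Hypothetical counts for a 168-nt sORF (55 codons plus a stop codon).
    te = translation_efficiency(
        ribo_reads=200,
        rna_reads=100,
        length_nt=168,
        ribo_library_size=1_000_000,
        rna_library_size=1_000_000,
    )
    print(f"TE = {te:.2f}")
```

Under this definition, a noncoding RNA such as the RNase P RNA in panel (C), with coverage largely restricted to the RNA-seq library, would have a TE near zero, whereas a translated sORF would have a clearly higher value.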
Figure 3.
Translation of the H. volcanii annotated small proteome revealed by Ribo-seq. (A). Overlap between annotated sORFs (‘annotated sORFs’, green) and translated small proteins detected by MS (all MS datasets, labelled ‘proteomics’; blue) and Ribo-seq (yellow). (B). Length distribution (in codons) of annotated small proteins identified by proteomics (all datasets; grey) and Ribo-seq (blue). (C). Comparison of sORFs detected in Ribo-seq data by manual labelling (‘manual’; green) or by different automated ORF prediction tools for Ribo-seq data: REPARATION (yellow) and DeepRibo (orange); IRSOM (only RNA-seq data) (blue). (D). In vivo validation of translation for five annotated small proteins identified either by both Ribo-seq and MS (HVO_1796, 46 aa; HVO_A0348A, 63 aa; HVO_2400, 58 aa; and HVO_1599, 49 aa) or only by Ribo-seq (HVO_A0249A, 34 aa). ORFs were tagged at their N- or C-terminus with a 3xFLAG epitope and expressed from a plasmid in H. volcanii under a p.tna promoter (Allers et al. 2010) or the natural promoter (HVO_A0348A). Strains were grown to exponential phase in selective media and protein extracts were analyzed by western blotting with an anti-FLAG antibody. Analysis of a nontranslated sORF served as negative control (Figure S5D, Supporting Information). M: molecular weight marker; sizes are shown in kDa. Top: Ribo-seq (blue) and RNA-seq (black) coverage; the genomic position is indicated below with a schematic representation of the genomic region (black: sORF investigated). Bent arrows indicate the transcription start sites [TSS based on Babski et al. (2016)].
Figure 4.
Ribo-seq combined with MS expands the H. volcanii small proteome. (A). Overlap between the novel small proteins detected by Ribo-seq (blue) and MS (grey). (B). Length distribution (in codons) of the novel small proteins identified by Ribo-seq. (C) and (D). In vivo validation of translation for five novel small proteins identified either by both Ribo-seq and MS (sORF8, 56 aa; sORF10, 40 aa; and sORF13, 45 aa; top panel) (C) or only by Ribo-seq (sORF46, 42 aa and sORF47, 23 aa; bottom panel) (D). Small ORFs were C-terminally fused to a 3xFLAG tag and expressed from a plasmid in H. volcanii. Strains were grown to exponential phase in selective media and protein extracts were analyzed by western blotting. Proteins were detected with an anti-FLAG antibody. A nontranslated sORF served as negative control (Figure S5D, Supporting Information). M: molecular weight marker. Top: genome browser screenshots of read coverage from Ribo-seq (blue track) and RNA-seq (black track) libraries. Genomic positions are indicated below with a schematic representation of the genomic region (novel sORFs in black). Bent arrows indicate the transcription start sites (TSS) based on Babski et al. (2016).
Figure 5.
Genomic distribution of H. volcanii translated small proteins. Data from MS and Ribo-seq were used to display the expanded proteome. The outer rings indicate all currently annotated ORFs (light grey) and % GC (dark grey). Black: all annotated sORFs; dark blue: all annotated translated sORFs; and purple: novel translated sORFs. The main chromosome as well as the three minichromosomes pHV1, pHV3, and pHV4 are shown.
Figure 6.
Features identified for the H. volcanii translated small proteome. (A). Pie chart indicating the proportion of leaderless (red) and leadered (blue) sORFs in the 212 translated annotated (left) and 48 novel (right) sORFs. ND: not determined. (B). Genomic location of translated annotated (left) and novel (right) sORFs relative to currently annotated genes. (C). Conservation of the 48 translated novel sORFs was determined for the genus Haloferax using tblastn (blue; depth denotes % conservation). The gradient on the right side indicates the % identity at the amino acid level. For comparison, three ribosomal proteins (HVO_2550, HVO_2475, and HVO_0700; far right) are included. Detection by blastp and by MS is indicated at the bottom (red; depth indicates recovered hits). Legend at the bottom: ‘yes’: a 100% match to an annotated protein was found; ‘partial’: parts of the protein sequence match an annotated protein; ‘no’: no matches were found for the sORF. (D). Annotated as well as predicted function (using Phyre2) for the 212 translated annotated sORFs. (E). Cellular localization predicted using PSORTb for the 48 translated novel sORFs in H. volcanii.

References

    1. Ahrens CH, Wade JT, Champion MM et al. A practical guide to small protein discovery and characterization using mass spectrometry. J Bacteriol. 2022;204:e0035321.
    2. Allers T, Barak S, Liddell S et al. Improved strains and plasmid vectors for conditional overexpression of his-tagged proteins in Haloferax volcanii. Appl Environ Microbiol. 2010;76:1759–69.
    3. Allers T, Ngo HP, Mevarech M et al. Development of additional selectable markers for the halophilic archaeon Haloferax volcanii based on the leuB and trpA genes. Appl Environ Microbiol. 2004;70:943–53.
    4. Altschul SF, Madden TL, Schaffer AA et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
    5. Babski J, Haas KA, Näther-Schindler D et al. Genome-wide identification of transcriptional start sites in the haloarchaeon Haloferax volcanii based on differential RNA-seq (dRNA-Seq). BMC Genomics. 2016;17:629.