Microlife. 2022 May 14:3:uqac005.
doi: 10.1093/femsml/uqac005. eCollection 2022.

Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides

Igor Fijalkowski et al. Microlife.

Abstract

Genomic studies of bacteria have long pointed toward a widespread prevalence of small open reading frames (sORFs) encoding short proteins of fewer than 100 amino acids. Despite mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection, and various blanket statements have been used to explain this discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins, as informed by conditional translation data. A panel of physicochemical properties, alongside recently developed mass spectrometry detectability metrics, was interrogated to provide a comprehensive, evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in the proteomics-based detection of novel small proteins that are missing from bacterial genome annotations.

Keywords: Salmonella Typhimurium; in silico proteomics; proteomics; riboproteogenomics; sORF; sORF-encoded polypeptides (SEPs).

Figures

Figure 1.
Detectability of peptides produced by in silico tryptic digestion of the annotated S. Typhimurium proteome. (A) Number of S. Typhimurium proteins as a function of protein length. (B) Distribution of tryptic peptide detectability scores as a function of protein length. Three detectability classes are distinguished (high, >0.9; mid, 0.45–0.9; and low, <0.45). Peptides with proteomic support (minimum two PSMs) are indicated in red. (C) Composition of peptide detectability classes as a function of protein length. (D) Peptide detectability classes with proteomic support as a function of protein length. (E) Numbers of unique peptides in distinct peptide detectability classes for SEPs versus all other proteins.
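As an illustration of the two steps named in this caption, the following is a minimal sketch of an in silico tryptic digestion followed by binning of detectability scores into the three classes. It assumes the common trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages; the caption does not state which rule or which scoring tool the authors used. The function names are hypothetical, and the scores are taken as given inputs rather than computed here.

```python
def tryptic_digest(sequence):
    """Return fully cleaved tryptic peptides of a protein sequence.

    Assumed rule (not confirmed by the source): cleave C-terminal to
    K or R, except when the following residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal peptide if the sequence does not end in K/R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def detectability_class(score):
    """Bin a detectability score using the thresholds in the caption:
    high > 0.9, mid 0.45-0.9, low < 0.45."""
    if score > 0.9:
        return "high"
    if score >= 0.45:
        return "mid"
    return "low"


if __name__ == "__main__":
    # Toy sequence, not taken from the S. Typhimurium proteome.
    print(tryptic_digest("MKRPAKLLR"))
    print(detectability_class(0.95), detectability_class(0.5))
```

A short protein yields only a handful of such peptides, so a SEP has few chances of producing even one peptide in the high class.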
Figure 2.
Properties of peptides derived from small proteins. (A) Peptide detectability coverage plot of the eight SL1344 annotated SEPs smaller than 25 aa in length, with color representing the detectability score of a peptide. The peptide coverage plots indicate the peptides of the highest detectability at each position. (B) Detectability distribution for all unique theoretical tryptic peptides originating from annotated proteins, theoretical unique tryptic peptides originating from novel proteogenomics-detected proteins and unique peptides identified by proteomics in this study.
Figure 3.
Protein Ribo-seq expression levels as a function of protein length. (A) Log2-transformed RPKM values of all annotated proteins smaller than 600 aa. (B) Box plot depicting the distribution of translation values per protein length bin. (C) Abundance of annotated proteins detected in the proteomics experiments, measured as log2(iBAQ) values, as a function of protein length. (D) Correlation between proteomics and translation as measured by ribosome profiling in MEP (OD 0.3 in LB medium). Protein length is marked on a color scale, with proteins smaller than 100 aa indicated by triangles. Pearson correlation coefficients are marked in the respective graphs. RpmH is highlighted as an example of a robustly expressed SEP (see manuscript text for details).
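The quantities in this caption can be sketched as follows, using the standard RPKM definition (reads per kilobase of ORF per million mapped reads) and a plain Pearson correlation. This is only an illustration: the authors' actual quantification pipeline is not described here, the function names are hypothetical, and the numbers in the usage lines are made up.

```python
import math


def rpkm(read_count, orf_length_nt, total_mapped_reads):
    """Reads per kilobase of ORF per million mapped reads."""
    return read_count * 1e9 / (orf_length_nt * total_mapped_reads)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up example: 100 reads on a 1000-nt ORF, 1e6 mapped reads in total.
    value = rpkm(100, 1000, 1_000_000)
    print(value, math.log2(value))
    # Made-up log2 translation and log2 abundance values for three proteins.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because RPKM normalizes by ORF length, a short ORF with few mapped reads can still reach a high value, which is one reason translation levels are compared per protein length bin.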
Figure 4.
Protein hydrophobicity of the annotated S. Typhimurium proteome. (A) Distribution of protein hydrophobicity scores as a function of protein length. Proteins detected in the proteomics experiments are highlighted in color according to the number of identified UTPs. (B) Box plot depicting the distribution of hydrophobicity scores per protein length bin. (C) Box plot depicting the distribution of hydrophobicity scores of proteomics-identified proteins per protein length bin. (D) Histogram of hydrophobicity scores for all annotated proteins (left) and for SEPs (right). (E) Domain composition of all proteins (left) and SEPs (right) in the S. Typhimurium proteome.
Figure 5.
Riboproteogenomics reannotation of the S. Typhimurium genome. (A) Chromosomal position, length and domain composition of newly Ribo-seq called intergenic or (partially) overlapping ORFs. (B) Peptide detectability coverage plots for Ribo-seq predicted SEPs with the proteomics identified peptides highlighted in red. Detectability of peptides is coded on a color scale.
Figure 6.
Newly discovered SEPs with supporting proteomics evidence. Novel SEPs are presented with matching examples of assigned peptide fragmentation spectra.
Figure 7.
Expression regulation of unannotated intergenic sORFs at the level of translation (measured by Ribo-seq). Condition-specific expression of novel intergenic ORFs that are significantly differentially expressed between the MEP (OD 0.3 in LB medium) condition and the other growth conditions investigated. Only ANOVA-significant regulation is shown, with log2 fold change on a color scale. Novel sORFs regulated in SPI2-inducing conditions are highlighted in orange.
