J Bacteriol. 2022 Jan 18;204(1):e0035321.
doi: 10.1128/JB.00353-21. Epub 2021 Nov 8.

A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry

Christian H Ahrens et al. J Bacteriol.

Abstract

Small proteins of up to ∼50 amino acids are an abundant class of biomolecules across all domains of life. Yet due to the challenges inherent in their size, they are often missed in genome annotations, and are difficult to identify and characterize using standard experimental approaches. Consequently, we still know few small proteins even in well-studied prokaryotic model organisms. Mass spectrometry (MS) has great potential for the discovery, validation, and functional characterization of small proteins. However, standard MS approaches are poorly suited to the identification of both known and novel small proteins due to limitations at each step of a typical proteomics workflow, i.e., sample preparation, protease digestion, liquid chromatography, MS data acquisition, and data analysis. Here, we outline the major MS-based workflows and bioinformatic pipelines used for small protein discovery and validation. Special emphasis is placed on highlighting the adjustments required to improve detection and data quality for small proteins. We discuss both the unbiased detection of small proteins and the targeted analysis of small proteins of interest. Finally, we provide guidelines to prioritize novel small proteins, and an outlook on methods with particular potential to further improve comprehensive discovery and characterization of small proteins.

Keywords: LC-MS/MS; SEP; genome annotation; microprotein; proteomics; sample preparation; shotgun proteomics; small protein; sproteins; top-down proteomics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
Overview of the main mass spectrometry-based workflows for small protein discovery, analysis, and characterization. The large majority of studies have relied on a shotgun proteomics discovery approach (bottom-up) to identify small proteins. Top-down approaches are slowly gaining momentum but are not yet widely accessible from core facilities. Bioinformatics is important for assembling complete genomes de novo (at times using genomic DNA extracted from the same sample) and for integrating small protein predictions with experimental RNA-seq and Ribo-seq data to create custom databases that allow the identification of novel small proteins by MS-based proteomics. Validation and prioritization facilitate focusing on the elucidation of function(s) of the most promising novel small proteins (yellow shading; see asterisk), an aspect that is described in more detail in the accompanying article “Small Proteins; Big Questions” (124). Shading matches that in Fig. 2. Corresponding text sections are indicated by white circles, as follows: 1B, “Sample Preparation and Data Collection—Preparation and enrichment for small proteins”; 1C, “Sample Preparation and Data Collection—Protease digestion”; 1D, “Sample Preparation and Data Collection—Liquid chromatography”; 1E, “Sample Preparation and Data Collection—Ionization and data acquisition”; 2A, “Data Analysis—Overview: the relevance of genome sequences for proteogenomics”; 2B, “Data Analysis—Creation of custom search databases”; 2C, “Data Analysis—Stringent FDR control”; 3, “Validation of Novel Small Protein Candidates”; 4, “Prioritization/Selection of Novel Small Proteins.”
FIG 2
Overview of the major steps of the most common MS-based workflows for discovery/identification of small proteins, their targeted analysis (for quantification), and for the functional characterization of novel and known small proteins. The numbering of the steps is aligned with Fig. 1, with corresponding text sections indicated by white circles, as follows: 1B, “Sample Preparation and Data Collection—Preparation and enrichment for small proteins”; 1C, “Sample Preparation and Data Collection—Protease digestion”; 1D, “Sample Preparation and Data Collection—Liquid chromatography”; 1E, “Sample Preparation and Data Collection—Ionization and data acquisition”; 2A, “Data Analysis—Overview: the relevance of genome sequences for proteogenomics”; 2B, “Data Analysis—Creation of custom search databases”; 2C, “Data Analysis—Stringent FDR control”; 3, “Validation of Novel Small Protein Candidates”; 4, “Prioritization/Selection of Novel Small Proteins.” Alternative approaches are listed and selected references provided.
