Front Genet. 2022 Jul 22;13:935351.
doi: 10.3389/fgene.2022.935351. eCollection 2022.

Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning

Mitra Vajjala et al. Front Genet. 2022.

Abstract

Small proteins, encoded by small open reading frames, are only beginning to emerge with current advances in omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating the activity of other proteins, controlling the cell cycle, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins remain largely unexplored. Most bacterial species in a natural community cannot be easily isolated or cultured, so their peptides are better characterized with a metagenomic approach. Bacterial peptides identified from metagenomic samples can not only enrich the pool of known small proteins but also reveal community-specific microbial ecology from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) was developed as a comprehensive toolkit to explore the small protein universe in metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape for bacterial peptides. metaBP-ML shows advantages for discovering the functions of bacterial peptides in a microbial community and increases the yield of annotations by up to fivefold. The novelty of the metaBP toolkit lies in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in the new and flexible RBiotools environment, and presenting the first small protein landscape through metaBP-ML. Taken together, metaBP and metaBP-ML can profile functional bacterial peptides, with their potentially diverse mutations, from metagenomic samples in order to depict the unique small protein landscape of a microbial community.
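To make the idea of "homolog groups with sample-specific mutations" concrete, here is a minimal sketch of how substitutions could be called against a group consensus. It is not metaBP's implementation: it assumes the peptides in a group are already aligned to equal length, and the function names, sample names, and toy sequences are all hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common residue at each alignment column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # Ties go to the residue seen first, which is how Counter.most_common orders them.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def call_mutations(seq, ref):
    """List substitutions of seq against ref as strings such as 'K3R' (1-based)."""
    return [
        f"{r}{i}{s}"
        for i, (r, s) in enumerate(zip(ref, seq), start=1)
        if r != s and "-" not in (r, s)  # gap columns are skipped, not called
    ]


# Toy homolog group: one aligned peptide per (hypothetical) sample.
cluster = {
    "sample1_pep": "MKKLLA",
    "sample2_pep": "MKRLLA",
    "sample3_pep": "MKKLLA",
}
ref = consensus(list(cluster.values()))
mutations = {name: call_mutations(seq, ref) for name, seq in cluster.items()}
```

Here `ref` is the majority sequence of the group, and `mutations` maps each sample's peptide to its substitutions relative to that consensus. Insertions and deletions are deliberately ignored in this sketch.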

Keywords: bacterial peptide; machine learning; metagenomics; protein annotation; protein clustering.

Conflict of interest statement

ML is affiliated with Nashville Biosciences. His development of the RBiotools package was done independently of any funding from Nashville Biosciences or from any other commercial funding source. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Flowchart of metaBP pipeline. MetaBP’s implementation consists of three major modules (metaBP, metaBP-ML, and RBiotools), and five main procedures (protein meta-assembly, protein clustering, mutation calling, protein embedding, and protein annotation).
FIGURE 2
Small protein quantification in each sample. (A) Normalized counts (small proteins per million) for top ten EC numbers comparing the high-fat diet mice with the normal mice after 12 weeks. (B) Protein copy numbers for 4 known small genes recovered from the samples.
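The "small proteins per million" unit in panel (A) is a per-sample normalization. The caption does not define the denominator, so the sketch below assumes it is the total number of small proteins counted in the sample. The function name and the counts are hypothetical toy values, not data from the study.

```python
def per_million(counts):
    """Scale raw per-category counts to counts per million within one sample.

    Assumption: the denominator is the sum of all counts in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted proteins")
    return {category: n * 1_000_000 / total for category, n in counts.items()}


# Toy counts for one sample, keyed by EC number (values are made up).
sample = {"2.7.7.7": 30, "3.6.4.12": 10, "unannotated": 60}
normalized = per_million(sample)
```

Normalizing this way makes samples with different sequencing depths comparable, which is what a comparison between diet groups needs.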
FIGURE 3
Comparison of small protein annotations from eggNOG and metaBP-ML. (A) Number of proteins that can be annotated with EC numbers by eggNOG and metaBP-ML. A total of 6,865 proteins, shown in the dashed-edge circle, are annotated with exactly the same EC numbers. (B) Top 11 EC numbers predicted by eggNOG. (C) Top 11 EC numbers predicted by metaBP-ML. (D) Number of proteins that can be annotated with taxonomy terms by eggNOG and metaBP-ML. A total of 4,198 proteins, shown in the dashed-edge circle, are annotated with exactly the same family names. (E) Top 11 taxonomy terms predicted by eggNOG. (F) Top 11 taxonomy terms predicted by metaBP-ML.
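The counts behind a comparison like panels (A) and (D) can be derived from two annotation tables: proteins annotated by only one tool, proteins annotated by both, and the subset of those that received identical labels. The sketch below assumes each tool's output has been reduced to a mapping from protein ID to a single label. The function name, protein IDs, and labels are hypothetical and do not reflect either tool's real output format.

```python
def compare_annotations(a, b):
    """Summarize agreement between two protein-to-label mappings."""
    shared = a.keys() & b.keys()
    identical = {p for p in shared if a[p] == b[p]}
    return {
        "only_a": len(a.keys() - b.keys()),  # annotated by the first tool only
        "only_b": len(b.keys() - a.keys()),  # annotated by the second tool only
        "both": len(shared),                 # annotated by both tools
        "identical": len(identical),         # same label from both tools
    }


# Toy annotation tables (made-up protein IDs and EC labels).
eggnog = {"p1": "2.7.7.7", "p2": "3.6.4.12", "p3": "1.1.1.1"}
metabp_ml = {"p2": "3.6.4.12", "p3": "2.7.7.7", "p4": "1.1.1.1"}
summary = compare_annotations(eggnog, metabp_ml)
```

In this toy case two proteins are annotated by both tools, but only one of them receives the same label from each. That is the distinction the dashed-edge circle draws in the figure.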
FIGURE 4
Landscape for small proteins. (A) The database with 29 known small proteins overlaid. (B) Zoomed-in display of the 29 known small proteins in the two-dimensional space. (C) The different mouse gut samples overlaid on the database landscape. (D) The human and environmental samples overlaid on the database landscape.
FIGURE 5
Sequence diversity of the senS gene. (A) Sequence alignment and conservation of the senS proteins. (B) The senS cluster and ten neighbors overlaid onto the database landscape. (C) The predicted structure for the consensus sequence of senS. (D) The predicted structure for a mutant of the consensus.
