Microbiome. 2025 Jun 21;13(1):151.
doi: 10.1186/s40168-025-02145-3.

Uncovering encrypted antimicrobial peptides in health-associated Lactobacillaceae by large-scale genomics and machine learning

Rubing Du et al. Microbiome.

Abstract

Background: Antimicrobial peptides (AMPs) are well known for their broad-spectrum activity and have shown great promise in addressing the antibiotic-resistance crisis. The Lactobacillaceae family, recognized for its health-promoting effects in humans, represents a valuable source of novel AMPs. However, the global prevalence and distribution of AMPs within Lactobacillaceae remain largely unknown, which limits the efficient discovery and development of novel AMPs.

Results: We analyzed all available genomes (10,327 genomes), encompassing 38 genera and 515 species, to investigate the AMP biosynthetic potential (indicated by the number of AMP sequences in the genome) of the Lactobacillaceae family. AMP biosynthetic potential was widespread among Lactobacillaceae species (69.90%). Overall, 9601 AMPs were identified, clustering into 2092 gene cluster families (GCFs), which showed strong interspecies specificity (95.27%), intraspecies heterogeneity (93.31%), and habitat uniqueness (95.83%), greatly expanding the AMP sequence landscape. Novelty assessment indicated that 1516 GCFs (72.47%) had no similarity to any known AMPs in existing databases. Machine learning predictions suggested that novel AMPs from Lactobacillaceae possessed strong antimicrobial potential, with 664 GCFs having an additive minimum inhibitory concentration (MIC) below 100 μM. We randomly selected and synthesized 16 AMPs (with predicted MIC < 100 μM) and identified 10 AMPs exhibiting varied-spectrum activity against 11 common pathogens. Finally, we identified one Lactobacillus delbrueckii-originated AMP (delbruin_1) with broad-spectrum (all 11 pathogens) and high antimicrobial activity (average MIC = 38.56 μM), which demonstrates its potential as a clinically viable antimicrobial agent.

Conclusions: We uncovered the global prevalence of AMPs in Lactobacillaceae and proved that Lactobacillaceae is an untapped and invaluable source of novel AMPs to combat the antibiotic-resistance crisis. Meanwhile, we provided a machine learning-guided framework for AMP discovery, offering a scalable roadmap for identifying novel AMPs not only in Lactobacillaceae but also in other organisms.

Keywords: Lactobacillaceae; Antibiotic resistance; Antimicrobial peptides; Genome mining; Machine learning.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
The biosynthetic potential of AMP in Lactobacillaceae species. A Genome information and AMP biosynthetic potential in Lactobacillaceae from different datasets. The pie chart on the left represents the source of 10,327 genomes. The Venn diagram in the middle represents the distribution of species across different sources. The stacking diagram on the right represents the biosynthetic potential of AMP in genomes from different sources. B The phylogenetic distribution of AMP biosynthetic potential among 515 Lactobacillaceae species. The colors of the tree and bar chart represent different genera. The blue bars in the outermost layer represent the average AMP number in different Lactobacillaceae species
Fig. 2
The phylogenetic distribution of AMPs in the Lactobacillaceae family. A The AMP distribution among 33 genera. The central tree represents the hierarchical clustering based on AMP distribution within GCFs. The circular heatmap represents AMP distribution by genus. Black circles and green pentagons indicate genus- and species-specific GCFs, respectively. The top-right curve shows GCF accumulation with increasing genome numbers for the genera with > 100 genomes. B Strain heterogeneity of AMP biosynthetic potential in the Lactobacillaceae family. Boxplots show the coefficient of variation (CV) of AMP counts among strains within the same species. The heatmap displays average genome count, intraspecies AMP coverage, and interspecies AMP abundance per species. The data in the heatmap represent average values
Fig. 3
The divergence of Lactobacillaceae AMPs from different ecosystems. A The AMP biosynthetic potential of Lactobacillus species from three ecosystems. B Venn diagram of 599 GCFs from different ecosystems. Circles represent GCFs. C Distribution of 1170 habitat-specific AMPs from multi-habitat and habitat-specific species. Red and blue triangles represent AMPs in multi-habitat and habitat-specific species, respectively. The circles and squares represent multi-habitat and habitat-specific species, respectively. The size of the circles and squares represents the number of AMPs in each species. Genera with fewer than 5 AMPs are combined and shown as “others.” Red lines indicate AMPs whose taxonomic origins are multi-habitat species. Blue lines indicate AMPs whose taxonomic origins are habitat-specific species
Fig. 4
The uniqueness of Lactobacillaceae AMPs compared to AMPs from other origins. A Amino acid frequency in Lactobacillaceae-originated AMPs compared with AMPs from AEP, Human, AMPSphere, and public databases (DRAMP, DBAASP, and APD). B The number of unique AMPs in Lactobacillaceae compared with those in other organisms. The data in the center of the Venn diagram represent the number of shared AMPs between the two datasets. Comparative analysis of net charge (C), normalized hydrophobicity (D), isoelectric point (E), and Boman index (F) between AMPs from Lactobacillaceae and other organisms. *P < 0.05, ***P < 0.001. Cohen’s d represents effect size
Fig. 5
Prediction and validation of antimicrobial activity of AMPs. A Radar chart showing the predicted MIC values of 16 AMPs. The scale in the radar chart represents predicted MICs (μM). B In vitro validation of MICs for 16 AMPs. The heatmap represents the MICs of AMPs against different pathogens. C 3D structures of 10 AMPs with antimicrobial activity. Their sequences are shown above each structure. D Cytotoxicity of 10 AMPs with antimicrobial activity and LL-37. The testing concentration for each AMP corresponds to its maximum MIC (acidin_1: 137.01 μM, confusin_1: 12.81 μM, delbruin_1: 172.70 μM, lacspin_1: 29.97 μM, ligspin_1: 174.99 μM, parabuchin_1: 158.36 μM, paracasein_1: 67.99 μM, ruminis_1: 132.19 μM, stilesiin_1: 145.48 μM, yiduin_1: 166.05 μM, LL-37: 89.02 μM)
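
The Fig. 2B caption summarizes strain heterogeneity as the coefficient of variation (CV) of AMP counts among strains of the same species. A minimal sketch of that statistic follows, assuming the common definition of CV as standard deviation divided by mean. The abstract does not say whether the sample or population standard deviation was used, so the sample form here is an assumption, and the species names and counts are hypothetical values made up purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """Return sample standard deviation / mean, or None when it is undefined."""
    if len(counts) < 2:
        return None  # a sample standard deviation needs at least two strains
    m = mean(counts)
    if m == 0:
        return None  # CV is undefined when the mean count is zero
    return stdev(counts) / m


# Hypothetical AMP counts per strain (not data from the paper).
amp_counts_by_species = {
    "species_A": [2, 2, 2, 2],  # every strain carries the same number of AMPs
    "species_B": [0, 1, 3, 4],  # counts differ widely between strains
}

cv_by_species = {
    name: coefficient_of_variation(counts)
    for name, counts in amp_counts_by_species.items()
}

for name, cv in cv_by_species.items():
    print(name, cv)
```

A CV of zero means all strains of a species carry the same number of AMPs; larger values indicate greater strain-to-strain variation relative to the species mean.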
