[Preprint]. 2023 Sep 11:2023.08.31.555663.
doi: 10.1101/2023.08.31.555663.

Computational exploration of the global microbiome for antibiotic discovery

Célio Dias Santos-Júnior et al. bioRxiv. 2023.

Update in

  • Discovery of antimicrobial peptides in the global microbiome with machine learning.
    Santos-Júnior CD, Torres MDT, Duan Y, Rodríguez Del Río Á, Schmidt TSB, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao XM, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. Cell. 2024 Jul 11;187(14):3761-3778.e16. doi: 10.1016/j.cell.2024.05.013. Epub 2024 Jun 5. PMID: 38843834. Free PMC article.

Abstract

Novel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine learning-based approach to predict prokaryotic antimicrobial peptides (AMPs) by leveraging a vast dataset of 63,410 metagenomes and 87,920 microbial genomes. This led to the creation of AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, the majority of which were previously unknown. We observed that AMP production varies by habitat, with animal-associated samples displaying the highest proportion of AMPs compared to other habitats. Furthermore, within different human-associated microbiota, strain-level differences were evident. To validate our predictions, we synthesized and experimentally tested 50 AMPs, demonstrating their efficacy against clinically relevant drug-resistant pathogens both in vitro and in vivo. These AMPs exhibited antibacterial activity by targeting the bacterial membrane. Additionally, AMPSphere provides valuable insights into the evolutionary origins of peptides. In conclusion, our approach identified AMP sequences within prokaryotic microbiomes, opening up new avenues for the discovery of antibiotics.

Keywords: antimicrobial activity; antimicrobial peptides; global microbiome; machine learning; metagenomics.

Conflict of interest statement

Declaration of interests: Cesar de la Fuente-Nunez provides consulting services to Invaio Sciences and is a member of the Scientific Advisory Boards of Nowture S.L. and Phare Bio. The de la Fuente Lab has received research funding or in-kind donations from United Therapeutics, Strata Manufacturing PJSC, and Procter & Gamble, none of which were used in support of this work. All other authors state they do not have any competing interests.

Figures

Figure 1. AMPSphere comprises 863,498 non-redundant c_AMPs from thousands of metagenomes and high-quality microbial genomes.
(A) To build AMPSphere, we first assembled 63,410 publicly available metagenomes from diverse habitats. A modified version of Prodigal, which can also predict smORFs (30–300 bp), was used to predict genes on the resulting metagenomic contigs as well as on 87,920 microbial genomes from ProGenomes2. Macrel was applied to the 4,599,187,424 predicted smORFs to obtain 863,498 non-redundant c_AMPs (see also Fig. SI1). c_AMPs were then hierarchically clustered in a reduced amino acid alphabet using 100%, 85%, and 75% identity cutoffs. At 75% identity, we observed 118,051 non-singleton clusters, 8,788 of which were considered families (≥ 8 c_AMPs). (B) Only 9% of c_AMPs have detectable homologs in other peptide datasets (SmProt 2, DRAMP 3.0, starPepDB 45k, STsORFs) and general protein datasets (GMGCv1); see also Fig. SI2B. (C) AMP discovery is affected by sampling, with most habitats, e.g., soil, presenting steep sampling curves. (D) Overall, c_AMPs are habitat-specific; see also Fig. SI2C–D and Tables SI1 and SI2.
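The cluster bookkeeping described in panel A (non-singleton clusters, and families defined as clusters with at least 8 c_AMPs) can be illustrated with a minimal sketch. The input format, the function name `summarize_clusters`, and the toy data are assumptions made for illustration; this is not the AMPSphere pipeline itself, which clusters peptides by sequence identity in a reduced alphabet.

```python
from collections import Counter


def summarize_clusters(assignments, family_min_size=8):
    """Count clusters, non-singleton clusters, and families.

    `assignments` is a hypothetical mapping of peptide ID -> cluster ID.
    A family is a cluster with at least `family_min_size` members
    (8 in the caption above).
    """
    sizes = Counter(assignments.values())
    return {
        "clusters": len(sizes),
        "non_singleton": sum(1 for n in sizes.values() if n > 1),
        "families": sum(1 for n in sizes.values() if n >= family_min_size),
    }


# Toy data: one cluster of 8 peptides, one of 2, and one singleton.
toy = {f"pep{i}": "c1" for i in range(8)}
toy.update({"pep8": "c2", "pep9": "c2", "pep10": "c3"})
summary = summarize_clusters(toy)
print(summary)
```

With the toy assignments, only cluster `c1` reaches the family threshold, while `c1` and `c2` both count as non-singleton.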
Figure 2. Mutations in genes encoding large proteins generate c_AMPs as independent genomic entities.
(A) About 7% of c_AMPs are homologous to proteins from GMGCv1, with almost one-fourth of the hits sharing start positions with the larger protein. (B) As an illustrative example, AMP10.271_016 was recovered in three samples of human saliva from the same donor. AMP10.271_016 is predicted to be produced by Prevotella jejuni, sharing the start codon (bolded) of an NAD(P)-dependent dehydrogenase gene (WP_089365220.1), the translation of which is terminated early by a mutation (in red; TGG > TGA). (C) The OGs of unknown function represent the largest (2,041 out of 3,792 OGs) and most enriched (P_Kruskal = 2.66 × 10⁻³⁹) class with homologs to c_AMPs in GMGCv1. Interestingly, when considered individually, the number of c_AMP hits to unknown OGs was the lowest (P_Kruskal = 6 × 10⁻³). These results do not change when underrepresented OGs are excluded by using different thresholds (e.g., at least 10, 20, or 100 homologs per OG); see also Table SI3.
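How a single TGG > TGA substitution, as in panel B, can turn the start of a longer gene into a short independent ORF is sketched below. The sequence is a made-up toy, not the Prevotella jejuni gene, and the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_codon_count(dna):
    """Count codons read from position 0 until the first stop codon (excluded)."""
    count = 0
    for i in range(0, len(dna) - len(dna) % 3, 3):
        if dna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def substitute(dna, position, base):
    """Return a copy of `dna` with a single-base substitution at `position`."""
    return dna[:position] + base + dna[position + 1:]


# Toy parent gene: start codon, 5 Ala codons, one Trp codon (TGG),
# 10 more Ala codons, then a stop codon.
parent = "ATG" + "GCT" * 5 + "TGG" + "GCT" * 10 + "TAA"
trp_third_base = 3 + 15 + 2  # index of the final G in TGG

# G > A at that position converts TGG (Trp) into TGA (stop).
mutant = substitute(parent, trp_third_base, "A")

parent_len = orf_codon_count(parent)
mutant_len = orf_codon_count(mutant)
print(parent_len, mutant_len)
```

The mutant keeps the parent's start codon but now ends at the new stop, so the toy ORF shrinks from 17 codons to 6.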
Figure 3. The genome context of c_AMPs shows a preference for neighborhoods containing ABC transporters and ribosome assembly proteins (see Tables SI4 and SI5).
(A) c_AMPs tend to be closer to ABC transporters and ribosomal machinery-related genes than families of proteins of other sizes (≤ 50 amino acids and all lengths). (B) The proportion of c_AMPs in a genome context involving antibiotic resistance genes is lower than that of families of proteins shorter than 50 amino acids and, in the case of CAMP and vancomycin resistance, than that of all proteins. (C) The proportion of c_AMPs in neighborhoods with antibiotic synthesis-related genes is very small (<0.25%). (D) AMP10.015_426 is an example of a c_AMP homologous to the ribosomal protein rpsH, found in the context of other ribosomal protein genes.
Figure 4. AMP variation is taxonomy-dependent.
(A) Most c_AMPs and families from ProGenomes2 are classified as core genes (defined as present in ≥95% of genomes within a species). (B) The majority of the c_AMPs were classified down to the level of genus and species. Animal-associated genera (e.g., Prevotella, Faecalibacterium, CAG-110) contribute the most c_AMPs, possibly reflecting data sampling. (C) Using the ρAMP per genus, we observed the distribution of c_AMPs per phylum, with Bacillota A as the densest. (D) The ρAMP distribution (gray bars, confidence interval of 95% shown as black bars) with respect to taxonomy shows Bacillota A, Actinomycetota, and Pseudomonadota as the phyla densest in c_AMPs. As a reference, the median of ρAMP for the presented genera is indicated by a magenta dashed line (see Tables SI7 and SI8).
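The core-gene criterion in panel A (present in at least 95% of the genomes of a species) amounts to a simple prevalence check. The sketch below is an illustration under that reading of the caption; the function name and the integer-percentage interface are assumptions, not the authors' code.

```python
def is_core(genomes_with_gene, total_genomes, threshold_percent=95):
    """Return True if a gene is present in >= threshold_percent of genomes.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    """
    if total_genomes <= 0:
        raise ValueError("total_genomes must be positive")
    if not 0 <= genomes_with_gene <= total_genomes:
        raise ValueError("genomes_with_gene must be between 0 and total_genomes")
    return genomes_with_gene * 100 >= threshold_percent * total_genomes


print(is_core(19, 20))  # 95% prevalence: meets the threshold
print(is_core(18, 20))  # 90% prevalence: below the threshold
```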
Figure 5. Habitats differ in their c_AMP densities, with differences being observed between conspecific strains.
(A) Host-associated samples presented a higher ρAMP (calculated at the genus and sample levels) than environmental samples (a random sample of 1,000 dots for each group was drawn, excluding outliers; see "Differences in the c_AMP density in microbial species from different habitats" in Methods). (B) Prevotella copri has a higher ρAMP in cat and human guts than in the guts of pigs and dogs. 106 randomly selected points are shown for each host. (C) Investigating the species-specific ρAMP of microbes found in samples from the human gut and human oral cavity, we observed that 34 of the 37 tested species presented a higher c_AMP density in the gut. (D) The effect of a non-animal host was investigated by comparing the species-specific ρAMP of microbes occurring in samples from soil and plants. We observed that 85 of the 130 tested species presented a higher density in soils. For panels C and D, significance was color-encoded using a log10(P_Mann) scale. See also Fig. SI4, Fig. SI5, and Table SI8. The observed differences in ρAMP were maintained even when the c_AMP genes were restricted by quality (Fig. SI5).
Figure 6. Amino acid composition, structure, antimicrobial activity, and mechanism of action of c_AMPs.
(A) Amino acid frequency in c_AMPs from AMPSphere, AMPs from databases (DRAMP v3, APD3, and DBAASP), and encrypted peptides (EPs) from the human proteome. (B) Heat map with the percentage of secondary structure found for each peptide in three different solvents: water, 60% trifluoroethanol in water, and 50% methanol in water. Secondary structure was calculated using the BeStSel server. (C) Activity of c_AMPs assessed against ESKAPEE pathogens and human gut commensal strains. Briefly, 10⁶ CFU·mL⁻¹ was exposed to two-fold serial dilutions of c_AMPs ranging from 64 to 1 μmol·L⁻¹ in 96-well plates and incubated at 37 °C for one day. After the exposure period, the absorbance of each well was measured at 600 nm. Untreated solutions were used as controls, and minimal concentration values for complete inhibition are presented as a heat map of antimicrobial activities (μmol·L⁻¹) against 11 pathogenic and eight human gut commensal bacterial strains. All assays were performed in three independent replicates, and the heat map shows the mode obtained within the two-fold dilution concentration range studied. (D) Fluorescence values relative to polymyxin B (PMB, positive control) of the fluorescent probe 1-(N-phenylamino)naphthalene (NPN), which indicate outer membrane permeabilization of A. baumannii ATCC 19606 cells. (E) Fluorescence values relative to PMB (positive control) of 3,3′-dipropylthiadicarbocyanine iodide [DiSC3-(5)], a hydrophobic fluorescent probe used to indicate cytoplasmic membrane depolarization of A. baumannii ATCC 19606 cells. Depolarization of the cytoplasmic membrane occurred with slower kinetics than permeabilization of the outer membrane and took approximately 20 min to stabilize.
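The readout described in panel C (a two-fold dilution series from 64 to 1 μmol·L⁻¹, with the mode of three replicates reported) can be sketched as follows. The absorbance cutoff of 0.1 used to call "complete inhibition" and all absorbance values are hypothetical; the caption does not state how inhibition was called from the 600 nm readings.

```python
from statistics import mode


def twofold_series(highest=64.0, lowest=1.0):
    """Two-fold serial dilutions from `highest` down to `lowest` (inclusive)."""
    concentrations = []
    c = highest
    while c >= lowest:
        concentrations.append(c)
        c /= 2
    return concentrations


def minimal_inhibitory_concentration(concentrations, od600, od_cutoff=0.1):
    """Lowest concentration whose well shows no growth (OD600 <= od_cutoff).

    `od_cutoff` is an assumed threshold, not a value from the source.
    Returns None when no well is inhibited.
    """
    inhibited = [c for c, od in zip(concentrations, od600) if od <= od_cutoff]
    return min(inhibited) if inhibited else None


concs = twofold_series()  # 64, 32, 16, 8, 4, 2, 1

# Hypothetical OD600 readings for three replicates, ordered like `concs`.
replicates = [
    [0.02, 0.03, 0.02, 0.04, 0.50, 0.80, 0.90],
    [0.02, 0.03, 0.02, 0.04, 0.50, 0.80, 0.90],
    [0.02, 0.03, 0.03, 0.40, 0.60, 0.80, 0.90],
]
mics = [minimal_inhibitory_concentration(concs, r) for r in replicates]
reported = mode(mics)  # the value a heat-map cell would show
print(mics, reported)
```

Reporting the mode rather than the mean keeps the result on the dilution grid, so every reported value is a concentration that was actually tested.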
Figure 7. Anti-infective activity of AMPs in a pre-clinical animal model.
(A) Schematic of the skin abscess mouse model used to assess the anti-infective activity of the peptides against A. baumannii cells. (B) Peptides were tested at their MIC in a single dose one hour after the establishment of the infection. Each group consisted of three mice (n = 3), and the bacterial load used to infect each mouse derived from a different inoculum. (C) To rule out toxic effects of the peptides, mouse weight was monitored throughout the experiment. Statistical significance in (B) was determined using one-way ANOVA where all groups were compared to the untreated control group; P-values are shown for each of the groups. Features on the violin plots represent median and upper and lower quartiles. Data in (C) are the mean ± the standard deviation. Figure created in BioRender.com.

