Nat Commun. 2023 Mar 15;14(1):1453.
doi: 10.1038/s41467-023-36994-z.

Discovering highly potent antimicrobial peptides with deep generative model HydrAMP

Paulina Szymczak et al. Nat Commun.

Erratum in

Abstract

Antimicrobial peptides are emerging as compounds that can alleviate the global health hazard of antimicrobial resistance, prompting a need for novel computational approaches to peptide generation. Here, we propose HydrAMP, a conditional variational autoencoder that learns a lower-dimensional, continuous representation of peptides and captures their antimicrobial properties. The model disentangles the learnt representation of a peptide from its antimicrobial conditions and leverages parameter-controlled creativity. HydrAMP is the first model that is directly optimized for diverse tasks, including unconstrained and analogue generation, and it outperforms other approaches in these tasks. An additional preselection procedure based on ranking of generated peptides and molecular dynamics simulations increases the experimental validation rate. Wet-lab experiments on five bacterial strains confirm the high activity of nine peptides generated as analogues of clinically relevant prototypes, as well as six analogues of an inactive peptide. HydrAMP enables generation of diverse and potent peptides, making a step towards resolving the antimicrobial resistance crisis.

Conflict of interest statement

The authors P.Sz., M.Mo., T.G., R.J., M.B., D.N., M.Mi., P.Se., W.K., and E.S. have issued a patent for the newly generated AMPs under the patent number P.443243. The other authors declare no competing interests.

Figures

Fig. 1. HydrAMP architecture and data traversion overview.
a Compositional structure of the training data set. b Data flow and optimization setup during the training. Colors indicate training modes: reconstruction (pink), unconstrained (yellow), analogue (blue), and all modes together (grey). The arrows show the path each peptide traverses within a given training mode. Lines connect arguments of different terms of the objective (loss) function, as indicated by line styling: latent reconstruction regularization, Jacobian disentanglement regularization, KL divergence, and cross entropy. Shaded areas indicate components with frozen weights. c Data flow during the generation of peptides. Colors indicate generation: analogue (blue), and unconstrained (yellow). The model is validated using molecular dynamics simulations and wet-lab validation (activity and toxicity assays). HydrAMP functionality is available via a web service https://hydramp.mimuw.edu.pl/.
Fig. 2. Analogue generation performance in terms of number of generated analogues of HydrAMP (red), in comparison to PepCVAE (dark blue), Basic (light blue), and Joker (dark green).
a Fraction (y-axis) and number (over each bar) of 1319 positive (AMP and highly active) peptides from the test set, which produced analogues that met baseline discovery or improvement discovery criteria (x-axis) in the analogue generation. b As in a, but for 1253 negative peptides from the test set. c, d Left: The relation between the creativity parameter temperature (τ; x-axis) and the log number of generated unique analogues that met the baseline discovery criteria, out of 10,000 total attempts (y-axis; the actual number of analogues shown above each bar). Right: the distribution of the Levenshtein distances between generated unique analogues out of 10,000 total attempts and the prototype sequence; for Pexiganan (c) and CAMEL (d). The borders of the box indicate the first quartile (bottom) and the third quartile (top) of the data. The line within the box indicates the median. The whiskers indicate the most extreme, non-outlier data points, whereas the dots beyond the whiskers denote the outliers. The sample sizes for each box in (c) are n = 13 for τ = 1, n = 206 for τ = 2, and n = 8486 for τ = 5. The sample sizes for each box in (d) are n = 2 for τ = 1, n = 173 for τ = 2, and n = 6743 for τ = 5. Source data are provided as a Source Data file.
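The Levenshtein distance used in panels c and d counts the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into another. The following is a minimal sketch of that standard definition, not the authors' implementation; the function name and the toy sequences are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, residue_b in enumerate(b, start=1):
            cost = 0 if residue_a == residue_b else 1
            current.append(min(
                previous[j] + 1,         # delete residue_a
                current[j - 1] + 1,      # insert residue_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Toy sequences (hypothetical): one substitution separates them.
prototype = "GIGKFLKKAKKF"
analogue = "GIGKFLKKAKRF"
distance = levenshtein(prototype, analogue)
```

A distance of 0 means the generated sequence is identical to the prototype; larger values indicate analogues that diverge further from it.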
Fig. 3. Analogue generation performance in terms of PMAMP and PMMIC for HydrAMP (red), in comparison to PepCVAE (dark blue), Basic (light blue), and Joker (dark green) compared to the distribution of test data (yellow for PMAMP and violet for PMMIC).
Test data contains all the prototypes and serves as a reference. a, b, c, d The probability distributions of PMAMP (a, c) and PMMIC (b, d) for n = 1253 negative prototypes and the generated analogues that met the baseline (a, b) or improvement (c, d) discovery criteria. e, f, g, h The probability distributions of PMAMP (e, g) and PMMIC (f, h) for n = 1319 positive prototypes and the generated analogues that met the baseline (e, f) or improvement (g, h) discovery criteria. The white dots mark the median of each distribution. The black vertical lines denote the interquartile range of each distribution. Source data are provided as a Source Data file.
Fig. 4. Unconstrained generative performance of HydrAMP (red), in comparison to PepCVAE (dark blue), Basic (light blue), AMP-LM (green), Dean-VAE (pink), Muller-LSTM (aquamarine), and AMP-GAN (orange) (methods indicated on the x-axis).
a, b The distribution of probabilities of being antimicrobial (PMAMP), for the task of generating positive peptides (a) and for the task of generating negative peptides (b). c, d The distribution of probabilities of being highly active (having low MIC, PMMIC) for the task of generating positive peptides (c) and for the task of generating negative peptides (d). In panels a, b, c, d, the white dots mark the median of each distribution. The black vertical lines denote the interquartile range of each distribution. Sample sizes in panels a, c: HydrAMP n = 50,000, PepCVAE n = 50,000, Basic n = 50,000, AMP-LM n = 24,588, Dean-VAE n = 2973, Muller-LSTM n = 976, AMP-GAN n = 50,000. Sample sizes in panels b, d: HydrAMP, PepCVAE and Basic n = 50,000. e For the task of generating positive peptides, fraction of generated peptides with PMAMP > 0.8 (first bar plot), fraction of peptides with PMMIC > 0.5 (second bar plot), fraction of peptides that satisfy both previous criteria, i.e. classified as positive (third bar plot). The number over each bar: the actual number of peptides that met the condition. Source data are provided as a Source Data file.
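The three fractions in panel e can be illustrated with a short sketch. It assumes each generated peptide comes with a pair of classifier scores (PMAMP, PMMIC) and applies the thresholds stated in the caption, 0.8 and 0.5; the function name, the dictionary keys, and the example scores are hypothetical.

```python
def positive_fractions(scores, amp_threshold=0.8, mic_threshold=0.5):
    """Return the fractions of peptides passing each criterion and both.

    scores: list of (p_amp, p_mic) pairs, one pair per generated peptide.
    """
    total = len(scores)
    if total == 0:
        raise ValueError("scores must not be empty")
    passes_amp = sum(1 for p_amp, _ in scores if p_amp > amp_threshold)
    passes_mic = sum(1 for _, p_mic in scores if p_mic > mic_threshold)
    # A peptide is classified as positive only when it passes both criteria.
    passes_both = sum(
        1 for p_amp, p_mic in scores
        if p_amp > amp_threshold and p_mic > mic_threshold
    )
    return {
        "amp": passes_amp / total,
        "mic": passes_mic / total,
        "positive": passes_both / total,
    }


# Made-up scores for four peptides, for illustration only.
example_scores = [(0.9, 0.6), (0.9, 0.4), (0.5, 0.7), (0.2, 0.1)]
fractions = positive_fractions(example_scores)
```

Because the third criterion is the conjunction of the first two, its fraction can never exceed either of the others.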
Fig. 5. Physicochemical properties of analogues generated by HydrAMP and compared methods in analogue generation for non-AMP or AMP prototypes in comparison with real and random data.
Distributions of properties (a, b, c Isoelectric point, d, e, f Charge, g, h, i Hydrophobic ratio, j, k, l Aromaticity) of randomly generated peptides (dark gray; sample size n = 1253), peptides sampled from UniProt (light gray; n = 1253), true negatives (green; n = 1253), and true positives (yellow; n = 1319), in comparison with AMP analogues generated from negatives (b, e, h, k), and positives (c, f, i, l) by different models: HydrAMP with various creativity parameter temperature values: τ = 1 (light red), τ = 2 (red), τ = 5 (dark red), PepCVAE (dark blue), Basic (light blue), and Joker (dark green). HydrAMP generated analogues for n = 1253 negatives (for all temperature values), while Joker generated analogues for n = 556 of them. For the positive test set, HydrAMP generated analogues for all n = 1319 peptides (for all temperature values), while Joker generated analogues for n = 605 of them. The significance levels of the one-sided Mann-Whitney test are denoted above the boxes as: ns - P ≥ 0.05; * - P ≤ 0.05; ** - P ≤ 0.01; *** - P ≤ 0.001. The borders of the boxes indicate the first quartile (bottom) and the third quartile (top) of the data. The line within the box indicates the median. The whiskers indicate the most extreme, non-outlier data points. Source data are provided as a Source Data file.
Fig. 6. Summary of atomistic molecular dynamics simulations of peptide-membrane systems.
a, b Late simulation snapshots of experimentally verified AMP Hydraganan-1 (red) and non-AMP peptide Pex-P1-4 (beige), respectively, in the membrane (blue); top view on membrane surface; water molecules not depicted for clarity; c scheme of optimized simulation protocol used for AMP preselection; t -- simulation time; d0 -- initial peptide-membrane separation; d scheme illustrating the evaluation of the S parameter that describes the level of peptide burial within the membrane; z -- membrane normal axis; e system descriptors in Hydraganan-1 and non-AMP Pex-P1-4 simulations; upper plot: an average (lines) and standard deviation (shaded areas) of α-helix fraction within peptide residues; lower plot: the evolution of S parameter for each of three initial peptide placements; f time evolution of S parameter for three sample peptides illustrating possible routes in the preselection algorithm. The time evolution suggests that Stigmurin and Olabogan are active AMPs, while GQ20 is inactive. Source data are provided as a Source Data file.
Fig. 7. AMP success rates depending on activity threshold.
AMP success rate (y-axis) for different MIC thresholds (MIC measured in units [μg / mL]; x-axis), for different methods: HydrAMP (red), CLaSS (beige), Joker (dark green). Source data are provided as a Source Data file.
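A success-rate curve of this kind can be sketched as follows. The sketch assumes, as an illustration rather than the paper's exact definition, that a tested peptide counts as a success when its measured MIC is at or below the threshold; the function names and the MIC values are hypothetical.

```python
def success_rate(mic_values, threshold):
    """Fraction of tested peptides whose MIC (in μg/mL) is <= threshold."""
    if not mic_values:
        raise ValueError("mic_values must not be empty")
    successes = sum(1 for mic in mic_values if mic <= threshold)
    return successes / len(mic_values)


def success_curve(mic_values, thresholds):
    """Success rate evaluated at each threshold, as (threshold, rate) pairs."""
    return [(t, success_rate(mic_values, t)) for t in thresholds]


# Made-up MIC measurements for four peptides, for illustration only.
example_mics = [4, 16, 64, 256]
curve = success_curve(example_mics, [8, 32, 128])
```

Under this definition the curve is non-decreasing: relaxing the MIC threshold can only add successes.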
Fig. 8. Peptide structures according to MD simulations and AlphaFold2 predictions.
a Left to right: representative structures for the last 100 ns of MD runs, for peptides with average helical content in [0.75, 1], [0.5, 0.75), (0, 0.5) range, respectively (green), with superimposed AlphaFold2 predicted geometries (orange); b S distributions in corresponding peptide groups. Source data are provided as a Source Data file.
