Chem Sci. 2021 Jun 7;12(26):9221-9232.
doi: 10.1039/d1sc01713f. eCollection 2021 Jul 7.

Machine learning designs non-hemolytic antimicrobial peptides

Alice Capecchi et al. Chem Sci. 2021.

Abstract

Machine learning (ML) recognizes patterns in training data and offers the opportunity to exploit large structure-activity databases for drug design. In the area of peptide drugs, ML is mostly being tested to design antimicrobial peptides (AMPs), a class of biomolecules potentially useful to fight multidrug-resistant bacteria. ML models have successfully identified membrane-disruptive amphiphilic AMPs, but mostly without addressing the associated toxicity to human red blood cells. Here we trained recurrent neural networks (RNNs) with data from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) to design short non-hemolytic AMPs. Synthesis and testing of 28 generated peptides, each at least 5 mutations away from the training data, allowed us to identify eight new non-hemolytic AMPs against Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus (MRSA). These results show that ML can be used to design new non-hemolytic AMPs.

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. (a) Strategy schematic. An AMP RNN generative model, an AMP RNN activity classifier, and a hemolysis RNN classifier were trained using activity (orange) and hemolysis (blue) data from DBAASP. (1) Two copies of the AMP RNN generative model (prior model) were fine-tuned by transfer learning using active and non-hemolytic peptides against specific strains: P. aeruginosa/A. baumannii and S. aureus, respectively. (2) The fine-tuned models were sampled, and the generated sequences were classified first with the RNN AMP activity classifier and then with the RNN hemolysis classifier. (3) The selected sequences were further filtered to obtain short peptides of at most 15 residues with at least five mutations from the sequences in DBAASP and no D-amino acids. Two different selection strategies were then used. In the first strategy, we used the calculated amphiphilicity of the sequences to filter them further and clustered the selected ones. In the second strategy, we selected 10 sequences at random. (4) Finally, the 28 chosen sequences were synthesized and tested. (b, c) ROC curves of the test set for the NB, RF, SVM, RNN, and RNN with scrambled labels (RNN scr.) models for the AMP activity (b) and hemolysis (c) classification tasks. The probabilistic prediction values were converted into binary classification values using a threshold of 0.5.
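The filtering in steps (2) and (3) of Fig. 1a can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "mutations" can be approximated by Levenshtein edit distance, that lowercase letters denote D-amino acids, and that a probability of 0.5 or higher counts as a positive prediction. The names `edit_distance`, `passes_filters`, `p_active`, and `p_hemolytic` are hypothetical, and the classifier probabilities are taken as given inputs.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def passes_filters(seq: str,
                   known: Iterable[str],
                   p_active: float,
                   p_hemolytic: float,
                   max_len: int = 15,
                   min_dist: int = 5,
                   threshold: float = 0.5) -> bool:
    """Return True if a generated sequence survives all filters.

    Assumptions (not confirmed by the source): probabilities >= threshold
    are positive calls, lowercase letters mark D-amino acids, and the
    distance to known peptides is the Levenshtein distance.
    """
    # Step (2): keep predicted-active, predicted-non-hemolytic sequences.
    if p_active < threshold or p_hemolytic >= threshold:
        return False
    # Step (3): short peptides only, no D-amino acids.
    if len(seq) > max_len or any(ch.islower() for ch in seq):
        return False
    # Step (3): at least min_dist edits away from every known sequence.
    return all(edit_distance(seq, k) >= min_dist for k in known)
```

For example, `passes_filters("KKKKKKKKKK", ["AAAAAAAAAA"], 0.9, 0.1)` accepts the sequence, whereas a candidate that differs from a known peptide in only two positions is rejected.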
Fig. 2. (a) CD spectra of GN1, GN2, and GP1 recorded at 0.100 mg mL−1 in 10 mM phosphate buffer pH 7.4 with or without 5 mM DPC. (b) Extraction of percentages of secondary structure from primary CD data using DichroWeb. The Contin-LL method and reference set 4 were used. (c) Helix properties predicted by HeliQuest. Circle size is proportional to side-chain size; blue indicates cationic residues, yellow hydrophobic residues, grey alanine, green proline, and purple serine. The arrow inside each helical wheel indicates the magnitude and direction of the hydrophobic moment.
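The hydrophobic moment shown by the arrows in Fig. 2c can be sketched with the standard vector-sum definition for an ideal α-helix, in which successive residues are rotated by 100° around the helix axis. The function below is an illustration, not HeliQuest's implementation. The name `hydrophobic_moment` and the `toy_scale` values are hypothetical, so a published hydrophobicity scale should be substituted for real use.

```python
import math
from typing import Mapping, Tuple


def hydrophobic_moment(seq: str,
                       scale: Mapping[str, float],
                       angle_deg: float = 100.0) -> Tuple[float, float]:
    """Return (mean moment magnitude, direction in degrees).

    Each residue contributes a vector of length scale[residue] pointing
    at i * angle_deg around the helix axis; the vectors are summed and
    the magnitude is divided by the sequence length.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    delta = math.radians(angle_deg)
    x = sum(scale[r] * math.cos(i * delta) for i, r in enumerate(seq))
    y = sum(scale[r] * math.sin(i * delta) for i, r in enumerate(seq))
    magnitude = math.hypot(x, y) / len(seq)
    direction = math.degrees(math.atan2(y, x)) % 360.0
    return magnitude, direction


# Hypothetical two-letter scale for illustration only (not a published scale).
toy_scale = {"L": 1.0, "K": -1.0}
moment, direction = hydrophobic_moment("KLLKKLLKKLLK", toy_scale)
```

A sequence whose hydrophobic residues cluster on one face of the helix gives a large moment, whereas a uniform sequence spanning 18 residues (five full turns) gives a moment close to zero because the vectors cancel.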
Fig. 3. MD simulations of GN1 in water and in the presence of a DPC micelle over 250 ns using GROMACS. (a) Average structure (stick model) in water over 100 structures sampled over the last 100 ns (thin lines). Hydrophobic side chains are colored in red and cationic side chains in blue. (b) Average structure (cartoon model for backbone and stick model for side chains) with DPC micelle over 100 structures sampled over the last 100 ns (thin lines). (c) RMSD (root mean square deviation) of the peptide backbone atoms relative to the starting α-helical conformation. (d) Number of intramolecular hydrogen bonds. The DPC micelle was omitted for clarity.
Fig. 4. TEM images of P. aeruginosa and A. baumannii after 2 h of treatment with GN1 in MH medium. Blue arrows indicate effects on the bacteria.
