Nat Rev Bioeng. 2024 May;2(5):392-407.
doi: 10.1038/s44222-024-00152-x. Epub 2024 Feb 26.

Machine learning for antimicrobial peptide identification and design

Fangping Wan et al. Nat Rev Bioeng. 2024 May.

Abstract

Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.

Conflict of interest statement

Competing interests: J.J.C. is scientific co-founder and scientific advisory board chair of EnBiotix, an antibiotic drug discovery company, and Phare Bio, a non-profit venture focused on antibiotic drug development. C.d.l.F.-N. provides consulting services to Invaio Sciences and is a member of the Scientific Advisory Boards of Nowture S.L. and Phare Bio. The remaining authors declare no competing interests.

Figures

Fig. 1 | Timelines of major machine learning/artificial intelligence (ML/AI) events and recent studies of ML/AI-driven antimicrobial peptide (AMP) identification and design.
Various ML/AI-driven approaches have been developed to discover AMP-like sequences from available genomic or proteomic data and to design synthetic AMPs. Here, we highlight several studies in which the predictions of ML/AI-driven models were validated in vitro or in mouse models of bacterial infection. Details of highlighted AMP studies can be found in Table 1. DL, deep learning; GAN, generative adversarial network; MD, molecular dynamics; NN, neural network; VAE, variational autoencoder.
Fig. 2 | Methods of representing peptides as inputs to machine learning models.
a, For any peptide, global descriptors use fixed-size vectors to encode peptide information such as sequence composition, structural features and physicochemical properties. b, A sequence-based representation encodes a peptide using data from its primary sequence of amino acids. Each type of amino acid is associated with a fixed-size vector encoding the corresponding residue information (such as amino acid type and physicochemical properties). These vectors, or ‘embeddings’, can also be learned from data. c, A graph-based representation consists of nodes and edges. To represent peptides, the nodes can be atoms or residues, whereas the edges can be bonds or geometric distance (given a 3D structure of the peptide) between nodes. Nodes and edges are associated with corresponding vectors representing atom, bond and geometric information. d, When the 3D structure of a peptide is available, the peptide can also be represented by a voxelized (or discretized) form of the structure. Each voxel is represented by a vector, which stores information regarding atom occupancies and atom properties relevant to that voxel. e, Machine learning (ML) models can be used to extract low-dimensional features from peptide inputs from sequence or structure. The extracted features can be used as inputs for other peptide-related tasks.
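Panels a and b can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the choice of descriptors (amino acid composition, a simplified net charge and length) and the toy sequence "KLAKLAKK" are all assumptions made for illustration. The charge table is deliberately crude (histidine treated as neutral, termini ignored), and the one-hot vectors stand in for the richer or learned embeddings the caption mentions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Simplified side-chain charges near neutral pH (assumption: His neutral, termini ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def global_descriptor(peptide):
    """Panel a: a fixed-size vector (20 composition fractions, net charge, length)."""
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    counts = Counter(peptide)
    n = len(peptide)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    net_charge = sum(CHARGE.get(a, 0) for a in peptide)
    return composition + [float(net_charge), float(n)]


def one_hot_sequence(peptide):
    """Panel b: one 20-dimensional vector per residue (raises KeyError on non-standard residues)."""
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in peptide:
        row = [0.0] * len(AMINO_ACIDS)
        row[index[residue]] = 1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    toy = "KLAKLAKK"  # hypothetical sequence, not from the article
    print(len(global_descriptor(toy)))   # vector length is the same for any peptide
    print(len(one_hot_sequence(toy)))    # one row per residue, so it grows with length
```

The contrast to note is that the global descriptor has the same length for every peptide, whereas the sequence-based representation has one vector per residue and therefore varies with peptide length.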
Fig. 3 | Schematic illustration of deep generative models for antimicrobial peptides (AMPs).
a, In neural language models, sections of the input (such as certain letters in an input sequence) are missing and the model (often a deep neural network, DNN) is asked to reconstruct the missing parts from the incomplete input. After training, partial inputs are fed into the model to generate new peptides. b, A variational autoencoder (VAE) consists of an encoder and a decoder neural network. The encoder maps the input to a low-dimensional embedding, Z (a vector), which follows some distribution. The decoder then processes the embedding and reconstructs the original input. As the embeddings fall into some probability distribution, new peptides can be generated by decoding embeddings that are sampled from the distribution. c, Normalizing flow models are similar to VAEs, with the exception that the encoder neural network is specified to be dimension-preserving and invertible (hence, the corresponding decoder is the inverse function of the encoder). This makes normalizing flow models capable of inferring exact likelihoods of data. d, A generative adversarial network (GAN) consists of a generator that creates a synthetic peptide from a random vector and a discriminator that aims to identify whether the generated peptide comes from actual data or is synthetic. Because the discriminator and the generator compete with each other, the synthetic peptides produced by the generator will converge to approximate the peptides found in actual data. e, Given any input (X0), diffusion models gradually add Gaussian noise to the input. Sufficient noise addition transforms the input to random Gaussian noise, XT. A DNN is then trained using data to reverse-transform the random noise, XT, back to the original input (from XT to X0). This reverse step specifies the process of how new peptides are generated from random noise.
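The forward (noising) half of panel e can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's method: it assumes the peptide has already been encoded as a vector of real numbers, and it uses a linear variance schedule and the update X_t = sqrt(1 - beta_t) * X_{t-1} + sqrt(beta_t) * noise, which is one common formulation of diffusion models. The function names and parameter values are hypothetical. The reverse step requires a trained neural network and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, increasing linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffuse(x0, betas, rng):
    """Return [X0, X1, ..., XT], adding Gaussian noise at every step."""
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
        trajectory.append(x)
    return trajectory


def remaining_signal(betas):
    """Coefficient multiplying X0 inside XT: sqrt of the product of (1 - beta_t)."""
    return math.sqrt(math.prod(1.0 - b for b in betas))


if __name__ == "__main__":
    betas = linear_beta_schedule(1000)
    x0 = [1.0, 0.0, 0.0]  # hypothetical encoded input
    trajectory = forward_diffuse(x0, betas, random.Random(0))
    print(len(trajectory))          # T + 1 states, from X0 to XT
    print(remaining_signal(betas))  # close to zero: XT is almost pure noise
```

Because the coefficient on X0 shrinks towards zero as steps accumulate, XT carries almost no information about the input, which is why generation can start from random Gaussian noise.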
