J Comput Aided Mol Des. 2024 Apr 3;38(1):17.
doi: 10.1007/s10822-024-00558-0.

Computational peptide discovery with a genetic programming approach

Nicolas Scalzitti et al. J Comput Aided Mol Des. 2024.

Abstract

The development of peptides for therapeutic targets or biomarkers for disease diagnosis is a challenging task in protein engineering. Current approaches are tedious, often time-consuming and require complex laboratory data due to the vast search spaces that need to be considered. In silico methods can accelerate research and substantially reduce costs. Evolutionary algorithms are a promising approach for exploring large search spaces and can facilitate the discovery of new peptides. This study presents the development and use of a new variant of the genetic-programming-based POET algorithm, called POETRegex, in which individuals are represented by a list of regular expressions. This algorithm was trained on a small curated dataset and employed to generate new peptides with improved sensitivity in magnetic resonance imaging with chemical exchange saturation transfer (CEST). The resulting model achieves a performance gain of 20% over the initial POET models and is able to predict a candidate peptide with a 58% performance increase compared to the gold-standard peptide. By combining the power of genetic programming with the flexibility of regular expressions, new peptide targets were identified that improve the sensitivity of detection by CEST. This approach provides a promising research direction for the efficient identification of peptides with therapeutic or diagnostic potential.

Keywords: CEST MRI; Contrast agent; Evolutionary algorithm; Genetic programming; Peptide discovery; Regular expressions.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
Classical evolutionary cycle of a GP algorithm
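The cycle named in this caption (evaluate, select, recombine, mutate, repeat) can be illustrated with a toy example. The sketch below is not the POET implementation: it evolves short peptide strings rather than models, and the fitness function (number of lysines), the tournament selection, the elitism step and all parameter values are assumptions chosen only to keep the example small.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fitness(seq):
    # Toy objective (assumption): count lysine residues.
    return seq.count("K")

def tournament(pop, rng, k=3):
    # Selection: best of k randomly drawn individuals.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # One-point crossover on equal-length strings.
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(seq, rng, rate=0.1):
    # Each position is replaced by a random amino acid with probability `rate`.
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in seq)

def evolve(pop_size=30, length=12, generations=40, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
           for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        offspring = [best]  # elitism: keep the best individual unchanged
        while len(offspring) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            offspring.append(mutate(child, rng))
        pop = offspring
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(best, history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` can never decrease.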
Fig. 2
a Representation of an individual (a protein-function model) as a list of rules with 3 columns (ID, regular expression pattern and weight). An example (RE3) is represented as a built-in list structure in Python, where a parent node i has 2 children: (i*2)+1 and (i*2)+2. b Representation of RE3 as a binary tree. The yellow node is the root, grey nodes are the internal nodes and green nodes are the leaves. The small dotted nodes with red numbers are unexpressed nodes represented by “None”
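The list layout described in this caption, where the children of node i sit at indices (i*2)+1 and (i*2)+2 and unexpressed nodes are None, can be sketched as follows. The node vocabulary ("cat" for concatenation, "|" for alternation), the example tree and the scoring rule (sum of the weights of all rules whose pattern occurs in the peptide) are hypothetical and chosen for illustration; the caption does not specify them.

```python
import re
from typing import List, Optional, Tuple

Tree = List[Optional[str]]          # heap-style list; None = unexpressed node
Rule = Tuple[str, Tree, float]      # (ID, regular-expression tree, weight)

def to_regex(tree: Tree, i: int = 0) -> str:
    """Build a regex string from the list; children of i are 2*i+1 and 2*i+2."""
    if i >= len(tree) or tree[i] is None:
        return ""                   # missing or unexpressed node
    left = to_regex(tree, 2 * i + 1)
    right = to_regex(tree, 2 * i + 2)
    node = tree[i]
    if not left and not right:
        return node                 # leaf: one or more amino-acid letters
    if node == "|":                 # hypothetical alternation operator
        return f"(?:{left}|{right})"
    if node == "cat":               # hypothetical concatenation operator
        return left + right
    raise ValueError(f"unknown internal node: {node!r}")

def score(model: List[Rule], peptide: str) -> float:
    """Assumed scoring: add the weight of every rule whose pattern is found."""
    return sum(weight for _id, tree, weight in model
               if re.search(to_regex(tree), peptide))

# Made-up example: root "cat" with leaf "K" and an alternation of "R" and "S".
example_tree = ["cat", "K", "|", None, None, "R", "S"]
model = [("RE1", ["K"], 0.5), ("RE3", example_tree, 2.0)]

print(to_regex(example_tree))       # K(?:R|S)
print(score(model, "GKSG"))         # 2.5
```

Storing the tree in a flat list keeps an individual easy to copy and slice, at the cost of padding the list with None wherever a branch is absent.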
Fig. 3
Representation of the one-point crossover: a part of parent 1 is merged with a part of parent 2 to produce offspring
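A minimal sketch of one-point crossover on two rule lists, assuming each parent is a Python list of (ID, pattern, weight) tuples and that the offspring takes the head of parent 1 and the tail of parent 2. How the cut points are drawn is an assumption here; the caption only states that a part of each parent is merged.

```python
import random

def one_point_crossover(parent1, parent2, rng):
    """Return the offspring and the cut points used (for inspection)."""
    cut1 = rng.randint(1, len(parent1))   # keep at least one rule of parent 1
    cut2 = rng.randint(0, len(parent2))   # tail of parent 2 may be empty
    child = parent1[:cut1] + parent2[cut2:]
    return child, cut1, cut2

# Hypothetical parents, made up for illustration.
p1 = [("A1", "K+", 1.0), ("A2", "R", 0.5), ("A3", "KS", 2.0)]
p2 = [("B1", "T", 0.1), ("B2", "D", -1.0)]

child, cut1, cut2 = one_point_crossover(p1, p2, random.Random(1))
print(cut1, cut2, child)
```

The parents are left unmodified because slicing creates new lists, so the same individuals can be reused to produce several offspring.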
Fig. 4
Representation of each type of mutation. a Addition of a new rule in the list of rules. b Replacement of a rule by a new rule. c Deletion of an existing rule in the list of rules. d Replacement of a branch of the tree. e Exchange of a node. f Deletion of a subtree. g Add one or more AAs to a leaf
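The three rule-level mutations (a, b and c) can be sketched on a list of rules. This is an illustration under assumptions, not the published operators: rules are simplified to (pattern, weight) pairs with the ID column omitted, and the helper `random_rule`, its motif length and its weight range are hypothetical. The tree-level mutations (d to g) are not covered.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_rule(rng):
    # Hypothetical generator: a short literal motif with a random weight.
    motif = "".join(rng.choice(AMINO_ACIDS) for _ in range(rng.randint(1, 3)))
    return (motif, round(rng.uniform(0.0, 10.0), 2))

def add_rule(model, rng):
    # (a) insert a new rule at a random position
    pos = rng.randint(0, len(model))
    return model[:pos] + [random_rule(rng)] + model[pos:]

def replace_rule(model, rng):
    # (b) replace one existing rule with a new one
    pos = rng.randrange(len(model))
    return model[:pos] + [random_rule(rng)] + model[pos + 1:]

def delete_rule(model, rng):
    # (c) remove one rule, but never return an empty model
    if len(model) <= 1:
        return list(model)
    pos = rng.randrange(len(model))
    return model[:pos] + model[pos + 1:]

rng = random.Random(0)
base = [("K", 1.0), ("KS", 2.0), ("R", 0.5)]
print(add_rule(base, rng))
print(replace_rule(base, rng))
print(delete_rule(base, rng))
```

Each operator returns a new list, so the original individual stays available for other variation steps in the same generation.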
Fig. 5
Average pairwise sequence identity in the dataset in percent, with [i–j] indicating values from i (included) to j (excluded)
Fig. 6
a Frequency of occurrence of each AA in both training (blue) and test (orange) sets. Molecules are illustrated for the four most prevalent AAs in the training set, and hydroxyl or amine groups are highlighted. b Comparison of the frequency of each AA in our dataset (yellow) and in the UniProtKB/Swiss-Prot database (green). The different values represent the percentage of occurrence. c Potential CEST value associated with each AA by occurrence method. The green box represents positively charged AAs, and the red box represents negatively charged AAs. d Frequency of the 20 most observed motifs (size 2 to 6) in the training set with the associated CEST value
Fig. 7
a Comparison of POETRegex (blue) and POETRdm (purple) models on the test set. b Performance of the best POETRdm model on the training set (orange) and the test set (green). The translucent bands around the regression line represent the confidence interval for the regression estimate
Fig. 8
a Performance of the best POETRegex model on the training set (orange) and on the test set (green). The strong correlation indicates that the algorithm has converged to a good solution. The translucent bands around the regression line represent the confidence interval for the regression estimate. b Evolution of the fitness value during the evolutionary process. The green curve represents the fitness value of the best individual, and the orange curve represents the fitness value of the entire population
Fig. 9
The 9 best POET models. Each dot represents a datapoint with a true CEST value associated with a predicted CEST value. The green line represents the regression line and the translucent bands around the regression line represent the confidence interval for the regression estimate
Fig. 10
a Number of AAs present in the predicted peptides in the 3 types of DE experiments: 1000 (blue), 100 (orange) and 10 (green) cycles. b Sequence logos highlighting the probability of each AA at a given position, for the 3 experiments. As the number of cycles increases, the predicted peptides become more similar to one another, with high rates of lysine and leucine. The polar AAs are in green, the neutral in purple, the positively charged in blue, the negatively charged in red and the hydrophobic in black
Fig. 11
MTRasym plot of nine peptides and the gold standard peptide (K12) measured by NMR
