ACS Chem Biol. 2023 Jun 16;18(6):1425-1434.
doi: 10.1021/acschembio.3c00159. Epub 2023 May 23.

Identification of Macrocyclic Peptide Families from Combinatorial Libraries Containing Noncanonical Amino Acids Using Cheminformatics and Bioinformatics Inspired Clustering

Man-Ling Lee et al., ACS Chem Biol.

Abstract

In the past decade, macrocyclic peptides have gained increasing interest as a new therapeutic modality for intracellular and extracellular targets that had previously been classified as "undruggable". Several technological advances have made it possible to discover macrocyclic peptides against these targets: 1) the inclusion of noncanonical amino acids (NCAAs) in mRNA display, 2) the increased availability of next-generation sequencing (NGS), and 3) improvements in rapid peptide synthesis platforms. Because DNA sequencing is the functional output of this directed-evolution-based screening platform, it can produce large numbers of potential hit sequences. The current standard for choosing hit peptides from these selections for downstream follow-up relies on counting unique peptide sequences and sorting them by frequency, which can produce false negatives when a sequence is underrepresented for technical reasons such as low translation efficiency or other experimental factors. To overcome our inability to detect weakly enriched peptide sequences in these large data sets, we set out to develop a clustering method that identifies peptide families. Unfortunately, traditional clustering algorithms such as ClustalW cannot be used with this technology because these libraries incorporate NCAAs. We therefore developed a new atomistic clustering method with a Pairwise Aligned Peptide (PAP) chemical similarity metric to perform sequence alignments and identify macrocyclic peptide families. With this method, weakly enriched peptides, including isolated sequences (singletons), can now be clustered into families, providing a comprehensive analysis of the NGS data resulting from macrocycle discovery selections.
Additionally, upon identification of a hit peptide with the desired activity, this clustering algorithm can be used to identify derivatives from the initial data set for structure-activity relationship (SAR) analysis without requiring additional selection experiments.

Conflict of interest statement

The authors declare the following competing financial interest(s): M.-L.L., A.G., and C.N.C. are current employees of Genentech, Inc. and shareholders of Roche.

Figures

Figure 1
Workflow of macrocyclic peptide discovery using mRNA display. mRNA-puromycin libraries are first translated in vitro and then reverse transcribed to make RNA/cDNA hybrids that are covalently attached to their cognate macrocyclic peptides. The target protein (dark red) is incubated with the peptide libraries in solution or on beads, and unbound macrocyclic peptides are removed with a series of wash steps. The bound macrocyclic peptides are heat eluted, and the linked mRNA/cDNA is reamplified for subsequent rounds of selection until the library is enriched for peptides that bind the target protein. Next-generation sequencing is performed on the elution fractions, and the frequencies of unique peptide sequences are counted to identify binders.
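The frequency-counting step at the end of this workflow can be sketched in a few lines of Python. This is a minimal illustration, assuming the NGS reads have already been translated into peptide strings (one letter per residue, lowercase for D-amino acids); the function names `rank_by_frequency` and `singletons` and the example reads are hypothetical.

```python
from collections import Counter

def rank_by_frequency(peptides, top_n=None):
    """Count identical peptide sequences; return (sequence, reads) pairs, most frequent first."""
    return Counter(peptides).most_common(top_n)

def singletons(peptides):
    """Sequences observed exactly once; pure frequency ranking pushes these to the bottom."""
    return [seq for seq, n in Counter(peptides).items() if n == 1]

# Hypothetical translated reads from one elution fraction (lowercase = D-amino acid).
reads = ["YwRK", "YwRK", "FwRK", "YwRK", "AGSP", "FwRK", "LLQE"]

for seq, n in rank_by_frequency(reads):
    print(seq, n)
print("singletons:", singletons(reads))
```

In this toy data `FwRK` differs from `YwRK` at a single position, yet frequency ranking treats the two as unrelated entries; grouping such near neighbors is the gap the clustering approach targets.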
Figure 2
Details of the new clustering method. Lowercase letters indicate D-amino acids. (a) Short description of the algorithms. (b) Linear combination of similarity matrices encoding amino acid similarities without and with consideration of the stereochemistry at the Cα atom. (c) Alignment of two peptide sequences and similarities of the paired amino acids; the optimal alignment requires one gap in each sequence. (d) Formula for computing the similarity of two peptides from the similarities of the paired amino acids. (e) Schematic display of the DISE clustering results, where (+) denotes cluster seeds and (■) denotes cluster members.
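The alignment and residue-similarity steps in panels (b) to (d) can be sketched as a global (Needleman-Wunsch) alignment driven by a residue similarity that linearly combines a stereochemistry-agnostic term with a stereochemistry-aware term. This is a toy sketch, not the published PAP implementation: the residue table `BASE`, the weight `w`, the gap penalty, and the normalization by the longer peptide length are all assumptions, since the actual matrices and the formula in panel (d) are not reproduced here.

```python
# Hypothetical stereo-agnostic similarities for a few residue codes (NOT the paper's matrices).
BASE = {("Y", "F"): 0.8, ("K", "R"): 0.7, ("L", "I"): 0.8, ("S", "T"): 0.6}

def base_sim(a, b):
    """Similarity ignoring C-alpha chirality: compare the upper-case (L-form) codes."""
    x, y = a.upper(), b.upper()
    if x == y:
        return 1.0
    return BASE.get((x, y), BASE.get((y, x), 0.0))

def residue_sim(a, b, w=0.5):
    """Assumed linear combination: w * stereo-agnostic term + (1 - w) * stereo-aware term,
    where the stereo-aware term is zero if one residue is L (upper) and the other D (lower)."""
    s = base_sim(a, b)
    same_chirality = a.isupper() == b.isupper()
    return w * s + (1 - w) * (s if same_chirality else 0.0)

def align_similarity(p, q, gap=-0.5, w=0.5):
    """Needleman-Wunsch global alignment score, normalized by the longer length (assumption)."""
    n, m = len(p), len(q)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # leading gaps in q
    for j in range(1, m + 1):
        F[0][j] = j * gap          # leading gaps in p
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + residue_sim(p[i - 1], q[j - 1], w),  # pair residues
                F[i - 1][j] + gap,                                      # gap in q
                F[i][j - 1] + gap,                                      # gap in p
            )
    return max(0.0, F[n][m] / max(n, m))
```

With the defaults, identical peptides score 1.0, `align_similarity("YwRK", "FwRK")` gives about 0.95 (Y/F is 0.8 in the toy table), and `align_similarity("YwRK", "YWRK")` gives 0.875 because the w/W pair keeps only the stereo-agnostic half of its score.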
Figure 3
Results of retrospective analyses. (a) Median AUC-ROC per allele over 50 similarity search experiments for each allele, performed as described in the Methods section. An AUC-ROC of 0.5 corresponds to random selection, and a value of 1.0 corresponds to perfect enrichment. Error bars are standard deviations over the 50 repeats. Numeric values and standard deviations are given in Table S2. (b–c) Result of one Mamu-B_01 similarity search using the AAP-based similarity. (b) ROC curve. (c) Activity of candidates versus similarity to the most similar query peptide. (d) Selected peptides (■ in plot c) aligned to their most similar query peptide 173.
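The AUC-ROC in panel (a) can be computed directly from its rank interpretation: the probability that a randomly chosen active candidate scores higher than a randomly chosen inactive one, with ties counted as one half. The sketch below assumes each candidate's score is its similarity to the most similar query peptide, as in panel (c), and that each repeat supplies a (scores, labels) pair; the function names are hypothetical, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from statistics import median

def auc_roc(scores, labels):
    """P(random active outscores random inactive); ties count 0.5. Labels are truthy for actives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive candidate")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def median_auc(runs):
    """Median AUC over repeated similarity-search experiments; each run is (scores, labels)."""
    return median(auc_roc(scores, labels) for scores, labels in runs)
```

A ranking that places every active above every inactive gives 1.0, and indistinguishable scores give 0.5, matching the interpretation in the caption.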
Figure 4
SeqSim algorithm enables rapid identification of new chemical matter through clustering. A) Number of clusters across different similarity cutoffs. B) Target binding and cluster distribution of hit picks with a similarity cutoff of 0.35 (top panel), 0.3 (middle panel), and 0.25 (bottom panel). Graphs show PSMD2 binding as a function of the number of sequencing reads per cluster (Sum of reads, i.e., Sum of Frequency), and hits are colored by the number of members in each cluster (Count, i.e., Cluster Size). ELISA data represent the mean of two independent experiments with three technical replicates each. Clusters containing the two original hits, MC1 and MC3, and the top clustering hits are labeled next to their data points. C) Sequence enrichment profiles of previously published hits (top panel, Original Picks) compared with profiles of hits picked through clustering analysis (bottom panel, Clustering Hit Picks). Hits are colored according to PSMD2 binding measured by ELISA. Clustering hits with binding equivalent to or better than that of the published macrocyclic peptide MC3 (2, 3, 5, 11; Table 1) are labeled.
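The relationship in panel A between the similarity cutoff and the number of clusters, and the per-cluster Sum of reads and Count in panel B, can be illustrated with a greedy sphere-exclusion sketch. Assumptions: seeds are taken in order of descending read count (the ordering used by DISE in the paper may differ), and `difflib.SequenceMatcher.ratio()` stands in for PAP similarity only so the example runs with the standard library; it is not the paper's metric and has no notion of NCAA chemistry. The names `sphere_exclusion` and `read_counts` are hypothetical.

```python
from difflib import SequenceMatcher

def stand_in_similarity(a, b):
    # Placeholder for PAP similarity: character-level match ratio from difflib.
    return SequenceMatcher(None, a, b).ratio()

def sphere_exclusion(counts, cutoff, sim=stand_in_similarity):
    """Greedy sphere exclusion: the most-read unassigned peptide seeds a cluster and absorbs
    every unassigned peptide whose similarity to the seed is >= cutoff."""
    unassigned = sorted(counts, key=counts.get, reverse=True)
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [p for p in unassigned if sim(seed, p) >= cutoff]
        unassigned = [p for p in unassigned if p not in members]
        clusters.append({
            "seed": seed,
            "members": members,
            "count": len(members),                               # Count, i.e., Cluster Size
            "sum_of_reads": sum(counts[p] for p in members),     # Sum of reads per cluster
        })
    return clusters

# Hypothetical read counts per unique peptide.
read_counts = {"YWRKL": 50, "YWRKI": 3, "GGSPD": 2, "FWRKL": 1}

# Lower cutoffs let each sphere absorb more members, so the cluster count can only stay
# the same or drop for this greedy scheme on a fixed seed order.
for cutoff in (0.35, 0.3, 0.25):
    print(cutoff, len(sphere_exclusion(read_counts, cutoff)))
```

Here the low-count variants `YWRKI` and `FWRKL` are pooled with the abundant `YWRKL`, so their reads contribute to one cluster's Sum of reads instead of sitting near the bottom of a frequency list.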
Figure 5
Correlation of enzyme inhibition and PAP similarity for reference compounds MC831 (top), MC832 (middle), and MC835 (bottom) and members of their clusters. A) Peptides more similar to the active references are likely to also inhibit the enzyme. The black dashed line indicates the 0.6 similarity cutoff. The red circle highlights the parent peptide, which has a PAP similarity of 1.0. B) Amino acid differences between MC831 (top), MC832 (middle), and MC835 (bottom) and the derivatives identified by PAP similarity (PAPS), with PAPS, AlogP, and enzymatic inhibition reported for each in vitro translated peptide. ‘-’ indicates a derivative peptide with an inserted or deleted amino acid at that position relative to the parent sequence. Percent inhibition (% Inhib.) is reported normalized to a control compound.
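The trend in panel A can be quantified with a Pearson correlation between each cluster member's PAP similarity to its reference and its percent inhibition, together with a comparison of mean inhibition above and below the 0.6 cutoff. This sketch assumes both values have already been computed for every member; the function names are hypothetical, and no values from the paper are reproduced.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def split_at_cutoff(pairs, cutoff=0.6):
    """Mean % inhibition for members at/above vs below the similarity cutoff.
    `pairs` holds (similarity, percent_inhibition) tuples; an empty side yields NaN."""
    def mean(values):
        return sum(values) / len(values) if values else float("nan")
    above = [inh for s, inh in pairs if s >= cutoff]
    below = [inh for s, inh in pairs if s < cutoff]
    return mean(above), mean(below)
```

Reporting both numbers is a design choice: the correlation summarizes the overall trend, while the split mirrors how a fixed cutoff such as 0.6 would be used to pick derivatives for follow-up.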
