ACS Chem Biol. 2023 Jun 16;18(6):1425-1434.

doi: 10.1021/acschembio.3c00159. Epub 2023 May 23.

Identification of Macrocyclic Peptide Families from Combinatorial Libraries Containing Noncanonical Amino Acids Using Cheminformatics and Bioinformatics Inspired Clustering

Man-Ling Lee¹, Sherif Farag¹, Joselyn S Del Cid², Charlene Bashore³, Kenneth K Hallenbeck², Alberto Gobbi¹, Christian N Cunningham²

Affiliations

¹ Discovery Chemistry, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.
² Peptide Therapeutics, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.
³ Biological Chemistry, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.

PMID: 37220419
PMCID: PMC10278063
DOI: 10.1021/acschembio.3c00159

Man-Ling Lee et al. ACS Chem Biol. 2023.

Abstract

Over the past decade, macrocyclic peptides have gained increasing interest as a therapeutic modality for intracellular and extracellular targets that were previously classified as "undruggable". Several technological advances have made it possible to discover macrocyclic peptides against these targets: (1) the inclusion of noncanonical amino acids (NCAAs) in mRNA display, (2) the increased availability of next-generation sequencing (NGS), and (3) improvements in rapid peptide synthesis platforms. Because DNA sequencing is the functional output of this platform, this type of directed-evolution-based screening can produce large numbers of potential hit sequences. The current standard for selecting hit peptides for downstream follow-up relies on frequency counting and sorting of unique peptide sequences, which can generate false negatives for technical reasons such as low translation efficiency or other experimental factors. To overcome our inability to detect weakly enriched peptide sequences in these large data sets, we sought a clustering method that would enable the identification of peptide families. Unfortunately, traditional clustering algorithms such as ClustalW cannot be used with this technology because of the NCAAs incorporated in these libraries. We therefore developed a new atomistic clustering method with a Pairwise Aligned Peptide (PAP) chemical similarity metric to perform sequence alignments and identify macrocyclic peptide families. With this method, weakly enriched peptides, including isolated sequences (singletons), can now be clustered into families, providing a comprehensive analysis of NGS data from macrocycle discovery selections. Additionally, once a hit peptide with the desired activity has been identified, this clustering algorithm can be used to identify derivatives from the initial data set for structure-activity relationship (SAR) analysis without requiring additional selection experiments.
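
As a rough illustration of how an alignment-based similarity can accommodate noncanonical residues, the sketch below runs a global (Needleman-Wunsch-style) alignment over lists of monomer tokens and normalizes the resulting score. The toy similarity table, the gap penalty, the normalization, and the function names are all assumptions made for illustration. This is not the published PAP metric, which the abstract describes as atomistic and chemistry-based.

```python
from typing import Dict, FrozenSet, List

# Hypothetical monomer-pair similarities in [0, 1]. A real table would be
# derived from chemical structures; unlisted non-identical pairs score 0.
TOY_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"F", "Y"}): 0.8,
    frozenset({"L", "I"}): 0.9,
    frozenset({"F", "f"}): 0.5,      # lower case marks a D-amino acid
    frozenset({"F", "4ClF"}): 0.85,  # made-up noncanonical token
}


def monomer_sim(a: str, b: str) -> float:
    """Similarity of two monomer tokens (1.0 for identical tokens)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def align_score(p: List[str], q: List[str], gap: float = -0.5) -> float:
    """Best global alignment score, computed by dynamic programming."""
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + monomer_sim(p[i - 1], q[j - 1]),  # pair
                dp[i - 1][j] + gap,                                  # gap in q
                dp[i][j - 1] + gap,                                  # gap in p
            )
    return dp[n][m]


def peptide_similarity(p: List[str], q: List[str], gap: float = -0.5) -> float:
    """Alignment score divided by the longer length, floored at 0."""
    if not p or not q:
        return 0.0
    return max(0.0, align_score(p, q, gap) / max(len(p), len(q)))
```

Because peptides are token lists and not strings, a multi-character noncanonical monomer such as the made-up `"4ClF"` is handled like any other residue, which is the property a letter-based aligner lacks.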

Conflict of interest statement

The authors declare the following competing financial interest(s): M.-L.L., A.G., and C.N.C. are current employees of Genentech, Inc. and shareholders of Roche.

Figures

**Figure 1**
Workflow of macrocyclic peptide discovery using mRNA Display. To start, mRNA-Puromycin libraries are translated in vitro and subsequently reverse transcribed to make RNA/cDNA hybrids that are covalently attached to their cognate macrocyclic peptide. The target protein (dark red) is incubated with the peptide libraries in solution or on beads and free macrocyclic peptides are removed from the solution with a series of wash steps. The bound macrocyclic peptides are heat eluted, and the linked mRNA/cDNA is reamplified for subsequent rounds of selection until the library is enriched with peptides against the target protein. Next generation sequencing is performed on the elution fractions, and frequency counting of unique peptide sequences is performed to identify binders.
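
The frequency-counting step at the end of this workflow can be sketched as follows. This is a minimal illustration that assumes the sequencing reads have already been translated into peptide strings; the function name `rank_by_frequency` and the example reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_frequency(reads: Iterable[str], top: int = 10) -> List[Tuple[str, int, float]]:
    """Return (sequence, read count, fraction of all reads) for the most
    frequent unique peptide sequences, highest count first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common(top)]


# Hypothetical translated reads from one elution fraction.
example_reads = ["ACDF"] * 3 + ["WWPK"] * 2 + ["GGGG"]
ranking = rank_by_frequency(example_reads)
```

A ranking of this kind only surfaces sequences that are individually abundant, which is why a weakly enriched binder can be missed when picks are made from the top of the list alone.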

**Figure 2**
Details of the new clustering method. Lower case letters indicate a d-amino acid. (a) Short description of the algorithms. (b) Linear combination of similarity matrices encoding similarities of amino acids without and with consideration of the stereochemistry on the Cα atom. (c) Alignment of two peptide sequences and similarities of amino acid pairs; the optimal alignment requires one gap in each sequence. (d) Formula for computing the similarity of two peptides from the similarities of the paired amino acids. (e) Schematic display of the DISE clustering results where (+) denote cluster seeds and (■) the cluster members.
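
The seed-and-member picture in panel (e) can be illustrated with a generic sphere-exclusion pass. This is a sketch under stated assumptions, not the authors' DISE implementation: the ranking of the input list, the single-pass assignment rule, the `positional_identity` stand-in similarity, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Peptide = Tuple[str, ...]


def sphere_exclusion(
    ranked: Sequence[Peptide],
    sim: Callable[[Peptide, Peptide], float],
    cutoff: float,
) -> Dict[Peptide, List[Peptide]]:
    """Walk a pre-ranked list once. A peptide joins the most similar existing
    seed if that similarity is at least `cutoff`; otherwise it becomes a new
    seed. Seeds created later are not revisited for earlier peptides."""
    clusters: Dict[Peptide, List[Peptide]] = {}
    for pep in ranked:
        best_seed: Optional[Peptide] = None
        best_sim = cutoff
        for seed in clusters:
            s = sim(pep, seed)
            if s >= best_sim:
                best_seed, best_sim = seed, s
        if best_seed is None:
            clusters[pep] = [pep]  # new seed; it is also its own first member
        else:
            clusters[best_seed].append(pep)
    return clusters


def positional_identity(p: Peptide, q: Peptide) -> float:
    """Toy stand-in similarity: fraction of identical positions, no gaps."""
    n = max(len(p), len(q))
    return sum(a == b for a, b in zip(p, q)) / n if n else 0.0
```

Lowering `cutoff` merges more peptides into fewer, larger clusters, while raising it leaves more singletons. Any similarity function returning values in [0, 1] can be passed in place of the toy one.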

**Figure 3**
Result of retrospective analyses. (a) Median AUC-ROC by allele of 50 similarity search experiments for each allele as described in the method section. AUC-ROC values of 0.5 correspond to a random selection while values of 1.0 correspond to perfect enrichment. Error bars are standard deviations of 50 repeats. Numeric values and standard deviations are given in Table S2. (b–c) Result of one Mamu-B_01 similarity search using the AAP-based similarity. (b) ROC curve. (c) Activity of candidates versus similarity to the most similar query peptide. (d) Selected peptides (■ in plot c) aligned to their most similar query peptide 173.
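
For readers unfamiliar with the AUC-ROC values quoted in panel (a), the sketch below computes the statistic as the probability that a randomly chosen active candidate receives a higher similarity score than a randomly chosen inactive one, with ties counted as one half. The function name and input layout are assumptions; this is a generic rank-based calculation and not the authors' evaluation code.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of actives and
    inactives. 0.5 means the scores rank no better than chance and 1.0
    means every active outranks every inactive."""
    pos = [s for s, a in zip(scores, is_active) if a]
    neg = [s for s, a in zip(scores, is_active) if not a]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive candidate")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of candidates, which is acceptable for a sketch; a sort-based rank sum gives the same value for larger sets.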

**Figure 4**
SeqSim algorithm enables rapid identification of new chemical matter through clustering. A) Number of clusters across different similarity cutoffs. B) Target binding and cluster distribution of hit picks with a similarity cutoff of 0.35 (top panel), 0.3 (middle panel), and 0.25 (bottom panel). Graphs represent PSMD2 binding as a function of the number of sequencing reads per cluster (Sum of reads, i.e. Sum of Frequency), and hits are colored by the number of individual members for each cluster (Count, i.e. Cluster Size). ELISA data represent the mean of two independent experiments with three technical replicates each. Clusters containing the two original hits MC1 and MC3 and top clustering hits are labeled next to their data point. C) Sequence enrichment profiles of previously published hits (top panel, Original Picks) compared to profiles of hits picked through clustering analysis (bottom panel, Clustering Hit Picks). Hits are colored according to PSMD2 binding measured by ELISA. Clustering hits with equivalent or better binding (2, 3, 5, 11; Table 1) than the published macrocyclic peptide MC3 are labeled.

**Figure 5**
Correlation of enzyme inhibition and PAP similarity for reference compounds MC831 (top), MC832 (middle), and MC835 (bottom) and members of their clusters. A) Peptides more similar to the active references are likely to also inhibit the enzyme. Black dashed line indicates 0.6 similarity cutoff. Red circle highlights the parent peptide with PAP similarity of 1.0. B) Amino acid differences between MC831 (top), MC832 (middle), and MC835 (bottom) and their PAP similarity (PAPS) identified derivatives with PAPS, AlogP, and enzymatic inhibition reported for each in vitro translated peptide. ‘-’ indicates a derivative peptide with an inserted or deleted amino acid at that position relative to the parent sequence. Percent Inhibition (% Inhib.) is reported as a normalized value to a control compound.

Cited by

New approaches for challenging therapeutic targets.
Sharma KR, Malik A, Roof RA, Boyce JP, Verma SK. Drug Discov Today. 2024 Apr;29(4):103942. doi: 10.1016/j.drudis.2024.103942. Epub 2024 Mar 5. PMID: 38447929. Free PMC article. Review.
Reaching New Heights in Genetic Code Manipulation with High Throughput Screening.
Lino BR, Williams SJ, Castor ME, Van Deventer JA. Chem Rev. 2024 Nov 13;124(21):12145-12175. doi: 10.1021/acs.chemrev.4c00329. Epub 2024 Oct 17. PMID: 39418482. Review.
An mRNA Display Approach for Covalent Targeting of a Staphylococcus aureus Virulence Factor.
Wang S, Woods EC, Jo J, Zhu J, Hansel-Harris A, Holcomb M, Llanos M, Pedowitz NJ, Upadhyay T, Bennett J, Fellner M, Park KW, Zhang A, Valdez TA, Forli S, Chan AI, Cunningham CN, Bogyo M. J Am Chem Soc. 2025 Mar 12;147(10):8312-8325. doi: 10.1021/jacs.4c15713. Epub 2025 Feb 27. PMID: 40013487.
An mRNA Display Approach for Covalent Targeting of a Staphylococcus aureus Virulence Factor.
Wang S, Woods EC, Jo J, Zhu J, Hansel-Harris A, Holcomb M, Pedowitz NJ, Upadhyay T, Bennett J, Fellner M, Park KW, Zhang A, Valdez TA, Forli S, Chan AI, Cunningham CN, Bogyo M. bioRxiv [Preprint]. 2024 Nov 8:2024.11.06.622387. doi: 10.1101/2024.11.06.622387. Update in: J Am Chem Soc. 2025 Mar 12;147(10):8312-8325. doi: 10.1021/jacs.4c15713. PMID: 39574702. Free PMC article. Preprint.
