PLoS Biol. 2020 Dec 22;18(12):e3001026.
doi: 10.1371/journal.pbio.3001026. eCollection 2020 Dec.

Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides



Alexander M Kloosterman et al. PLoS Biol. 2020.

Abstract

Microbial natural products constitute a wide variety of chemical compounds, many of which have antibiotic, antiviral, or anticancer properties that make them interesting for clinical purposes. Natural product classes include polyketides (PKs), nonribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes often escape attention. In particular, evidence is accumulating that the RiPP subclasses known thus far may represent only the tip of the iceberg. Here, we present decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER), a RiPP genome mining algorithm aimed at the discovery of novel RiPP classes. DecRiPPter combines a Support Vector Machine (SVM) that identifies candidate RiPP precursors with pan-genomic analyses that determine which of these are encoded within operon-like structures belonging to the accessory genome of a genus. It then prioritizes such regions based on the presence of new enzymology and on patterns of gene cluster and precursor peptide conservation across species. We applied decRiPPter to mine 1,295 Streptomyces genomes, which led to the identification of 42 new candidate RiPP families that could not be found by existing programs. One of these was studied further and elucidated as a representative of a novel subfamily of lanthipeptides, which we designate class V. The 2D structure of the new RiPP, which we name pristinin A3 (1), was solved using nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS) data, and chemical labeling. Two previously unidentified modifying enzymes are proposed to create the hallmark lanthionine bridges.
Taken together, our work highlights how novel natural product families can be discovered by methods going beyond sequence similarity searches to integrate multiple pathway discovery criteria.
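The pan-genomic criterion described above (keeping candidates that sit in the accessory genome of a genus) can be illustrated with a minimal sketch. It assumes ortholog groups have already been computed across the genomes of one genus and uses a simple presence-fraction rule: groups found in at least a hypothetical 95% of genomes are called core, everything else accessory, and a candidate region is flagged when at least half of its genes (again a hypothetical cutoff) are accessory. decRiPPter's own conservation scoring may differ; the function names and thresholds here are illustrative assumptions, not the tool's implementation.

```python
from typing import Dict, List, Set


def classify_ortholog_groups(
    presence: Dict[str, Set[str]],
    n_genomes: int,
    core_threshold: float = 0.95,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each ortholog group as 'core' or 'accessory'.

    `presence` maps an ortholog-group ID to the set of genome IDs (within one
    genus) that carry at least one member of that group.
    """
    labels: Dict[str, str] = {}
    for group, genomes in presence.items():
        fraction = len(genomes) / n_genomes  # share of genomes carrying the group
        labels[group] = "core" if fraction >= core_threshold else "accessory"
    return labels


def region_is_accessory(
    region_groups: List[str],
    labels: Dict[str, str],
    min_accessory_fraction: float = 0.5,  # hypothetical cutoff
) -> bool:
    """Return True if a large enough share of a region's genes are accessory."""
    if not region_groups:
        return False
    n_accessory = sum(1 for g in region_groups if labels.get(g) == "accessory")
    return n_accessory / len(region_groups) >= min_accessory_fraction


if __name__ == "__main__":
    presence = {
        "og1": {"g1", "g2", "g3", "g4"},  # present in every genome
        "og2": {"g1"},
        "og3": {"g2", "g3"},
    }
    labels = classify_ortholog_groups(presence, n_genomes=4)
    print(labels)
    print(region_is_accessory(["og2", "og3"], labels))
```

In this toy example og1 occurs in all four genomes and is labeled core, while og2 and og3 are accessory, so a region built from og2 and og3 is flagged as part of the accessory genome.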


Conflict of interest statement

I have read the journal's policy and the authors of this manuscript have the following competing interests: P.C. is currently an employee of Verily Life Sciences. M.H. is currently an employee of LifeMine Therapeutics. M.S.D. is a member of the Scientific Advisory Board of DeepBiome Therapeutics. M.A.F. is a cofounder and director of Federation Bio. M.H.M. is on the scientific advisory board of Hexagon Bio and co-founder of Design Pharmaceuticals.

Figures

Fig 1. decRiPPter pipeline for the detection of novel RiPP families.
The SVM classifier is used to identify all candidate RiPP precursors in a given group of genomes, taking as input all predicted proteins shorter than 100 amino acids. The gene clusters formed around the precursors are analyzed for specific protein domains. In addition, COG scores are calculated, acting as an additional filter and aiding gene cluster detection. The remaining gene clusters are grouped with each other and with MIBiG gene clusters to dereplicate and organize the results. Overlap with antiSMASH-detected BGCs is also analyzed. BGC, biosynthetic gene cluster; COG, cluster of orthologous genes; decRiPPter, Data-driven Exploratory Class-independent RiPP TrackER; MIBiG, Minimum Information about a Biosynthetic Gene cluster; RiPP, ribosomally synthesized and post-translationally modified peptide; SVM, Support Vector Machine.
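The first two pipeline steps in the caption (scoring proteins shorter than 100 amino acids as candidate precursors, then forming a gene cluster around each hit) can be sketched as below. This is a simplified illustration under stated assumptions: `score` is a hypothetical stand-in for the trained SVM's decision function, `toy_score` is a made-up placeholder rather than a real feature set, and the "operon-like" rule (extend to flanking genes on the same strand with an intergenic gap of at most a hypothetical 300 bp) is an assumption, not decRiPPter's exact clustering criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gene:
    name: str
    start: int    # start coordinate on the contig (bp)
    end: int      # end coordinate on the contig (bp)
    strand: int   # +1 or -1
    protein: str  # predicted amino-acid sequence


def candidate_precursors(
    genes: List[Gene],
    score: Callable[[str], float],  # stand-in for a trained SVM decision function
    threshold: float = 0.0,
    max_len: int = 100,  # proteins shorter than 100 aa, as in the caption
) -> List[Gene]:
    """Keep short proteins whose classifier score exceeds the threshold."""
    return [g for g in genes if len(g.protein) < max_len and score(g.protein) > threshold]


def operon_like_cluster(genes: List[Gene], seed_index: int, max_gap: int = 300) -> List[Gene]:
    """Grow a cluster around genes[seed_index] by adding flanking genes on the
    same strand whose intergenic distance is at most max_gap bp (hypothetical
    value). `genes` must be sorted by start coordinate."""
    strand = genes[seed_index].strand
    lo = hi = seed_index
    # Extend upstream while the neighbour is on the same strand and close enough.
    while (lo > 0 and genes[lo - 1].strand == strand
           and genes[lo].start - genes[lo - 1].end <= max_gap):
        lo -= 1
    # Extend downstream under the same rule.
    while (hi < len(genes) - 1 and genes[hi + 1].strand == strand
           and genes[hi + 1].start - genes[hi].end <= max_gap):
        hi += 1
    return genes[lo:hi + 1]


def toy_score(seq: str) -> float:
    """Hypothetical toy score: fraction of Ser/Thr/Cys residues minus 0.2.
    A real pipeline would call a trained classifier here instead."""
    return sum(seq.count(aa) for aa in "STC") / len(seq) - 0.2


if __name__ == "__main__":
    genes = [
        Gene("orf1", 0, 300, 1, "M" + "A" * 99),        # 100 aa: not shorter than 100
        Gene("pre", 400, 550, 1, "MSTCSTC" + "A" * 10),  # short and S/T/C-rich
        Gene("mod", 600, 3000, 1, "M" + "L" * 700),      # large putative modifying enzyme
        Gene("rev", 3100, 4000, -1, "MK"),               # opposite strand
    ]
    for hit in candidate_precursors(genes, toy_score):
        idx = genes.index(hit)
        print(hit.name, [g.name for g in operon_like_cluster(genes, idx)])
```

Keeping the classifier as a plain callable separates the precursor-scoring step from the genomic-context step, so any trained model could be dropped in without changing the clustering logic.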
Fig 2. decRiPPter finds 42 candidate RiPP families with a large variety of encoded modifying enzymes and precursors.
Gene clusters found in 1,295 Streptomyces genomes were passed through a strict filter and grouped together. Arrow colors indicate the enzyme family of each gene product, and a description of the putative gene products is given below the arrows. Roughly a third of the remaining candidates overlapped with or were similar to RiPP BGCs predicted by antiSMASH, and another third were discarded as likely false positives. Fifteen example gene clusters from the remaining 42 candidate RiPP families are displayed. BGC, biosynthetic gene cluster; decRiPPter, Data-driven Exploratory Class-independent RiPP TrackER; RiPP, ribosomally synthesized and post-translationally modified peptide.
Fig 3. The pristinin BGC (spr) of S. pristinaespiralis produces a highly modified RiPP.
(A) The spr gene cluster encodes 3 putative RiPP precursors, 3 transporters, a peptidase, and an assortment of modifying enzymes (see Table 2). An alignment of the predicted precursor peptides is given below. (B) Protein abundance of the products of the spr gene cluster in S. pristinaespiralis ATCC 25468 and its derivatives. Strains were grown in NMMP and sampled after 7 days. Enhanced expression of the regulator (from construct pAK1) resulted in partial activation of the gene cluster. Proteins that could not be detected are not shown. (C) Overlay chromatogram of crude extracts from strains grown under the same conditions as in (B) and sampled after 7 days. Several peaks eluting between 7 and 8 minutes were detected in the extract from the strain carrying expression construct pAK1. (D) Boxplot of 2 peaks detected only in the strain carrying pAK1. The 2 masses could be related to 2 of the 3 precursor peptides. (E) 2D structure of pristinin A3 (1), derived from the SprA3 precursor. The compound has a mass of 2,703.235 Da. Numerical data for panels B, C, and D are available in S3 Data. BGC, biosynthetic gene cluster; NMMP, NH4-based Minimal Medium with Phosphate; RiPP, ribosomally synthesized and post-translationally modified peptide.
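Relating an observed mass to a precursor, as in panel D, amounts to comparing it with the predicted masses of candidate core peptides under assumed modifications. A minimal sketch follows, assuming that only Ser/Thr dehydration is modelled: in lanthionine formation each dehydration removes one H2O (18.0106 Da), while the subsequent thioether cyclization does not change the mass. The core sequences, the 0.02 Da tolerance, and the function names are hypothetical; leader-peptide cleavage sites and any further modifications of pristinin A3 are not modelled here.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def peptide_mass(seq: str, dehydrations: int = 0) -> float:
    """Monoisotopic mass of a linear peptide, minus one H2O per dehydration."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER - dehydrations * WATER


def match_masses(
    observed: List[float],
    cores: Dict[str, str],
    tol_da: float = 0.02,  # hypothetical mass tolerance
) -> Dict[float, List[Tuple[str, int]]]:
    """For each observed mass, list (core name, number of dehydrations) pairs
    whose predicted mass lies within tol_da. At most one dehydration per
    Ser/Thr residue is allowed; no other modifications are considered."""
    matches: Dict[float, List[Tuple[str, int]]] = {}
    for mass in observed:
        hits = []
        for name, core in cores.items():
            max_dehyd = core.count("S") + core.count("T")
            for n in range(max_dehyd + 1):
                if abs(peptide_mass(core, n) - mass) <= tol_da:
                    hits.append((name, n))
        matches[mass] = hits
    return matches


if __name__ == "__main__":
    cores = {"coreX": "ASTA"}  # hypothetical core peptide, not an spr precursor
    print(match_masses([330.154], cores))
```

Enumerating every allowed number of dehydrations per core keeps the search exhaustive for small peptides; with additional modification types, the same loop would simply iterate over combinations of mass shifts.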
Fig 4. Orthologs of sprPT and sprH3 co-occur in a wide variety of genetic contexts.
(Left side) Phylogenetic tree of gene clusters containing homologs of sprPT and sprH3, visualized by CORASON. A red dot indicates that the genes were present in a gene cluster found by decRiPPter, and a yellow dot that the gene cluster passed the strict filter (Table 1). A blue dot indicates overlap with a BGC identified by antiSMASH. (Right side) Several gene clusters with varying genetic contexts are displayed. Group (g) represents the query gene cluster. The genetic context varies, while the gene pair itself is conserved. Colors indicate the predicted enzymatic activity of the gene products, as described in the legend. The Newick file can be found in S3 Data. BGC, biosynthetic gene cluster; CORASON, CORe Analysis of Syntenic Orthologs to prioritize Natural Product-Biosynthetic Gene Clusters; decRiPPter, Data-driven Exploratory Class-independent RiPP TrackER.
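The observation that the sprPT/sprH3 pair is conserved while its genetic context varies can be quantified with a simple pairwise similarity of the flanking gene families. The sketch below is not CORASON's algorithm; it assumes each gene cluster has already been reduced to a set of ortholog-family labels (the `famA`..`famE` labels are hypothetical), removes the conserved pair, and computes the Jaccard index between the remaining contexts. Low values indicate that the pair occurs in different genetic neighborhoods.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def context_similarity(
    clusters: Dict[str, Set[str]],
    pair: Set[str],
) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard similarity of the genetic contexts around a conserved
    gene pair. Clusters lacking either member of the pair are skipped, and the
    pair itself is removed before the contexts are compared."""
    contexts = {cid: fams - pair for cid, fams in clusters.items() if pair <= fams}
    return {
        (x, y): jaccard(contexts[x], contexts[y])
        for x, y in combinations(sorted(contexts), 2)
    }


if __name__ == "__main__":
    clusters = {
        "g": {"sprPT", "sprH3", "famA", "famB", "famC"},  # query-like cluster
        "h": {"sprPT", "sprH3", "famB", "famD"},
        "i": {"sprPT", "sprH3", "famE"},
        "j": {"sprPT", "famA"},  # lacks sprH3, so it is skipped
    }
    print(context_similarity(clusters, {"sprPT", "sprH3"}))
```

Here only clusters g and h share a flanking family (famB), giving a similarity of 0.25, while cluster i shares none with either, which mirrors the "conserved pair, variable context" pattern described in the caption.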
