J Cheminform. 2019 Feb 8;11(1):13.
doi: 10.1186/s13321-019-0335-x.

rBAN: retro-biosynthetic analysis of nonribosomal peptides

Emma Ricart et al. J Cheminform.

Abstract

Proteinogenic and non-proteinogenic amino acids, fatty acids and glycans are some of the main building blocks of nonribosomal peptides (NRPs) and as such may give insight into the origin, biosynthesis and bioactivities of their constitutive peptides. Hence, the structural representation of NRPs using monomers provides a biologically interesting skeleton of these secondary metabolites. Databases dedicated to NRPs, such as Norine, already integrate monomer-based annotations in order to facilitate the development of structural analysis tools. In this paper, we present rBAN (retro-biosynthetic analysis of nonribosomal peptides), a new computational tool designed to predict the monomeric graph of NRPs from their atomic structure in SMILES format. This prediction is achieved through the "in silico" fragmentation of a chemical structure and the matching of the resulting fragments against the monomers of Norine for identification. Structures containing monomers not yet recorded in Norine are processed in a "discovery mode" that uses the RESTful service from PubChem to search for the unidentified substructures and suggest new monomers. rBAN was integrated in a pipeline for the curation of Norine data, in which it was used to check the correspondence between the monomeric graphs annotated in Norine and the SMILES-predicted graphs. The process concluded with the validation of 97.26% of the records in Norine, a two-fold extension of its SMILES data and the introduction of 11 new monomers suggested in the discovery mode. The accuracy, robustness and high performance of rBAN were demonstrated by benchmarking it against other tools with the same functionality: Smiles2Monomers and GRAPE.
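The discovery mode described above queries PubChem over its REST interface. As a rough illustration only, the sketch below builds a PubChem PUG REST URL that asks for the compound IDs matching a SMILES string. The exact request that rBAN sends (identity or substructure search, output format) is not stated in the abstract, so the endpoint choice, the function name pubchem_cid_lookup_url and the alanine example are assumptions made for this sketch. No network call is performed.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_lookup_url(smiles: str) -> str:
    """Build a URL asking PubChem for compound IDs that match a SMILES.

    Hypothetical helper for illustration; not taken from rBAN itself.
    """
    # SMILES may contain '/', '#', '=' and brackets, so every reserved
    # character is percent-encoded before it is placed in the URL path.
    encoded = quote(smiles, safe="")
    return f"{PUG_REST}/compound/smiles/{encoded}/cids/JSON"


# Hypothetical unidentified fragment (alanine), used only as an example.
url = pubchem_cid_lookup_url("CC(N)C(=O)O")
print(url)
```

For long or stereochemically rich SMILES, sending the string in a POST body instead of the URL path may be the safer choice.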

Keywords: Curation; Fragmentation; Monomer; Natural product; Peptide; Retro-biosynthesis; Structure analysis; Substructure search.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Example of Vancomycin processing. A First, the primary bond mapping searches the molecule for the most common bonds between NRP monomers. This maps two pairs of adjacent bonds that cannot be cut simultaneously, since doing so would isolate some atoms. To avoid this, all possible combinations that include only one bond of each neighboring pair are computed. B Then, rBAN retrieves the substructures resulting from each combination and matches them against the monomer database. Each combination receives a coverage score based on the number of atoms that could be annotated. C In this case, none of the results has full coverage, so the algorithm proceeds to the secondary bond search on the structure with the highest score. D Breaking a carbon-carbon bond results in the full mapping of the peptide
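The scoring step in panels B and C can be sketched with toy data. The caption only says that the score is based on the number of annotated atoms, so treating it as the fraction of atoms in matched fragments is an assumption, as are the function names, the fragment keys and the atom counts below.

```python
def coverage(fragments, monomer_db):
    """Fraction of atoms that lie in fragments matching a known monomer.

    `fragments` is a list of (fragment_key, atom_count) pairs; a fragment
    counts as annotated when its key is present in `monomer_db`.
    """
    total = sum(n_atoms for _, n_atoms in fragments)
    matched = sum(n_atoms for key, n_atoms in fragments if key in monomer_db)
    return matched / total if total else 0.0


def best_combination(combinations, monomer_db):
    """Return the name of the bond combination with the highest coverage."""
    return max(combinations, key=lambda name: coverage(combinations[name], monomer_db))


# Toy monomer database and two hypothetical bond combinations.
monomer_db = {"Gly", "Ala", "Leu"}
combinations = {
    "cut_bond_a": [("Gly", 5), ("Ala", 6), ("unknown_1", 9)],
    "cut_bond_b": [("Gly", 5), ("Leu", 9), ("unknown_2", 6)],
}

best = best_combination(combinations, monomer_db)
score = coverage(combinations[best], monomer_db)
needs_secondary_search = score < 1.0  # no full coverage yet, as in panel C
print(best, score, needs_secondary_search)
```

In this toy case neither combination reaches full coverage, so the higher-scoring one would be passed on to a secondary bond search.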
Fig. 2
Software architecture workflow. This flowchart describes the series of steps for processing structures with rBAN
Fig. 3
Adjacent bonds breakage. Our fragmentation algorithm avoids atom isolation, which restricts the simultaneous cut of some adjacent bonds, requiring the computation of further combinations
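A minimal sketch of this constraint, assuming bonds are pairs of atom indices and that an atom counts as isolated when every one of its bonds is cut. The helper names, the degree table and the toy bonds are hypothetical; the enumeration simply keeps all unconflicted bonds and picks one bond from each conflicting pair.

```python
from collections import Counter
from itertools import product


def isolates_atom(cuts, degree):
    """True if some atom would lose all of its bonds when `cuts` are broken."""
    touched = Counter(atom for bond in cuts for atom in bond)
    return any(touched[atom] == degree[atom] for atom in touched)


def cut_combinations(free_bonds, conflicting_pairs):
    """Yield bond sets with every free bond and one bond per conflicting pair."""
    for choice in product(*conflicting_pairs):
        yield frozenset(free_bonds) | frozenset(choice)


# Toy molecule: atoms 2 and 6 each have only two bonds, and both of those
# bonds are candidate cuts, so cutting both would isolate the atom.
degree = {1: 3, 2: 2, 3: 3, 4: 3, 5: 3, 6: 2, 7: 3}
free_bonds = [(3, 4)]
conflicting_pairs = [((1, 2), (2, 3)), ((5, 6), (6, 7))]

all_cuts = free_bonds + [bond for pair in conflicting_pairs for bond in pair]
combos = list(cut_combinations(free_bonds, conflicting_pairs))
print(isolates_atom(all_cuts, degree), len(combos))
```

With two conflicting pairs the sketch yields four combinations, which is why adjacent candidate bonds multiply the number of fragmentations that have to be scored.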
Fig. 4
Identification of monomers containing inner bonds. Bonds inside a monomer are sometimes broken by the fragmentation algorithm. To handle these cases, when a small region cannot be identified, rBAN repeats the matching process after removing the bond linked to the unidentified substructure (example with Theonellapeptolide Ie)
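The retry described in this caption can be illustrated as follows. Fragments are modelled as sets of atom indices and matching as a dictionary lookup, both of which are simplifications. The size threshold max_size, the function names and the toy data are assumptions made for this sketch, not details taken from rBAN.

```python
def merge_across_bond(fragments, bond):
    """Undo one cut: fuse the two fragments joined by `bond` into one."""
    a, b = bond
    frag_a = next(f for f in fragments if a in f)
    frag_b = next(f for f in fragments if b in f)
    rest = [f for f in fragments if f not in (frag_a, frag_b)]
    return rest + [frag_a | frag_b]


def rematch(fragments, cut_bonds, identify, max_size=3):
    """Try to rescue a small unidentified fragment by restoring one cut bond."""
    for frag in fragments:
        if identify(frag) is None and len(frag) <= max_size:
            for bond in cut_bonds:
                if bond[0] in frag or bond[1] in frag:
                    candidate = merge_across_bond(fragments, bond)
                    # Accept the merge only if every fragment is now identified.
                    if all(identify(f) is not None for f in candidate):
                        return candidate
    return fragments


# Toy monomer table: atoms 3-6 form one monomer that was cut at bond (4, 5).
known = {frozenset({0, 1, 2}): "MonA", frozenset({3, 4, 5, 6}): "MonB"}
fragments = [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5, 6})]
cut_bonds = [(2, 3), (4, 5)]

result = rematch(fragments, cut_bonds, known.get)
print(sorted(known[f] for f in result))
```

Restoring bond (2, 3) does not help in this example, because the merged fragment is still unknown, so the sketch moves on to bond (4, 5) and recovers the second monomer.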
Fig. 5
Norine curation. a The curation involves two main steps: (1) Automatic verification and correction of the SMILES in Norine. rBAN validated 249 (97.26%) SMILES and identified seven potentially erroneous SMILES. Retrieving the PubChem SMILES for the non-validated entries enabled the correction of the SMILES of Motuporin (NOR00825). The manual inspection of the remaining entries confirmed six wrong SMILES. (2) Automatic addition of SMILES retrieved from PubChem. Of the 403 SMILES retrieved from PubChem, 242 were validated using rBAN. The 161 that were not validated are likely to be false positives due to the ambiguity of the PubChem searches performed. b Enniatin F belongs to the set of non-validated peptides. rBAN failed to validate this peptide due to differences between the molecular and monomeric annotations. The monomeric graph is circular and contains N-Methyl-Isoleucine, while the SMILES encodes a linear structure with dehydro-N-Methyl-Isoleucine (1). Additionally, rBAN could not identify what is supposed to be an N-Methyl-Leucine because it is missing a hydroxyl group (2)
Fig. 6
Benchmarking rBAN versus s2m. a Both tools were used to validate the SMILES data by comparing the Norine monomer graphs with the SMILES-based predicted graphs. rBAN validated more peptides than s2m, and four of the entries uniquely validated by s2m turned out to be false positives of that software. The manual examination of the entries uniquely validated by rBAN revealed a better capacity of the tool to annotate large structures and peptides containing heterocycles and tautomers. b The global distribution of the correctness does not show substantial differences between the two tools, but it shows that rBAN not only has more correct peptides but also fewer peptides with correctness values close to zero. c The monomer database was extended with new chemical entities to evaluate the effect on the peptide mapping. The results of rBAN remained unchanged, demonstrating its robustness, while the extension of the monomer database affected the mapping in s2m. d The computational performance was evaluated with different numbers of input peptides. In all cases rBAN outperformed s2m, being between four and five times faster
Fig. 7
Benchmarking rBAN versus GRAPE. The coverage of the annotations given by each software was compared. The distribution shows that rBAN fully annotated more peptides than GRAPE
