Review
Nat Rev Genet. 2021 Sep;22(9):553-571.
doi: 10.1038/s41576-021-00363-7. Epub 2021 Jun 3.

Mining genomes to illuminate the specialized chemistry of life

Marnix H Medema et al. Nat Rev Genet. 2021 Sep.

Abstract

All organisms produce specialized organic molecules, ranging from small volatile chemicals to large gene-encoded peptides, that have evolved to provide them with diverse cellular and ecological functions. As natural products, they are broadly applied in medicine, agriculture and nutrition. The rapid accumulation of genomic information has revealed that the metabolic capacity of virtually all organisms is vastly underappreciated. Pioneered mainly in bacteria and fungi, genome mining technologies are accelerating metabolite discovery. Recent efforts are now being expanded to all life forms, including protists, plants and animals, and new integrative omics technologies are enabling the increasingly effective mining of this molecular diversity.

Figures

Figure 1. Life’s chemical diversity.
a) Bacteria, fungi, plants and animals produce a wide range of specialized metabolites that help them thrive in their respective environments. There is a large disconnect between (b) the numbers of taxonomic genera in the biosphere (based on the NCBI taxonomy database), (c) the numbers of genomes available for these species (based on the number of species represented in the NCBI genome database), (d) the numbers of specialized metabolites isolated (based on the number of molecules ascribed to these classes of organisms in the Dictionary of Natural Products) and (e) the estimated numbers of specialized metabolites that have been linked to the genes responsible for their biosynthesis (estimates by the authors). There is likely great potential for discovering new metabolites from animals and protists, and for identifying new biosynthetic pathways from plants, animals and protists. Algae include green, red and brown algae, diatoms and dinoflagellates. Heterotrophic protists and archaea were not included because few specialized metabolites have been isolated from these organisms.
Figure 2.
Overview of genome mining technologies that combine genome sequence data with gene expression levels, metabolomic data, biological activity or phenotypic data, and chemical structure data. Each combination has its own strengths and may allow generating hypotheses focused on finding an unknown biosynthetic pathway for an important known molecule, discovering new metabolites with desired biological activities, or identifying potential links between metabolites and the genes and gene clusters that likely encode their biosynthesis.
Figure 3. Linking genes to molecules using metabolomics and transcriptomics.
Several approaches have been developed to link metabolites to the genes and gene clusters encoding their biosynthesis. a) In bacteria, pattern-based genome mining approaches match families of molecules (related by spectral similarity) to gene cluster families (GCFs, related by sequence similarity) through metabologenomic correlation, which identifies the GCFs that co-occur strongly with a given metabolite across strains. b) Molecules can also be connected to genes and gene clusters through feature-based matching, in which chemical features (substructures and modifications that are either manually annotated or detected by algorithms that find motifs in MS/MS data) are linked to genes and gene modules known to be responsible for the biosynthesis of such features. c) Transcriptomic data can also be used to identify potential biosynthetic pathways for a molecule of interest by, for example, identifying modules of coexpressed genes whose expression correlates with the presence of a given metabolite across a range of divergent conditions (for example, different biological stresses).
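The co-occurrence idea in panel a can be illustrated with a minimal Python sketch. It assumes that each GCF and each molecular family has already been reduced to the set of strains in which it was observed, and it scores every pair with a plain Jaccard index. The Jaccard score, the function names and the toy strain sets are illustrative assumptions, not the scoring scheme used by any specific tool discussed in the review.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two strain sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_gcf_links(
    gcf_strains: Dict[str, Set[str]],
    mf_strains: Dict[str, Set[str]],
) -> List[Tuple[str, str, float]]:
    """Score every (GCF, molecular family) pair by strain co-occurrence.

    Returns (gcf, molecular_family, score) tuples, highest score first.
    """
    links = [
        (gcf, mf, jaccard(g_set, m_set))
        for gcf, g_set in gcf_strains.items()
        for mf, m_set in mf_strains.items()
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy data: which strains carry each GCF,
# and in which strains each molecular family was detected.
gcf_strains = {"GCF_1": {"s1", "s2", "s3"}, "GCF_2": {"s3", "s4"}}
mf_strains = {"MF_A": {"s1", "s2", "s3"}, "MF_B": {"s4"}}

ranked = rank_gcf_links(gcf_strains, mf_strains)
for gcf, mf, score in ranked:
    print(f"{gcf} <-> {mf}: {score:.2f}")
```

In this toy case GCF_1 and MF_A occur in exactly the same strains, so that pair ranks first. A real analysis would also need to account for unexpressed (silent) gene clusters and for uneven strain sampling, which a raw overlap score ignores.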
Figure 4. Function-first genome mining approaches.
To identify molecules with desired activities more effectively, function-first genome mining approaches are being developed. a) In target-based genome mining approaches, self-resistance genes are identified that genomically cluster with the biosynthetic genes. Such self-resistance genes are often resistant copies of a housekeeping gene whose protein product is targeted by the metabolite biosynthesized by the pathway. This provides a way to directly predict the mechanism of action for the metabolic products of a subset of gene clusters. b) Cytological profiling can be used to identify the effects that metabolic extracts have on certain cell lines, and compound activity mapping can identify which underlying mass-spectral features are likely responsible for activities that are shared between extracts. The activities and/or metabolites can then be matched to the presence or expression of genes and gene clusters to identify a candidate biosynthetic route towards the underlying molecule. c) Functions of the products of biosynthetic genes and gene clusters can be predicted by looking for coexpression with other genes in the same organism (predicting function based on the guilt-by-association principle) or across organisms (identifying the potential effect that a pathway has on other organisms or on a microbiome-associated phenotype). d) Structural features and substructures that are likely part of the metabolic product of a gene cluster can be predicted in silico; sometimes, these substructures are diagnostic for a certain mechanism of action or biological activity, and machine learning algorithms can be trained to predict these activities based on sets of structural features.
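The target-based logic in panel a can be sketched in a few lines of Python. The sketch assumes each gene already carries a protein-family label and, where applicable, the identifier of the biosynthetic gene cluster (BGC) it sits in. It then flags genes inside a cluster that belong to a housekeeping family which also has a copy elsewhere in the genome. The `Gene` record, the family labels and the toy genome are hypothetical, and real pipelines add steps (for example, phylogenetic checks) that are not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Gene:
    gene_id: str
    family: str              # protein-family label (assumed to be precomputed)
    cluster: Optional[str]   # BGC identifier, or None if outside any cluster


def candidate_resistance_genes(
    genes: Iterable[Gene],
    housekeeping_families: Set[str],
) -> List[Gene]:
    """Flag in-cluster copies of housekeeping genes that are duplicated.

    A gene is a candidate if it lies inside a BGC, belongs to a
    housekeeping family, and the same family also has a copy outside
    any BGC (the presumed primary, metabolite-sensitive copy).
    """
    genes = list(genes)
    families_outside = {g.family for g in genes if g.cluster is None}
    return [
        g
        for g in genes
        if g.cluster is not None
        and g.family in housekeeping_families
        and g.family in families_outside
    ]


# Hypothetical toy genome.
genome = [
    Gene("g1", "gyrB", None),      # housekeeping copy outside any cluster
    Gene("g2", "gyrB", "BGC_1"),   # second copy inside a cluster
    Gene("g3", "pks", "BGC_1"),    # core biosynthetic gene
    Gene("g4", "rpoB", None),      # housekeeping gene, single copy
]

hits = candidate_resistance_genes(genome, {"gyrB", "rpoB"})
for gene in hits:
    print(f"{gene.gene_id}: extra {gene.family} copy in {gene.cluster}")
```

Here only `g2` is reported, which would suggest (under these assumptions) that the product of BGC_1 may target the protein encoded by the `gyrB` family.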

