Sci Rep. 2023 Sep 26;13(1):16105.
doi: 10.1038/s41598-023-42518-y.

Environment and taxonomy shape the genomic signature of prokaryotic extremophiles


Pablo Millán Arias et al. Sci Rep. 2023.

Abstract

This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified or clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. Supervised learning yielded high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions previously attributed to extreme environment adaptations. Unsupervised learning on unlabelled sequences identified several exemplars of hyperthermophilic organisms with strong similarities in their genomic signatures, despite their belonging to different domains of the Tree of Life.
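The abstract defines a genomic signature as the k-mer frequency vector of a 500 kbp fragment chosen to represent a genome. A minimal Python sketch of that computation follows, under stated assumptions: the fragment is taken at a random offset (the helper `random_fragment` is hypothetical; the source says only that the fragment is "arbitrarily selected"), k-mers containing characters other than A, C, G and T are skipped, counts are normalized to sum to 1, and the lexicographic ordering of the 4^k entries is an illustrative choice.

```python
from __future__ import annotations

import random
from itertools import product
from typing import Optional


def random_fragment(genome: str, length: int = 500_000, seed: Optional[int] = None) -> str:
    """Hypothetical helper: return a window of `length` bases at a random offset."""
    if len(genome) < length:
        raise ValueError("genome is shorter than the requested fragment length")
    rng = random.Random(seed)
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]


def kmer_frequency_vector(seq: str, k: int) -> list[float]:
    """Return the normalized k-mer frequency vector of `seq` (4**k entries).

    Entries follow lexicographic order over ACGT (AA..A first, TT..T last).
    k-mers containing any other character (e.g. N) are skipped; this is an
    assumption of the sketch, not a rule stated in the source.
    """
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # None means the window contained a non-ACGT character
            counts[i] += 1
    total = sum(counts)
    # Normalize to frequencies; return all zeros if no valid k-mer was found.
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

For example, `kmer_frequency_vector(random_fragment(genome), 3)` would give a 64-component signature; the specific k values used in the study are not reproduced here.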


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Frequency Chaos Game Representation (fCGR_k) of the global importance of individual 6-mers in distinguishing the DNA sequences of each environment category from the rest of the dataset (one-vs-all classification). The top panel shows the fCGR_k for the Temperature Dataset and the bottom panel the fCGR_k for the pH Dataset, both for k = 6. The colour and intensity of each pixel represent the relative importance (relevance) of the corresponding 6-mer; dark blue pixels mark the most relevant 6-mers, and the full scale is given in the colour bar legend.
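An fCGR_k arranges the 4^k possible k-mers in a 2^k × 2^k grid, so any per-k-mer value (a frequency, or an importance score as in Figure 1) can be displayed as an image. The sketch below computes each k-mer's cell directly from the iterated CGR construction. The corner assignment (C top-left, G top-right, A bottom-left, T bottom-right) is an assumption; other conventions merely permute the grid. `cgr_cell` and `fcgr_matrix` are hypothetical names.

```python
from __future__ import annotations

# Assumed corner layout (a common CGR convention, not confirmed by the source):
# C top-left, G top-right, A bottom-left, T bottom-right.
RIGHT = {"G", "T"}   # nucleotides whose corner lies in the right half
BOTTOM = {"A", "T"}  # nucleotides whose corner lies in the bottom half


def cgr_cell(kmer: str) -> tuple[int, int]:
    """Return (row, col) of `kmer` in a 2**k x 2**k grid, row 0 at the top.

    Each CGR step moves halfway towards the next nucleotide's corner, so the
    last nucleotide selects the coarsest quadrant: nucleotide i (0-based)
    contributes bit i of the row and column indices.
    """
    row = col = 0
    for i, base in enumerate(kmer.upper()):
        if base not in "ACGT":
            raise ValueError(f"unexpected nucleotide {base!r}")
        if base in BOTTOM:
            row |= 1 << i
        if base in RIGHT:
            col |= 1 << i
    return row, col


def fcgr_matrix(values: dict[str, float], k: int) -> list[list[float]]:
    """Place one value per k-mer into a 2**k x 2**k grid (missing k-mers stay 0)."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    for kmer, value in values.items():
        if len(kmer) != k:
            raise ValueError(f"{kmer!r} is not a {k}-mer")
        r, c = cgr_cell(kmer)
        grid[r][c] = value
    return grid
```

Because the mapping is a bijection between k-mers and cells, the same function can render either k-mer frequencies or per-k-mer relevance scores.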
Figure 2
Histograms of the deviation of 3-mer counts in each environment category from the Temperature Dataset mean. A 3-mer and its reverse complement are considered to be indistinguishable, and only canonical 3-mers are listed. Relevant 3-mers for the one-vs-all classification are highlighted in green. The height of each bar represents the difference between a 3-mer’s count in that temperature category and the mean of that 3-mer’s counts over the entire Temperature Dataset (in percentage points).
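Treating a 3-mer and its reverse complement as indistinguishable leaves 32 canonical 3-mers, since no odd-length k-mer equals its own reverse complement. A small sketch of the collapse follows, assuming the common convention that the canonical form is the lexicographically smaller member of the pair; the function names are illustrative, not taken from the source.

```python
from __future__ import annotations

from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Assumed convention: the lexicographically smaller of kmer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(k: int) -> list[str]:
    """All canonical k-mers over ACGT, sorted."""
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def collapse_counts(counts: dict[str, int]) -> dict[str, int]:
    """Merge the counts of each k-mer with those of its reverse complement."""
    merged: dict[str, int] = {}
    for kmer, c in counts.items():
        key = canonical(kmer)
        merged[key] = merged.get(key, 0) + c
    return merged
```

For even k the count differs because palindromic k-mers (such as ACGT) are their own reverse complement; for k = 2 there are 10 canonical 2-mers rather than 8.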
Figure 3
Histograms of the deviation of 3-mer counts in each environment category from the pH Dataset mean. A 3-mer and its reverse complement are considered to be indistinguishable, and only canonical 3-mers are listed. Relevant 3-mers for the one-vs-all classification are highlighted in green. The height of each bar represents the difference between a 3-mer’s count in that pH category and the mean of that 3-mer’s counts over the entire pH Dataset (in percentage points).
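The bar heights in these histograms can be reproduced along the following lines. This sketch assumes that "counts in percentage points" means each sequence's canonical 3-mer counts are first converted to percentages of that sequence's total, that a category's profile is the mean over its sequences, and that the dataset mean is taken over all sequences. The source does not spell out these steps, and all names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def to_percent(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw k-mer counts of one sequence into percentages of its total."""
    total = sum(counts.values())
    return {kmer: 100.0 * c / total for kmer, c in counts.items()}


def mean_profile(profiles: list[dict[str, float]]) -> dict[str, float]:
    """Average several k-mer profiles; a k-mer absent from a profile counts as 0."""
    keys = set().union(*profiles)
    n = len(profiles)
    return {key: sum(p.get(key, 0.0) for p in profiles) / n for key in keys}


def category_deviation(
    samples: list[tuple[str, dict[str, int]]],
) -> dict[str, dict[str, float]]:
    """samples: (environment category, canonical 3-mer counts), one per sequence.

    Returns, per category, the category-mean percentage of each 3-mer minus
    the dataset-wide mean percentage (i.e. the deviation in percentage points).
    """
    percents = [(cat, to_percent(c)) for cat, c in samples]
    overall = mean_profile([p for _, p in percents])
    by_cat: dict[str, list[dict[str, float]]] = defaultdict(list)
    for cat, p in percents:
        by_cat[cat].append(p)
    result = {}
    for cat, profiles in by_cat.items():
        cat_mean = mean_profile(profiles)
        result[cat] = {key: cat_mean.get(key, 0.0) - overall[key] for key in overall}
    return result
```

With unequal category sizes, averaging over sequences (as here) and pooling all counts before normalizing can give slightly different bars; which one the figure uses is not stated in the caption.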
Figure 4
Number of true genera (blue) vs. the number of genera identified by seven clustering algorithms, for each environment category in the Temperature Dataset (left) and in the pH Dataset (right). Only true genera represented by more than two sequences in the respective dataset (Temperature or pH) are considered, and only clusters meeting the quality criteria are counted.
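The "true genera" baseline in Figure 4 combines a filter with a per-category count. One reading, sketched below as an assumption: a genus is eligible if it has more than two sequences in the whole dataset, and each environment category is then credited with the eligible genera it contains. The clustering side, including the unspecified quality criteria, is not modelled, and `true_genera_per_category` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def true_genera_per_category(records: Iterable[tuple[str, str]]) -> dict[str, int]:
    """records: (environment category, genus) pairs, one per sequence.

    Assumed reading of the caption: keep genera with more than two sequences
    in the whole dataset, then count how many of them occur in each category.
    Categories with no eligible genus are reported as 0.
    """
    records = list(records)
    totals = Counter(genus for _, genus in records)
    eligible = {genus for genus, n in totals.items() if n > 2}
    genera: dict[str, set[str]] = defaultdict(set)
    for category, genus in records:
        if genus in eligible:
            genera[category].add(genus)
    # dict.fromkeys keeps categories in first-seen order without duplicates.
    return {cat: len(genera[cat]) for cat in dict.fromkeys(c for c, _ in records)}
```

If the caption instead means "more than two sequences within each category", the filter would move inside the per-category loop; the count can differ when a genus spans several categories.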
