Microbiome. 2025 Jul 1;13(1):155.
doi: 10.1186/s40168-025-02140-8.

Extensive data mining uncovers novel diversity among members of the rare biosphere within the Thermoplasmatota

Mara D Maeke et al. Microbiome.

Abstract

Background: Rare species, especially of the marine sedimentary biosphere, have long been overlooked owing to the complexity of sediment microbial communities, their sporadic temporal and patchy spatial abundance, and challenges in cultivating environmental microorganisms. In this study, we combined enrichments, targeted metagenomic sequencing, and extensive data mining to uncover uncultivated members of the archaeal rare biosphere in marine sediments.

Results: In protein-amended enrichments, we detected the ecologically and metabolically uncharacterized class Candidatus Penumbrarchaeia within the phylum Thermoplasmatota. By screening more than 8000 metagenomic runs and 11,479 published genome assemblies, we expanded the phylogeny of Ca. Penumbrarchaeia by three novel orders. All six identified families of this class show low abundance in environmental samples, which is characteristic of rare biosphere members. Members of the class Ca. Penumbrarchaeia were predicted to be involved in organic matter degradation in anoxic, carbon-rich habitats. All Ca. Penumbrarchaeia families contain high numbers of taxon-specific orthologous genes, highlighting their environmental adaptations and habitat specificity. Moreover, members of this group exhibit the highest proportion of unknown genes within the entire phylum Thermoplasmatota, suggesting a high degree of functional novelty in this class.

Conclusions: In this study, we emphasize the necessity of targeted, data-integrative approaches to deepen our understanding of the rare biosphere and uncover the functions and metabolic potential hidden within these understudied taxa.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Abundance of archaeal 16S rRNA genes in second-generation enrichments on days 98 and 157. (a) Relative 16S rRNA gene abundance within protein samples (amended with protein, sulfate, and antibiotics) on day 98 and day 157 and control samples (amended with sulfate and antibiotics). (b) 16S rRNA gene copies per ml of slurry of the classes Ca. Penumbrarchaeia and Lokiarchaeia subgroup Loki-2b in protein-amended samples and control samples
Fig. 2
Screened data for retrieving novel Ca. Penumbrarchaeia MAGs. Number of datasets of (a) screened published MAGs from GenBank and (b) screened metagenomic samples from the SRA, the OMDB (v2), and those from which Ca. Penumbrarchaeia MAGs were reconstructed. Numbers above bars indicate the number of samples in which Ca. Penumbrarchaeia was detected and ultimately reconstructed. The category “other” includes samples of 35 additional environmental categories for which < 100 metagenomic samples were available; Ca. Penumbrarchaeia was detected in none of them (e.g., algae, alkali sediment, viral, hypolithon, or coral reef metagenomes). (c) World map of all screened metagenomic samples derived through short-read data mining and the OMDB (v2), with an indication of the locations at which target MAGs were detected
Fig. 3
Phylogenomic tree of the Thermoplasmatota based on 53 archaeal marker genes. (a) Maximum-likelihood tree (RAxML, 100 bootstraps) of 370 Thermoplasmatota MAGs and 35 Ca. Penumbrarchaeia MAGs obtained through data mining of genome assemblies and metagenomic short-read datasets. MAGs in bold define the nonredundant Ca. Penumbrarchaeia MAGs, derived through MAG dereplication (“Retrieval of EX4484-6 MAGs from public archives” section). Node labels indicate RED values, which were used to divide the class Ca. Penumbrarchaeia into four orders consisting of six families (1A (Ca. Penumbrarchaeia), 1B, 2, 3A, 3B, 4). The environments from which single MAGs are derived are indicated by a colored strip. (b) Relative abundance of nonredundant Ca. Penumbrarchaeia representative MAGs in the environment, estimated from metagenomic short reads mapped against a competitive reference and quantified using CoverM. A more detailed description is provided in the main “Relative MAG abundance in the environment” section. The environments in which individual MAGs were found are indicated by color
Fig. 4
Relative abundances of Ca. Penumbrarchaeia MAGs in screened samples. (a) Relative abundance of individual nonredundant MAGs. For MAGs found in marine water samples, the filter size through which each sample was filtered is indicated by shape. Colors indicate the environment of the samples in which the MAGs were found. (b) Relative abundance of Ca. Penumbrarchaeia MAGs summarized by family and for the class as a whole. Family 1A corresponds to Ca. Penumbrarchaeia
Fig. 5
Metabolic reconstruction of the main metabolic features of the novel class Ca. Penumbrarchaeia. The presence of genes is indicated by full or half circles for each family or with red stars if present in > 75% of the MAGs of all families. Amino acid degradation: gdhA glutamate dehydrogenase, kor 2-oxoacid:ferredoxin oxidoreductase, vor 2-oxoisovalerate ferredoxin oxidoreductase, ior indolepyruvate ferredoxin oxidoreductase, por pyruvate ferredoxin oxidoreductase, and acd acetyl coenzyme A synthetase; beta-oxidation: ACADS butyryl-CoA dehydrogenase, ACADM acyl-CoA dehydrogenase, crt enoyl-CoA hydratase, fadB 3-hydroxyacyl-CoA dehydrogenase, and fadA acetyl-CoA acyltransferase; rTCA: acl/ACLY ATP-citrate lyase, ACO aconitate hydratase, idh isocitrate dehydrogenase, korABCD 2-oxoacid:ferredoxin oxidoreductase, sucCD succinyl-CoA synthetase, sdhAB succinate dehydrogenase/fumarate reductase, fum fumarate hydratase, and mae malate dehydrogenase (oxaloacetate-decarboxylating); pyruvate metabolism: acd acetyl coenzyme A synthetase, acs acetyl-CoA synthetase, por pyruvate ferredoxin oxidoreductase, and ldh lactate dehydrogenase; and hydrogenases: hydADGB sulfhydrogenase, mvh F420-non-reducing hydrogenase, hdr heterodisulfide reductase, hypABCDEF hydrogenase expression/formation protein, atpABCDEFGHJK V/A-type H+-transporting ATPase, lctP lactate permease, hppA K(+)-stimulated pyrophosphate-energized sodium pump, and TC.NSS neurotransmitter:Na+ symporter family
Fig. 6
Relationship between protein novelty and MAG occurrence. (a) Percentage of genes encoding proteins of unknown function vs. rarity for each of the redundant MAGs within the phylum Thermoplasmatota. The x-axis was square-root (sqrt)-transformed. The rarity index was defined as median relative abundance weighted by the fraction of occurrence. MAGs below the median rarity were defined as rare, and MAGs above were defined as common. (b) Percentage of genes encoding proteins of unknown function in genomes of the three defined rarity groups: rare, common, and not detected (nd), the last comprising genomes to which none of the screened metagenomic short-read data mapped. Differences between groups were tested by Wilcoxon rank-sum tests with a Bonferroni-adjusted significance threshold of 0.0167; p-values are indicated by asterisks (****p ≤ 0.0001, ***p ≤ 0.001, **p ≤ 0.01). Number of observations N indicates the number of genomes sorted into each of the defined rarity groups. (c) Percentage of genes encoding proteins of unknown function for each order found within the phylum Thermoplasmatota. Number of observations N represents the number of Thermoplasmatota genomes per order
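
The rarity grouping described in the Fig. 6 caption can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that the median is taken over samples in which a MAG was detected, that the weight is the fraction of screened samples with detection, that a MAG exactly at the median index counts as common, and that the 0.0167 threshold comes from dividing a conventional 0.05 level by three pairwise comparisons. The function names and the example abundances are hypothetical.

```python
from statistics import median

def rarity_index(abundances):
    """Rarity index of one MAG from its relative abundance in every screened
    sample (0 where absent). Assumption: median over detected samples,
    multiplied by the fraction of samples in which the MAG was detected."""
    detected = [a for a in abundances if a > 0]
    if not detected:
        return None  # no reads mapped anywhere: "not detected" (nd)
    occurrence = len(detected) / len(abundances)
    return median(detected) * occurrence

def classify(indices):
    """Label MAGs as rare, common, or nd relative to the median index of all
    detected MAGs. Assumption: ties with the median are treated as common."""
    cutoff = median(v for v in indices.values() if v is not None)
    labels = {}
    for name, value in indices.items():
        if value is None:
            labels[name] = "nd"
        elif value < cutoff:
            labels[name] = "rare"
        else:
            labels[name] = "common"
    return labels

# Assumed origin of the threshold: 0.05 split across three pairwise tests.
bonferroni_threshold = 0.05 / 3

# Hypothetical relative abundances for four MAGs across four samples.
example = {
    "mag_a": [0.0, 0.02, 0.04, 0.0],
    "mag_b": [0.5, 0.5, 0.5, 0.5],
    "mag_c": [0.0, 0.0, 0.0, 0.0],
    "mag_d": [0.1, 0.0, 0.0, 0.0],
}
indices = {name: rarity_index(values) for name, values in example.items()}
labels = classify(indices)
print(labels)
```

Under these assumptions, a MAG that is abundant but found in few samples can score lower than one that is moderately abundant everywhere, which is the effect of weighting by occurrence.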
