Nat Commun. 2021 Mar 31;12(1):2009.
doi: 10.1038/s41467-021-22203-2.

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Caitlin M Singleton et al. Nat Commun.

Abstract

Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.

Conflict of interest statement

R.H.K., M.A., S.M.K., and P.H.N. own DNASense ApS. The remaining authors declare no competing interests.

Figures

Fig. 1. Conceptual overview of the value of HQ MAGs in linking structure to function.
HQ MAGs with full-length 16S rRNA genes are recovered from a full-scale AS sample, allowing linkage to the abundance and time-series data of MiDAS, and informing sample selection for further experiments. MAG and MiDAS full-length 16S rRNA gene sequences facilitate creation of lineage-specific FISH probes. Abundance is confirmed with quantitative FISH (qFISH), and morphology and location in the floc are determined. The coding sequences (CDSs) provide information on functional potential, such as the presence of phosphate accumulation enzymes (e.g., PstABCS, Pit, and PPK), allowing for the selection of novel species that may belong to certain functional guilds. Specific potential pathways, such as polyphosphate accumulation, can be experimentally determined with Raman microspectroscopy in combination with FISH and the information gained from the full-length 16S rRNA gene. This leads to confirmation of the population’s role in the AS system and uncovers targets for investigation into improved resource recovery and effective wastewater treatment.
Fig. 2. Phylogenetic bacterial genome tree showing the diversity, maximum abundance, and contiguity of recovered species.
The tree is based on the concatenated alignment of 120 single-copy marker gene proteins using GTDB-Tk. The 578 HQ bacterial species representatives are shown, with phyla labeled. Circular genomes are indicated at the tips using a white filled circle. HQ MAGs from Ye et al. are indicated by the purple circles. The maximum relative abundance of the MAG across the 69 WWTP metagenomes is indicated by the heatmap. Polymorphic rate is indicated by the red bar chart and percentage of data incorporated in the longest contig within the MAG is indicated in the blue bar chart. Additional information on the MAGs is presented in Supplementary Data 3.
Fig. 3. MAG recovery information across taxonomic levels.
a Sankey based on assigned taxonomy showing the novel populations at different phylogenetic levels, with the top 25 taxa shown at each level. Numbers indicate the number of MAGs recovered for the lineage. b Total MAGs unclassified by GTDB-Tk at each taxonomic level.
Fig. 4. Functional profiles of the top 53 bacterial species representatives with relative abundances >1% in at least 1 sample metagenome used in this study.
Pathways are considered present if 100% of the genes in the KEGG module, or custom module (Supplementary Data 12), are encoded. Heatmap strip indicates the maximum relative abundance of the population in the metagenomes. Colors are to aid visual interpretation, e.g., purple for nitrogen-related metabolisms and green for phosphate-related metabolisms. Bar chart indicates the number of MAGs encoding the pathway of interest. Supplementary Fig. 6 shows the full taxonomic string for the nodes.
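The presence rule stated in the Fig. 4 caption (a pathway counts as present only when 100% of the module's genes are encoded) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy module, and the MAG gene sets below are hypothetical, and the sketch treats a module as a flat list of required genes, whereas real KEGG module definitions also allow alternative genes per step.

```python
from typing import Dict, Iterable, Set


def pathway_present(module_genes: Iterable[str], encoded_genes: Set[str]) -> bool:
    """Return True only if every gene in the module is encoded (100% rule).

    Assumption: the module is a flat set of required genes; alternative
    (either/or) steps found in real KEGG modules are not modeled here.
    """
    required = set(module_genes)
    # An empty module is treated as absent rather than trivially present.
    return bool(required) and required <= encoded_genes


def count_mags_with_pathway(module_genes: Iterable[str],
                            mags: Dict[str, Set[str]]) -> int:
    """Count MAGs encoding the full module, as in the caption's bar chart."""
    required = list(module_genes)
    return sum(pathway_present(required, genes) for genes in mags.values())


# Hypothetical toy module and MAG annotations, for illustration only.
toy_module = ["pstA", "pstB", "pstC", "pstS", "ppk"]
toy_mags = {
    "MAG_A": {"pstA", "pstB", "pstC", "pstS", "ppk", "pit"},
    "MAG_B": {"pstA", "pstB", "ppk"},  # missing pstC and pstS
}

for name, genes in toy_mags.items():
    print(name, pathway_present(toy_module, genes))
print(count_mags_with_pathway(toy_module, toy_mags))
```

Under this strict rule, a MAG missing even one module gene is scored as lacking the pathway, so an incomplete genome could lead to an underestimate of functional potential.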
Fig. 5. Overview of Ca. Methylophosphatis based on FISH, Raman microspectroscopy, and metabolic reconstruction.
a FISH micrograph of Ca. Methylophosphatis, targeted by the genus-specific probe g190_1276 (Cy3-labeled) in a full-scale activated sludge sample from Bjergmarken WWTP (2018-08-29). Two samples were examined in total and multiple images were recorded for each sample. Source data are provided as a Source Data file. Target cells appear magenta, whereas all other bacterial cells appear blue. Scale bar represents 10 μm. b Raman spectrum of Ca. Methylophosphatis (average of 100 FISH-defined cells) showing the presence of the signature peaks for polyphosphate (690 and 1170 cm−1). Peaks for phenylalanine (1004 cm−1) and amide I linkages of proteins (1450 cm−1) are specific markers for biological material. AU, arbitrary units. c Metabolic reconstruction of the Ca. Methylophosphatis MAGs. Colors represent the species or combination of species (Venn diagram) that encode the potential for the enzyme or cycle. Abbreviations: EMC, ethylmalonyl-CoA pathway; EMP, Embden–Meyerhof–Parnas pathway (glycolysis); CBB, Calvin–Benson–Bassham cycle; H4MPT, tetrahydromethanopterin pathway; H4F, tetrahydrofolate pathway; TCA, tricarboxylic acid cycle; PHA, polyhydroxyalkanoate pathway, nitrogenase (NifHDK); CH3OH, methanol, methanol dehydrogenase (MDH-xoxF); I, complex I NADH dehydrogenase; II, complex II succinate dehydrogenase; III, complex III cytochrome bc1; IV, cytochrome c oxidase; IV cbb3, complex IV cytochrome cbb3 oxidase, inorganic phosphate transporter family (Pit), inorganic phosphate ABC transporter (PstABCS), two component system for phosphate regulation (PhoRB), phosphate transport system accessory protein (PhoU); Poly-P, polyphosphate, type IV secretion system (T4SS), type IV fimbriae (T4 fimbriae), nitrate reductase respiratory (NarGHI), periplasmic nitrate reductase (NapAB), nitrite reductase (NirS), nitric oxide reductase (NorBC), acetate kinase (AckA), and phosphotransacetylase (Pta).
