Appl Environ Microbiol. 2011 Sep;77(17):6000-11.
doi: 10.1128/AEM.00107-11. Epub 2011 Jul 15.

Metagenomic insights into the evolution, function, and complexity of the planktonic microbial community of Lake Lanier, a temperate freshwater ecosystem

Seungdae Oh et al. Appl Environ Microbiol. 2011 Sep.

Abstract

Lake Lanier is an important freshwater lake for the southeast United States, as it represents the main source of drinking water for the Atlanta metropolitan area and is popular for recreational activities. Temperate freshwater lakes such as Lake Lanier are underrepresented among the growing number of environmental metagenomic data sets, and little is known about how functional gene content in freshwater communities relates to that of other ecosystems. To better characterize the gene content and variability of this freshwater planktonic microbial community, we sequenced several samples obtained around a strong summer storm event and during the fall water mixing using a random whole-genome shotgun (WGS) approach. Comparative metagenomics revealed that the gene content was relatively stable over time and more related to that of another freshwater lake and the surface ocean than to soil. However, the phylogenetic diversity of Lake Lanier communities was distinct from that of soil and marine communities. We identified several important genomic adaptations that account for these findings, such as the use of potassium (as opposed to sodium) osmoregulators by freshwater organisms and differences in the community average genome size. We show that the lake community is predominantly composed of sequence-discrete populations and describe a simple method to assess community complexity based on population richness and evenness and to determine the sequencing effort required to cover diversity in a sample. This study provides the first comprehensive analysis of the genetic diversity and metabolic potential of a temperate planktonic freshwater community and advances approaches for comparative metagenomics.

Figures

Fig. 1.
Lake Lanier microbial community structure based on 16S rRNA genes. All sequence fragments of 16S rRNA genes recovered in the Lake Lanier 454 metagenome were queried against the RDP database to identify their closest phylogenetic relative with a complete 16S rRNA gene sequence available. The tree was constructed using the ARB parsimony tool (25) based on the complete sequences of the 16S rRNA genes of all closest relatives (one relative per fragment recovered). Shading denotes taxa of the same phylum. Eukaryotic sequences were used as the outgroup. The numbers in parentheses refer to the number of individual sequences included in a cluster. Unassigned denotes sequences that were not possible to align robustly against the RDP database.
Fig. 2.
Principal coordinate analysis of the metagenomic data sets analyzed in this study. The analysis was performed using the Bray-Curtis distance method based on the gene abundance of 4,870 COGs. The percentage of variation explained by the first two principal coordinates is provided in parentheses. LL denotes Lake Lanier samples; LL-S1 and LL-454, Illumina and 454 samples taken 1 day before the summer storm, respectively; LL-S2, LL-S3, and LL-S4, Illumina samples taken 1 day, 1 week, and 3 months after the storm, respectively. Note that the Lake Lanier samples clustered closely together, revealing that the gene content of the Lake Lanier microbial community is stable over time and shows much smaller temporal variation than the variation observed between the Lake Lanier data sets and the remaining data sets used in the study.
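The distance step behind an ordination like this can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: it computes pairwise Bray-Curtis dissimilarities between relative-abundance profiles, which would then be passed to a principal coordinate analysis. The sample names, category labels, and counts are hypothetical toy values, and the function names are assumptions introduced here.

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw gene counts per category into fractions that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    categories = set(a) | set(b)
    numerator = sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in categories)
    denominator = sum(a.get(c, 0.0) + b.get(c, 0.0) for c in categories)
    return numerator / denominator if denominator else 0.0


def distance_matrix(profiles):
    """Symmetric dissimilarity matrix keyed by (sample, sample) pairs."""
    names = sorted(profiles)
    matrix = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        matrix[(x, y)] = matrix[(y, x)] = bray_curtis(profiles[x], profiles[y])
    return matrix


# Hypothetical toy counts: two similar lake samples and one soil sample.
toy_counts = {
    "lake_a": {"cog_x": 40, "cog_y": 30, "cog_z": 30},
    "lake_b": {"cog_x": 38, "cog_y": 32, "cog_z": 30},
    "soil": {"cog_x": 10, "cog_y": 20, "cog_z": 70},
}
profiles = {name: relative_abundance(c) for name, c in toy_counts.items()}
dist = distance_matrix(profiles)
```

In this toy example the two lake profiles are far closer to each other than either is to the soil profile, which is the kind of pattern the ordination makes visible.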
Fig. 3.
Comparison of the Lake Lanier community to those of other habitats. The cladograms represent the relationships among the metagenomes in terms of relative abundance of bacterial families (A), phyla (B), genes assignable to the major functional categories of the COG database (C), and protein families of the Pfam database (D). Clustering was performed using the Cluster 3.0 software (12) with the Euclidean similarity metric. Other similarity metrics, such as Pearson and Bray-Curtis, provided similar results (not shown). Numbers on the nodes represent bootstrap support from 100 replicates. Bacterial families and phyla were identified based on the phylogenetic analysis of 16S rRNA gene sequence fragments recovered in each WGS data set. The raw data used for clustering are provided in Tables S1 (Pfam) and S2 (families and phyla) in the supplemental material.
Fig. 4.
Average genome size explains some of the shifts in microbial gene content between different habitats. All nonredundant genes in a WGS data set (figure key) were assigned to a major COG functional category (x axis), and the fraction of the total genes assignable to each category (y axis) was compared to that of the combined Lake Lanier data set. The bars represent the average fold change, and the error bars indicate the standard deviation based on the five available Lake Lanier data sets (the average values of the three open and the three coastal ocean data sets were used in the comparisons; only one sample/value was available for deep sea and farm soil). A similar analysis was performed between two groups of sequenced genomes (from GenBank) that had the same genome size as the estimated average genome size in the Lake Lanier and the farm soil metagenomes (white columns in inset). Note that the latter distribution matches closely with that of the soil versus Lake Lanier comparison (shaded columns in inset).
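One plausible reading of the bars and error bars described above can be written out as follows. This is a sketch under stated assumptions: the fold change is taken as the habitat's category fraction divided by each Lake Lanier data set's fraction, and the sample standard deviation is used; the caption does not specify either detail. All numbers are hypothetical.

```python
from statistics import mean, stdev


def fold_changes(habitat_fraction, lake_fractions):
    """Fold change of one category in a habitat relative to each lake data set."""
    return [habitat_fraction / lake_fraction for lake_fraction in lake_fractions]


def summarize(habitat_fraction, lake_fractions):
    """Average fold change (bar height) and its standard deviation (error bar).

    Assumption: sample standard deviation across the lake data sets.
    """
    changes = fold_changes(habitat_fraction, lake_fractions)
    return mean(changes), stdev(changes)


# Hypothetical fractions of genes in one COG category.
soil_fraction = 0.06
lake_fractions = [0.05, 0.04, 0.06, 0.05, 0.05]  # five lake data sets

average_fc, sd_fc = summarize(soil_fraction, lake_fractions)
```

An average above 1 would indicate that the category is enriched in the compared habitat relative to the lake, and a value below 1 the opposite.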
Fig. 5.
Differential abundance of genes in freshwater versus saltwater habitats based on the Pfam database. All nonredundant proteins predicted in the combined Lake Lanier data sets and three estuarine/coastal ocean WGS data sets were searched against the Pfam database to assign each protein to a Pfam model. The significantly enriched Pfam models (P < 0.05) in the lake relative to the coastal saltwater data sets are shown. The bars represent the average fold changes, and the error bars indicate the standard deviations based on the five available Lake Lanier data sets (the average values of the three estuarine/coastal ocean data sets were used in the comparisons).
Fig. 6.
The Lake Lanier community is composed of sequence-discrete populations. (Top) Fragment recruitment plots of the Lake Lanier 454 WGS reads, performed essentially as described previously (35), using the contigs that were assembled from the 454 metagenome as reference sequences. (Bottom) Coverage plots of the same data, performed as described previously (21). The coverage (y axis) is calculated by summing the length of all reads mapping fully on the contig with a given nucleotide identity (x axis) divided by the total length of the contig. Thus, the coverage is normalized to the length of the reference sequence used, is directly comparable between the two bottom panels, and is representative of the relative in situ abundance of the corresponding populations. Note that in the case of Synechococcus, no other close relative cooccurs in the community sampled, while in the case of Burkholderia, a less abundant (compared to the population represented by the contig used as reference sequence) close relative, i.e., showing 80 to 85% nucleotide identity to the reference contig, was also present in the community.
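The coverage calculation described in this caption (summed length of fully mapping reads at a given identity, divided by contig length) can be sketched as below. The 1% identity bin width, the input format, and the function name are assumptions made for illustration; the read lengths and identities are hypothetical.

```python
from collections import defaultdict


def coverage_by_identity(hits, contig_length, bin_width=1.0):
    """Coverage per nucleotide-identity bin for one reference contig.

    hits: iterable of (read_length, percent_identity) pairs, one per read
          that maps fully on the contig.
    Returns {bin_start: summed read length / contig length}.
    """
    summed_length = defaultdict(int)
    for read_length, identity in hits:
        bin_start = int(identity // bin_width) * bin_width
        summed_length[bin_start] += read_length
    return {
        bin_start: total / contig_length
        for bin_start, total in sorted(summed_length.items())
    }


# Hypothetical reads recruited to a 10,000-bp contig.
toy_hits = [(400, 99.2), (350, 99.8), (450, 98.5), (300, 82.0)]
coverage = coverage_by_identity(toy_hits, contig_length=10_000)
```

Because each value is divided by the contig length, profiles computed against reference contigs of different sizes remain comparable, as the caption notes. A separate peak at lower identity (the 82% read in the toy data) is what a less abundant close relative would look like.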
Fig. 7.
Comparisons of microbial community complexity between different habitats. The graphs show the number of contigs (y axis) plotted against the number of reads composing the contig (x axis) resulting from the assembly of 100,000 randomly selected reads from each corresponding WGS data set. Exponential regression was fitted to the data, and the fitted curves are shown on the graph. Note that our approach takes into account both the relative abundance (evenness, represented by the number of reads per contig) as well as the number of unique populations (richness, represented by the number of contigs) of the communities.
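The exponential fit mentioned above can be illustrated with a small example. The caption does not say how the regression was fitted, so this sketch swaps in ordinary least squares on the log-transformed counts, assuming the model y = a * exp(b * x). The data are synthetic and generated from known parameters only to show that the fit recovers them.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y).

    Assumption: log-linear fitting; points with y <= 0 are skipped.
    """
    points = [(x, math.log(y)) for x, y in zip(xs, ys) if y > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_log_y = sum(log_y for _, log_y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (log_y - mean_log_y) for x, log_y in points)
    b = sxy / sxx
    a = math.exp(mean_log_y - b * mean_x)
    return a, b


# Synthetic data: x is reads per contig, y is the number of contigs of that size.
reads_per_contig = [2, 3, 4, 5, 6]
contig_counts = [1000 * math.exp(-0.8 * x) for x in reads_per_contig]

a, b = fit_exponential(reads_per_contig, contig_counts)
```

With real assemblies, a faster decay (more negative b) would mean that few contigs recruit many reads, which is consistent with a more complex community at the same sequencing effort.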

References

    1. Altschul S. F., et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
    2. Angly F. E., et al. 2009. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput. Biol. 5:e1000593.
    3. Beja O., et al. 2000. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289:1902–1906.
    4. Benson D. A., Karsch-Mizrachi I., Lipman D. J., Ostell J., Wheeler D. L. 2007. GenBank. Nucleic Acids Res. 35:D21–D25.
    5. Bhaya D., et al. 2007. Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J. 1:703–713.
