Review
Mol Syst Biol. 2013 May 14;9:666.
doi: 10.1038/msb.2013.22.

Computational meta'omics for microbial community studies

Nicola Segata et al. Mol Syst Biol.

Abstract

Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1
Open biological questions in microbial community biology, and emerging technologies and models for their exploration. Microbial communities are complex biological entities interacting with the environment, host organisms, and transient microbes. Predictive models for most of the interactions within these ecosystems are currently rare, but several studies have begun to provide key insights.
Figure 2
Community diversity and metagenome depth interact to influence assembly quality. Five hundred and twenty-two metagenomic assemblies from the Human Microbiome Project (HMP) are shown here to demonstrate the complex interaction of underlying microbial α-diversity (x axis, diversity within a sample measured as species richness) and assembly quality (y axis). The latter was measured as the size of the smallest contig such that the cumulative length of longer contigs exceeds 4 Mbp, normalized by the total sequenced microbial nucleotide count (The Human Microbiome Project Consortium, 2012a). Communities from each of the seven available body sites are highlighted in different colors, with each point's area proportional to the total input nucleotides for assembly. Microbial composition, metagenome depth, and assembly approach (not shown) all interact to greatly influence the resulting assembly quality.
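The assembly quality measure described in the Figure 2 caption can be sketched as follows. This is only an illustration of one reading of the caption, not the HMP consortium's implementation: the function names, the `threshold_bp` parameter, and the choice to return `None` when an assembly never reaches 4 Mbp are all assumptions made here.

```python
def contig_size_at_threshold(contig_lengths, threshold_bp=4_000_000):
    """Walk contigs from longest to shortest and return the length of the
    contig at which the cumulative length first exceeds threshold_bp.

    Returns None if the whole assembly is shorter than the threshold
    (an assumption; the caption does not say how this case is handled).
    """
    cumulative = 0
    for length in sorted(contig_lengths, reverse=True):
        cumulative += length
        if cumulative > threshold_bp:
            return length
    return None


def normalized_assembly_quality(contig_lengths, total_input_nt,
                                threshold_bp=4_000_000):
    """Divide the threshold contig size by the total number of sequenced
    microbial nucleotides, as the caption describes."""
    if total_input_nt <= 0:
        raise ValueError("total_input_nt must be positive")
    size = contig_size_at_threshold(contig_lengths, threshold_bp)
    if size is None:
        return None
    return size / total_input_nt


# Hypothetical toy assembly: three contigs from 1 Gbp of input reads.
toy_contigs = [3_000_000, 2_000_000, 500_000]
toy_quality = normalized_assembly_quality(toy_contigs, 1_000_000_000)
```

Normalizing by input nucleotides lets a deeply sequenced sample be compared with a shallow one: a long contig obtained from little input scores higher than the same contig obtained from far more data.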
Figure 3
Intrinsic versus extrinsic metagenomic analysis can minimally, partially, or completely rely on prior knowledge from sequenced reference genomes. Methods that do not rely on any reference sequence information typically perform a sequence-based clustering of meta'omic reads, resulting in unlabeled clusters of sequences that can later be assigned to taxonomic or functional classes (analogous to Operational Taxonomic Unit clustering for 16S sequences). Available genomes can alternatively be used more extensively as references for short-read mapping, typically at a high computational cost and with possibly ambiguous assignments for reads from nonunique regions. Intermediate approaches typically rely on a combination of pre-processed extrinsic reference genome information (e.g., to train a composition-based classifier) and intrinsic information (e.g., reads' nucleotide composition) to improve the discrimination power and focus the subsequent mapping operation on the most discriminative sequence-based markers.
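As a small illustration of the "intrinsic information" mentioned in the Figure 3 caption, the sketch below computes a read's k-mer (here tetranucleotide by default) frequency profile, the kind of feature vector a composition-based clusterer or classifier could consume. It does not reproduce any specific tool discussed in the review; the function name `kmer_composition` and the decision to skip k-mers containing ambiguous bases are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_composition(read, k=4):
    """Return a dict mapping every A/C/G/T k-mer to its relative
    frequency in the read. K-mers containing other characters
    (e.g., N) are ignored, which is an assumption of this sketch."""
    read = read.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    valid_total = sum(counts[kmer] for kmer in all_kmers)
    if valid_total == 0:
        # Read too short or made entirely of ambiguous bases.
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / valid_total for kmer in all_kmers}


# Hypothetical toy read; real reads are typically far longer.
profile = kmer_composition("ACGTACGT", k=2)
```

Profiles like this one can be compared across reads without any reference genome, or used as training features once reference genomes are available, which corresponds to the intermediate approaches in the caption.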
Figure 4
A typical current computational meta'omic pipeline to analyze and contrast microbial communities. After collecting microbiome samples, community DNA or RNA is extracted and sequenced, generating WMS samples (i.e., metagenomes) generally consisting of several million short reads each. This example uses 20 WMS samples from the oral cavity (10 from the buccal mucosa, and 10 from the tongue dorsum (The Human Microbiome Project Consortium, 2012b)). Complementary methods reconstruct the taxonomic characteristics (left) and metabolic potential (right) of the microbial communities. MetaPhlAn (Segata et al, 2012) is one of many alternatives to detect and quantify microbial clades with species-level resolution (see Section 3), whereas HUMAnN (Abubucker et al, 2012) quantitatively characterizes genes, pathways, and metabolic modules from each community (see Section 4). Differentially abundant clades or pathways can then be identified and assessed by tools such as LEfSe (Segata et al, 2011) and represented graphically (e.g., here by GraPhlAn, http://huttenhower.sph.harvard.edu/graphlan). The step-by-step computational pipeline used to produce the analyses reported here is included as a tutorial in Supplementary Information and can also be downloaded from https://bitbucket.org/nsegata/metaphlan/wiki/MetaPhlAn_Pipelines_Tutorial. See Table I for alternative computational approaches to each of these currently common steps in meta'omic analysis.

References

Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methe B, Schloss PD, Gevers D, Mitreva M, Huttenhower C (2012) Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 8: e1002358
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Arumugam M, Harrington ED, Foerstner KU, Raes J, Bork P (2010) SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 26: 2977–2978
Atarashi K, Tanoue T, Shima T, Imaoka A, Kuwahara T, Momose Y, Cheng G, Yamasaki S, Saito T, Ohba Y (2011) Induction of colonic regulatory T cells by indigenous Clostridium species. Science 331: 337–341
Backhed F, Ding H, Wang T, Hooper LV, Koh GY, Nagy A, Semenkovich CF, Gordon JI (2004) The gut microbiota as an environmental factor that regulates fat storage. Proc Natl Acad Sci USA 101: 15718–15723
