Genome Res. 2011 Sep;21(9):1552-60.
doi: 10.1101/gr.120618.111. Epub 2011 Jun 20.

Integrative analysis of environmental sequences using MEGAN4

Daniel H Huson et al. Genome Res. 2011 Sep.

Abstract

A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using other tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan.

Figures

Figure 1.
A MEGAN4 integrative taxonomic analysis of a 16S rRNA data set (labeled “16SrRNA-Silva-Morris2010”) and two different analyses of a metaproteome (labeled “Peptides-NR-Morris2010” and “Peptides-GOS-CAMERA-Morris2010”), all from Morris et al. (2010), combined with a metatranscriptome and metagenome from Gilbert et al. (2008) (labeled “cDNA-Time1-Bag1-Gilbert2008” and “DNA-Time1-Bag1-Gilbert2008,” respectively). The results labeled “Peptides-NR-Morris2010” were obtained by a MEGAN analysis based on a comparison against the NR database, whereas those labeled “Peptides-GOS-CAMERA-Morris2010” were imported from Morris et al. (2010). We display the NCBI taxonomy down to the rank of Phylum and, in some parts of the Proteobacteria, down to the rank of Order. In such MEGAN4 diagrams, each taxon is displayed as a gray rectangle that contains a bar chart indicating the number of reads assigned to the taxon, on a logarithmic scale.
Figure 2.
A MEGAN4 integrative functional analysis (using SEED) of a metaproteome (Morris et al. 2010), metatranscriptome, and metagenome (Gilbert et al. 2008), labeled “Peptides-NR-Morris2010,” “cDNA-Time1-Bag1-Gilbert2008,” and “DNA-Time1-Bag1-Gilbert2008,” respectively. The classification tree has been partially expanded to show some details of the subsystems below the Carbohydrates node.
Figure 3.
A MEGAN4 integrative functional analysis (using KEGG) of a metaproteome (Morris et al. 2010), metatranscriptome, and metagenome (Gilbert et al. 2008), labeled “Peptides-NR-Morris2010,” “cDNA-Time1-Bag1-Gilbert2008,” and “DNA-Time1-Bag1-Gilbert2008,” respectively. The classification tree has been expanded down to the second level of the KEGG classification.
Figure 4.
A MEGAN4 integrative functional analysis (using KEGG) of a metaproteome (Morris et al. 2010), metatranscriptome, and metagenome (Gilbert et al. 2008), displaying the protein export pathway. Each labeled rectangle represents a participating enzyme and is underlaid by a bar chart that indicates how many reads from each of the three data sets are assigned to the enzyme, on a logarithmic scale. More details are shown whenever the mouse is placed over such a rectangle. (Courtesy of Kanehisa Laboratories, www.kegg.org.)
Figure 5.
Comparison of the taxonomic analyses of a metagenome data set (Gilbert et al. 2008) computed by MEGAN4 and restricted to Prokaryotes (labeled “DNA-Time1-Bag1-Prokaryotes”) and by NBC (Rosen et al. 2010). In the latter case, we list results obtained both without a threshold filter (labeled “DNA-Time1-Bag1-NBC”) and with a threshold filter (labeled “DNA-Time1-Bag1-NBC-WithThreshold”).
Figure 6.
Comparison of SEED-based functional analyses of a metatranscriptome data set (Gilbert et al. 2008) computed by MEGAN4 and by MG-RAST (Glass et al. 2010).
Figure 7.
Comparison of the taxonomic analysis of a 16S rRNA data set (Morris et al. 2010), computed using five different approaches: MEGAN4's BLASTN-based SILVA analysis, the RDP website's classifier (Cole et al. 2009), MG-RAST's RDP-based approach (Glass et al. 2010), the SILVA website's aligner (Pruesse et al. 2007), and MG-RAST's SILVA-based approach targeting the SSU gene. In this figure, the bar charts on higher-rank nodes reflect the total number of reads assigned to the corresponding node or to any of the nodes in the subtree below the node.
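The distinction drawn in the Figure 7 caption, between reads assigned directly to a node and the total over the node and its whole subtree, can be illustrated with a minimal Python sketch. This is not MEGAN4 code: the function name `summarized_counts`, the toy taxonomy, and the read counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def summarized_counts(parent, assigned):
    """For each node, return reads assigned to it plus reads assigned to any node below it.

    parent   -- dict mapping each node to its parent (None for the root)
    assigned -- dict mapping a node to the number of reads assigned directly to it
    """
    total = defaultdict(int)
    for node, count in assigned.items():
        # Walk from the node up to the root, adding its reads to itself and every ancestor.
        current = node
        while current is not None:
            total[current] += count
            current = parent.get(current)
    return dict(total)


# Hypothetical toy taxonomy (child -> parent) and made-up direct read counts.
parent = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Gammaproteobacteria": "Proteobacteria",
}
assigned = {
    "Bacteria": 5,
    "Proteobacteria": 10,
    "Alphaproteobacteria": 30,
    "Gammaproteobacteria": 20,
}

totals = summarized_counts(parent, assigned)
for name in parent:
    print(name, "assigned:", assigned.get(name, 0), "summarized:", totals.get(name, 0))
```

In this toy example a higher-rank node such as Proteobacteria has 10 reads assigned directly but a summarized total of 60, which is the quantity the caption says the bar charts on higher-rank nodes reflect.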
