Bioinformatics. 2011 Jul 1;27(13):i94-101.
doi: 10.1093/bioinformatics/btr216.

Meta-IDBA: a de Novo assembler for metagenomic data

Yu Peng et al. Bioinformatics.

Abstract

Motivation: Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling a set of mixed reads from different species into contigs is a bottleneck of metagenomic research. Although there are many assemblers for reads from a single genome, there are no assemblers for reads in metagenomic data without reference genome sequences. Moreover, the performance of the single-genome assemblers on metagenomic data is far from satisfactory, because the genomes of subspecies and species share common regions, which makes the assembly problem much more complicated.

Results: We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. Meta-IDBA has two core steps. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants among the genomes of subspecies of the same species by multiple alignment and represents the genome of that species with a consensus sequence. A comparison of Meta-IDBA with existing assemblers, such as Velvet and ABySS, on different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy.
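To make the first step concrete, here is a minimal Python sketch of building a de Bruijn graph from reads and splitting it into connected components. It is an illustration only, not the Meta-IDBA implementation: the abstract does not state the observation used for partitioning, so the sketch falls back on plain graph connectivity. The function names `build_de_bruijn` and `connected_components`, the toy reads, and the value of `k` are all assumptions made for this example.

```python
from collections import defaultdict, deque


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer node to its successor nodes.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            graph[prefix].add(suffix)
            graph[suffix]  # touch the key so sink nodes are also stored
    return graph


def connected_components(graph):
    """Group nodes into weakly connected components (edge direction ignored).

    Assumption: plain connectivity stands in for the paper's partitioning rule.
    """
    undirected = defaultdict(set)
    for node, successors in graph.items():
        undirected[node]  # keep nodes that have no edges
        for nxt in successors:
            undirected[node].add(nxt)
            undirected[nxt].add(node)

    seen, components = set(), []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in undirected[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


if __name__ == "__main__":
    # Two toy reads that share no (k-1)-mer, so they fall into separate components.
    toy_graph = build_de_bruijn(["ACGTA", "TTGGCC"], k=3)
    for comp in connected_components(toy_graph):
        print(sorted(comp))
```

On real metagenomic data, reads from different species can share k-mers, so plain connectivity would tend to merge species that the algorithm aims to separate; the sketch only shows the data structure involved.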

Availability: The Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba.

Contact: chin@cs.hku.hk.


Figures

Fig. 1. A component in de Bruijn graph of five E. coli subspecies.
Fig. 2. Workflow of Meta-IDBA algorithm.
Fig. 3. Experiment results of low-complexity datasets.
Fig. 4. Experiment results of medium-complexity datasets.
Fig. 5. Experiment results of high-complexity datasets.
Fig. 6. Multiple alignment of a component in five E. coli subspecies. Consensus is shown in the first row. Contigs are separated by spaces. The conserved nucleotides are represented by dots. The difference between contigs and consensus is highlighted.
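The display convention described for Fig. 6 (consensus in the first row, conserved nucleotides shown as dots) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: it assumes the sequences are already aligned to equal length, takes the consensus as a simple column-wise majority vote, and uses hypothetical function names (`consensus`, `dot_view`) and toy sequences.

```python
from collections import Counter


def consensus(rows):
    """Column-wise majority vote over equal-length, pre-aligned rows."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def dot_view(rows):
    """Return the consensus followed by each row with matching bases as dots."""
    cons = consensus(rows)
    masked = [
        "".join("." if base == ref else base for base, ref in zip(row, cons))
        for row in rows
    ]
    return [cons] + masked


if __name__ == "__main__":
    # Toy aligned sequences; only the second differs from the consensus.
    for line in dot_view(["ACGT", "ACGA", "ACGT"]):
        print(line)
```

Only the positions where a sequence departs from the consensus remain visible, which is what makes small subspecies-level variants easy to spot in such a view.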
