DNA Res. 2015 Feb;22(1):69-77.
doi: 10.1093/dnares/dsu041. Epub 2014 Nov 27.

MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning

Afiahayati et al. DNA Res. 2015 Feb.

Abstract

The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) cannot resolve metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. When applied to metagenomic sequence reads, it has been shown to generate assemblies with higher N50 scores and higher quality than single-genome assemblers such as Velvet and SOAPdenovo, and it is frequently used in this research community. One important open problem for MetaVelvet is its low accuracy and sensitivity in detecting chimeric nodes in the assembly (de Bruijn) graph, which prevents the generation of longer contigs and scaffolds. We have tackled this problem by classifying chimeric nodes with supervised machine learning, which significantly improves the performance of MetaVelvet, and have developed a new tool called MetaVelvet-SL. A Support Vector Machine is used to learn the classification model from 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers (IDBA-UD, Ray Meta and Omega), reconstructing longer, accurate assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
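The abstract compares assemblers by N50 score without defining it. As a minimal sketch, assuming the standard definition (the largest length L such that contigs of length L or more cover at least half of the total assembly length), N50 can be computed as below. The function name `n50` and the contig lengths are hypothetical and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and add up
    their lengths; N50 is the length of the contig at which the running
    sum first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths for two assemblies with the same total size.
fragmented = [100, 200, 300, 400, 500]
contiguous = [300, 500, 700]

print(n50(fragmented))  # 400
print(n50(contiguous))  # 500
```

Both toy assemblies total 1,500 bp, but the one made of fewer, longer contigs has the higher N50, which is why the score is read as a measure of contiguity rather than of correctness.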

Keywords: de novo assembler; metagenomic; microbial community; short read; supervised learning.

Figures

Figure 1. Chimeric nodes need to be split to obtain independent sub-graphs in a metagenomic assembly.
Figure 2. The MetaVelvet-SL system consists of three major procedures: (i) construction of a de Bruijn graph; (ii) classification of chimeric nodes; and (iii) final assembly tasks.
Figure 3. Chimeric nodes fall into two classes. Nodes of the same colour represent the same species. The number in each node represents the coverage value of the node. A contig sequence is also attached to each node.
Figure 4. The N-len(x) plots for the MH0006 data set of human gut microbial data.
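
Procedure (i) in Figure 2 and the node coverage values in Figure 3 can be illustrated with a toy example. The sketch below is a deliberately simplified de Bruijn graph: nodes are plain k-mers, coverage is the number of times a k-mer occurs in the reads, and reverse complements, node compaction and error correction are all ignored. The helper `branching_nodes` is a hypothetical illustration that only flags nodes where paths from different genomes could cross; it is not the 94-feature Support Vector Machine classifier described in the abstract, and all names and sequences here are invented for the example.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph whose nodes are the k-mers of the reads.

    Returns (coverage, successors, predecessors), where coverage counts
    k-mer occurrences and the two maps hold the neighbouring k-mers that
    overlap a node by k - 1 bases within at least one read.
    """
    coverage = Counter()
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        coverage.update(kmers)
        for left, right in zip(kmers, kmers[1:]):
            successors[left].add(right)
            predecessors[right].add(left)
    return coverage, successors, predecessors


def branching_nodes(coverage, successors, predecessors):
    """Hypothetical heuristic: nodes with >= 2 predecessors and >= 2 successors."""
    return sorted(
        node
        for node in coverage
        if len(predecessors.get(node, ())) >= 2
        and len(successors.get(node, ())) >= 2
    )


# Two made-up "genomes" that share only the 3-mer GTC; genome A is three
# times as abundant as genome B, so its nodes have higher coverage.
genome_a = "AAGTCCA"
genome_b = "TTGTCGG"
reads = [genome_a] * 3 + [genome_b]

coverage, successors, predecessors = build_de_bruijn(reads, k=3)
candidates = branching_nodes(coverage, successors, predecessors)

print(candidates)       # ['GTC']
print(coverage["GTC"])  # 4
print(coverage["AAG"])  # 3
```

In this toy graph the shared node GTC joins the two genomes into one connected graph, and its coverage (4) is the sum of the coverages of the two paths passing through it (3 and 1). Splitting such a node would separate the graph into one sub-graph per genome, which is the situation Figure 1 describes.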
