BMC Bioinformatics. 2021 Jul 22;22(Suppl 10):378.
doi: 10.1186/s12859-021-04284-4.

METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs

Zhenmiao Zhang et al. BMC Bioinformatics. 2021.

Abstract

Background: Because microbial communities are complex, de novo assembly of next-generation sequencing data often fails to produce complete microbial genomes. Contig binning is therefore an essential step: it groups the fragmented contigs into clusters that represent microbial genomes, based on the contigs' nucleotide composition and read depth. These features work well for long contigs but are unstable for short ones. Contigs can also be linked by sequence overlap (the assembly graph) or by the paired-end reads aligned to them (the PE graph), and linked contigs are likely to derive from the same cluster.
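To illustrate why composition features are less stable for short contigs, the following minimal sketch computes a normalized k-mer profile. The function name `kmer_profile`, the choice of k = 4, and the example sequence are assumptions made for illustration; the abstract does not specify how any binning tool computes composition, and real tools often merge reverse-complement k-mers, which this sketch does not.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration only; k=4 is an assumption.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only count k-mers made of A, C, G, T (ignores N and other symbols).
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


# A made-up 10 bp "contig": it contains only 7 overlapping 4-mers,
# so almost all of the 256 possible 4-mers have frequency zero.
short_contig = "ACGTACGTAC"
profile = kmer_profile(short_contig)
observed = sum(1 for freq in profile.values() if freq > 0)
print(f"{observed} of {len(profile)} possible 4-mers observed")
```

With so few k-mers observed, the profile of a short contig is dominated by sampling noise, which is one way to read the abstract's statement that these features are unstable for short contigs.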

Results: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates the assembly and PE graphs. It rescues short contigs and corrects binning errors arising from dead ends. METAMVGL learns the weights of the two graphs automatically and predicts contig labels in a unified multi-view label propagation framework. In experiments, METAMVGL used significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples.
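The idea of propagating bin labels over two combined graph views can be sketched as follows. This is not the authors' implementation: METAMVGL learns the view weights automatically, whereas this sketch uses fixed weights (`w_assembly`, `w_pe`) and a basic clamped label propagation. All function names, contig names and edges are hypothetical.

```python
def combine_views(assembly_edges, pe_edges, w_assembly=0.5, w_pe=0.5):
    """Merge two undirected edge lists into one weighted adjacency dict.

    The fixed weights are an assumption; METAMVGL optimizes them.
    """
    combined = {}
    for edges, weight in ((assembly_edges, w_assembly), (pe_edges, w_pe)):
        for u, v in edges:
            combined.setdefault(u, {})
            combined.setdefault(v, {})
            combined[u][v] = combined[u].get(v, 0.0) + weight
            combined[v][u] = combined[v].get(u, 0.0) + weight
    return combined


def propagate_labels(graph, initial_labels, n_iter=50):
    """Spread bin labels from labeled contigs to unlabeled neighbors."""
    bins = sorted(set(initial_labels.values()))
    scores = {
        node: {b: 1.0 if initial_labels.get(node) == b else 0.0 for b in bins}
        for node in graph
    }
    for _ in range(n_iter):
        updated = {}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if node in initial_labels or total == 0:
                updated[node] = scores[node]  # keep initial labels fixed
                continue
            # Weighted average of the neighbors' current bin scores.
            updated[node] = {
                b: sum(w * scores[nb][b] for nb, w in neighbors.items()) / total
                for b in bins
            }
        scores = updated
    labels = dict(initial_labels)
    for node, node_scores in scores.items():
        best = max(bins, key=lambda b: node_scores[b])
        if node_scores[best] > 0:
            labels[node] = best
    return labels


# Toy data: c5 has no assembly-graph edge and is reachable only
# through a paired-end link to c2.
assembly_edges = [("c1", "c2"), ("c3", "c4")]
pe_edges = [("c2", "c5")]
initial_labels = {"c1": "A", "c3": "B"}

labels = propagate_labels(combine_views(assembly_edges, pe_edges), initial_labels)
assembly_only = propagate_labels(combine_views(assembly_edges, []), initial_labels)
print(labels)
print("c5 labeled without PE graph:", "c5" in assembly_only)
```

In this toy example, contig `c5` receives a label only when the paired-end edge is included, which mirrors the abstract's point that the PE graph helps connect contigs the assembly graph leaves isolated.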

Conclusions: Our findings demonstrate that METAMVGL markedly improves short-contig binning and outperforms other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples.

Keywords: Assembly graph; Contig binning; Dead end; Multi-view label propagation; Paired-end graph.

Conflict of interest statement

The authors have no conflicts of interest.

Figures

Fig. 1
Visualization of the running process of METAMVGL compared with GraphBin on the simulated data. METAMVGL connected dead ends 1 and 2 to the main graph through paired-end reads and also enhanced its connectivity. We observed that (1) GraphBin failed to correct the two blue labels in the center of the graph because, lacking connectivity, it could not remove them before propagation; (2) GraphBin mislabeled all the contigs in dead end 2 because of a small number of wrongly labeled contigs in that dead end; and (3) METAMVGL labeled all the contigs in dead end 1, whereas GraphBin did not.
Fig. 2
Workflow of METAMVGL. In step 1, METAMVGL constructs the assembly graph and the PE graph by aligning paired-end reads to the contigs. The contigs are initially labeled by existing binning tools (vertices in orange and blue). In step 2, ambiguous labels are removed if their neighbors in the assembly graph are labeled as belonging to other binning groups. METAMVGL applies the auto-weighted multi-view graph-based algorithm to optimize the weights of the two graphs and predict binning groups for the unlabeled contigs. Finally, it performs a second round of ambiguous-label removal on the combined graph.
Fig. 3
The performance of MaxBin2, GraphBin and METAMVGL on the simulated datasets. a–d Results based on the assembly by MEGAHIT; e–h results based on the assembly by metaSPAdes. The initial binning tool is MaxBin2.
Fig. 4
The performance of MaxBin2, GraphBin and METAMVGL on the BMock12, SYNTH64 and Sharon datasets: a, d for BMock12 dataset; b, e for SYNTH64 dataset; c, f for Sharon dataset. MEGAHIT and metaSPAdes are used to generate the assembly graphs. The initial binning tool is MaxBin2
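
The ambiguous-label removal described in step 2 of the workflow (Fig. 2) can be sketched roughly as follows. This is a simplified reading of the caption, assuming a label counts as ambiguous when any directly adjacent labeled contig carries a different bin label; the actual criterion used by METAMVGL may differ. The function name and the toy graph are hypothetical.

```python
def remove_ambiguous_labels(adjacency, labels):
    """Drop the label of any contig that has a differently labeled neighbor.

    Simplified assumption: only direct neighbors are inspected, and the
    check is made against the original labels, not the partly cleaned ones.
    """
    kept = {}
    for contig, label in labels.items():
        neighbor_labels = {
            labels[nb] for nb in adjacency.get(contig, ()) if nb in labels
        }
        # Keep the label only if no labeled neighbor disagrees with it.
        if neighbor_labels <= {label}:
            kept[contig] = label
    return kept


# Toy chain c1 - c2 - c3 - c4, where c2 (bin A) touches c3 (bin B).
adjacency = {
    "c1": ["c2"],
    "c2": ["c1", "c3"],
    "c3": ["c2", "c4"],
    "c4": ["c3"],
}
labels = {"c1": "A", "c2": "A", "c3": "B"}

cleaned = remove_ambiguous_labels(adjacency, labels)
print(cleaned)
```

Here both `c2` and `c3` lose their labels because they disagree with each other, leaving them to be relabeled by a later propagation step.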
