Bioinformatics. 2022 Sep 30;38(19):4481-4487.
doi: 10.1093/bioinformatics/btac557.

Metagenomic binning with assembly graph embeddings


Andre Lamurias et al. Bioinformatics. 2022.

Abstract

Motivation: Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning.

Results: We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities and compare its performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% more when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning.
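To make the evaluation criterion concrete, the following minimal sketch counts HQ bins from per-bin completeness and contamination estimates (for example, as reported by a tool such as CheckM). The thresholds used here (>90% completeness, <5% contamination, the completeness/contamination part of the MIMAG high-quality draft standard) and the function names `count_hq_bins` and `relative_gain` are assumptions for illustration; the exact criteria and the averaging over datasets used in the study may differ.

```python
from typing import Dict, Tuple

# Assumed HQ thresholds (MIMAG-style); the study's exact criteria may differ.
MIN_COMPLETENESS = 90.0   # percent
MAX_CONTAMINATION = 5.0   # percent


def count_hq_bins(bin_quality: Dict[str, Tuple[float, float]]) -> int:
    """Count bins whose (completeness, contamination) pair passes the HQ thresholds."""
    return sum(
        1
        for completeness, contamination in bin_quality.values()
        if completeness > MIN_COMPLETENESS and contamination < MAX_CONTAMINATION
    )


def relative_gain(ours: int, baseline: int) -> float:
    """Percentage increase in HQ bins over a baseline binner on one dataset."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive number of HQ bins")
    return 100.0 * (ours - baseline) / baseline
```

For example, with hypothetical counts of 46 HQ bins against a baseline's 40 on a single dataset, `relative_gain(46, 40)` returns 15.0. The percentages in the abstract are averages across datasets, which this single-dataset calculation does not reproduce.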

Availability and implementation: GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
GraphMB’s workflow. (a) The metagenome of an environmental sample is sequenced and assembled into contigs. (b) Initial embeddings are computed with a variational auto-encoder based on k-mer composition and abundance features. (c) The inputs of the GNN are the initial contig embeddings and the graph structure provided by the assembly graph. The thickness of each edge corresponds to the number of reads that cover it. (d) The GNN model learns new embeddings by aggregating neighboring contigs (nodes in the assembly graph). (e) The final embeddings are clustered to obtain bins.
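Steps (b) to (d) can be illustrated with a simplified sketch: a normalized tetranucleotide (k = 4) frequency profile as a composition feature, followed by one round of read-weighted mean aggregation over assembly-graph neighbors. This is not GraphMB's model. The actual pipeline first passes composition and abundance features through a variational auto-encoder, and its GNN learns how to aggregate, whereas here the update is a fixed weighted average. The function names, the use of non-canonical 4-mers, the undirected treatment of edges, and the `self_weight` parameter are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Assumption: non-canonical tetranucleotides, giving 4**4 = 256 features.
K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_profile(seq: str) -> List[float]:
    """Normalized k-mer frequencies of a contig; k-mers with non-ACGT symbols are skipped."""
    counts = [0.0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i : i + K])
        if idx is not None:
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def aggregate_neighbors(
    embeddings: Dict[str, List[float]],
    edges: List[Tuple[str, str, float]],
    self_weight: float = 1.0,
) -> Dict[str, List[float]]:
    """One round of read-weighted mean aggregation over an assembly graph.

    Hypothetical stand-in for a GNN layer: no learned parameters, no nonlinearity.
    Each edge is (contig_u, contig_v, weight), where weight could be the number
    of reads covering the edge. Edges are treated as undirected.
    """
    neighbors: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for u, v, weight in edges:
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    updated: Dict[str, List[float]] = {}
    for node, h in embeddings.items():
        total = self_weight
        acc = [self_weight * x for x in h]
        for nbr, weight in neighbors[node]:
            if nbr not in embeddings:
                continue  # skip edges to contigs without features
            total += weight
            for i, x in enumerate(embeddings[nbr]):
                acc[i] += weight * x
        # Weighted mean of the node and its neighbors; keep h if all weights are zero.
        updated[node] = [x / total for x in acc] if total else list(h)
    return updated


if __name__ == "__main__":
    # Toy example with two hypothetical contigs joined by an edge covered by 3 reads.
    features = {"c1": kmer_profile("ACGTACGTAC"), "c2": kmer_profile("TTGCATTGCA")}
    smoothed = aggregate_neighbors(features, [("c1", "c2", 3.0)])
    print(len(smoothed["c1"]))  # 256
```

Pulling each contig's features toward those of well-supported neighbors reflects the intuition behind step (d): contigs connected in the assembly graph often come from the same genome, so their representations should end up close before clustering in step (e).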
