PeerJ. 2014 Sep 30;2:e603.
doi: 10.7717/peerj.603. eCollection 2014.

GroopM: an automated tool for the recovery of population genomes from related metagenomes

Michael Imelfort et al. PeerJ. 2014.

Abstract

Metagenomic binning methods that leverage differential population abundances in microbial communities (differential coverage) are emerging as a complementary approach to conventional composition-based binning. Here we introduce GroopM, an automated binning tool that primarily uses differential coverage to obtain high-fidelity population genomes from related metagenomes. We demonstrate the effectiveness of GroopM using synthetic and real-world metagenomes, and show that GroopM produces results comparable with more time-consuming, labor-intensive methods.

Keywords: Bioinformatics; Metagenomics; Microbial ecology; Population genome binning.
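
The idea of differential coverage mentioned in the abstract can be illustrated with a small sketch. It is not GroopM's algorithm: it only assumes that contigs from the same population have coverage profiles that rise and fall together across related samples, and it uses Pearson correlation as a stand-in similarity measure. The contig names, read counts and the helper functions `coverage_profile` and `pearson` are hypothetical.

```python
from math import sqrt


def coverage_profile(mapped_bases, contig_length):
    """Mean coverage per sample: mapped bases divided by contig length."""
    return [bases / contig_length for bases in mapped_bases]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical contigs with mapped bases in three related metagenomes.
contigs = {
    "contig_a": coverage_profile([50_000, 10_000, 30_000], 1_000),
    "contig_b": coverage_profile([102_000, 19_000, 61_000], 2_000),
    "contig_c": coverage_profile([15_000, 120_000, 45_000], 3_000),
}

# contig_a and contig_b track each other across samples; contig_c does not.
same = pearson(contigs["contig_a"], contigs["contig_b"])
diff = pearson(contigs["contig_a"], contigs["contig_c"])
```

In this toy data, `contig_a` and `contig_b` would be candidates for the same bin, while `contig_c` would not. A real tool has to handle many more samples, noise and contig length effects.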

Figures

Figure 1. An overview of the GroopM workflow.
GroopM has five stages, beginning with file parsing and ending with bin extraction. The refine step is optional and can be carried out at any stage after “core” has completed.
Figure 2. The distribution of tetranucleotide frequencies, coverage profiles and bin assignments for the synthetic metagenomic contigs.
The diameter of each circle is proportional to the length of its respective contig. (A, C, E) Contigs are positioned according to the first two principal components of their tetranucleotide frequencies. The first principal component is positioned horizontally, the second is positioned vertically. (B, D, F) Contigs are positioned according to their x and y coordinates in GroopM transformed coverage profile space. (A, B) Each ‘true’ bin is assigned a random color and contigs are colored according to their true bin assignments. (C, D) Contigs are colored according to the accuracy of their bin assignments using TF-ESOM. (E, F) Contigs are colored according to the accuracy of their bin assignments using GroopM.
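
The tetranucleotide frequencies used to position contigs in panels A, C and E can be computed along the following lines. This is a minimal sketch, not the code used in the paper: the function name `tetranucleotide_frequencies` is hypothetical, windows containing ambiguous bases are simply skipped, and reverse-complement k-mers are not merged (some tools do merge them). Projecting the resulting 256-dimensional vectors onto their first two principal components is a separate step that is not shown.

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each 4-mer in seq, ordered as KMERS."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only count windows made purely of A, C, G and T.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


# Hypothetical contig fragment containing one ambiguous base (N).
freqs = tetranucleotide_frequencies("ACGTACGTNACGT")
```

Each contig yields one such vector, so a set of contigs forms a matrix with 256 columns on which a principal component analysis can be run.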
Figure 3. An overview of the relationships between contig length, population relative abundance and binning accuracy for the TF-ESOM and GroopM approaches.
(A) Contigs are ordered from longest to shortest and grouped together into clusters of 50. Each bar represents a single cluster and has a width that is proportional to the total number of assembled bases in that cluster. Bars are split vertically according to the percentage of their bases that are either correctly, incorrectly or not assigned. The large region of unassigned contigs in the TF-ESOM plot reflects the lower binning limit of 2 Kbp for this method. (B) Verified bins are ordered in descending relative abundance, calculated based on the number of simulated reads created using each reference. Each bar represents a single verified bin and the height of each bar represents the bin’s relative abundance. Bars are split vertically according to the percentage of their bases that are either correctly, incorrectly or not assigned by the corresponding method. Both methods had decreased accuracy for very low abundance bins; however, GroopM was able to correctly bin nearly all the contigs for the most dominant species.
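
The summary described for panel A can be sketched as follows. This is an illustrative reconstruction from the caption only, not the authors' plotting code: the function `summarize_clusters`, the status labels and the example contigs are hypothetical, and contig lengths are assumed to be positive.

```python
def summarize_clusters(contigs, cluster_size=50):
    """Group contigs by descending length and report base percentages per cluster.

    contigs: list of (length, status) tuples, where status is one of
    "correct", "incorrect" or "unassigned".
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    summaries = []
    for start in range(0, len(ordered), cluster_size):
        cluster = ordered[start:start + cluster_size]
        total = sum(length for length, _ in cluster)  # sets the bar width
        bases = {"correct": 0, "incorrect": 0, "unassigned": 0}
        for length, status in cluster:
            bases[status] += length
        summary = {"total_bases": total}
        for status, count in bases.items():
            summary[status] = 100.0 * count / total  # vertical split of the bar
        summaries.append(summary)
    return summaries


# Hypothetical contigs; a cluster size of 2 keeps the example small.
example = [
    (5000, "correct"),
    (4000, "correct"),
    (1000, "incorrect"),
    (1500, "unassigned"),
    (500, "unassigned"),
]
result = summarize_clusters(example, cluster_size=2)
```

With the caption's cluster size of 50, each entry of the returned list corresponds to one bar: `total_bases` gives its width and the three percentages give its vertical split.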
Figure 4. A comparison of GroopM and Sharon bin assignments generated using visualization tools within GroopM.
(A) Contigs and resulting bins made using SPAdes and GroopM. (B) The Sharon assembly visualized in GroopM coverage space. All the contigs belonging to a single bin are assigned the same color. Each bin was assigned a random unique color, with the exception of strain variants, which were assigned very similar colors. GroopM-binned contigs are colored according to the bin assignment of their closest matching contig in the Sharon assembly.
