PeerJ. 2017 Mar 8:5:e3035.
doi: 10.7717/peerj.3035. eCollection 2017.

BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

Elaina D Graham et al. PeerJ. 2017.

Abstract

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of 'binning' contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that uses the clustering algorithm affinity propagation (AP) to cluster assemblies by coverage, followed by composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition-based and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index than five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes.
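To make the coverage-clustering idea concrete, the following is a minimal NumPy sketch of affinity propagation (the exemplar-based message-passing scheme) applied to a toy coverage table. It is not BinSanity's code: the six contig coverage profiles, the negative squared Euclidean similarity, the median preference, and the damping and iteration settings are all assumptions chosen for illustration.

```python
import numpy as np

def affinity_propagation(S, preference, damping=0.9, n_iter=300):
    """Plain affinity propagation on an n x n similarity matrix S.

    Returns one integer cluster label per row. Illustrative sketch only.
    """
    S = S.astype(float).copy()
    n = S.shape[0]
    idx = np.arange(n)
    S[idx, idx] = preference          # self-similarity controls cluster count
    R = np.zeros((n, n))              # responsibilities r(i, k)
    A = np.zeros((n, n))              # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[idx, best]
        AS[idx, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[idx, best] = S[idx, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        col = Rp.sum(axis=0)
        A_new = np.minimum(0, col[None, :] - Rp)
        A_new[idx, idx] = col - R[idx, idx]
        A = damping * A + (1 - damping) * A_new

    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        raise RuntimeError("no exemplars found; adjust preference or damping")
    labels = S[:, exemplars].argmax(axis=1)      # nearest exemplar per contig
    labels[exemplars] = np.arange(exemplars.size)
    return labels

# Hypothetical coverage profiles: 6 contigs (rows) across 2 samples (columns).
coverage = np.array([[1.0, 5.0], [1.1, 5.2], [0.9, 4.9],
                     [6.0, 1.0], [6.2, 1.1], [5.9, 0.8]])
diff = coverage[:, None, :] - coverage[None, :, :]
S = -(diff ** 2).sum(axis=2)                     # negative squared distance
off_diag = S[~np.eye(len(S), dtype=bool)]
labels = affinity_propagation(S, preference=np.median(off_diag))
print(labels)
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference value, with lower preferences generally yielding fewer bins.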

Keywords: Affinity propagation; Binning; Clustering; Metagenome-assembled genomes; Metagenomics; Microbial ecology.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Workflow for BinSanity indicating all scripts used.
Figure 2. Statistical calculations (bin_evaluation.py) showing the Adjusted Rand Index (A), Precision (B), Recall (C), and V-Measure (D) for diverse-mixture-1.
Figure 3. Clustering results for diverse-mixture-1 BinSanity, BinSanity+refinement, CONCOCT, MetaBat, MyCC, MaxBin, and GroopM at five in silico metagenomes (visualized via Anvi’o). Black dashed boxes highlight bins in each method that contained contigs from two or more reference organisms. White represents those contigs that were left un-clustered.
Figure 4. Statistical calculations (bin_evaluation.py) showing the Adjusted Rand Index (A), Precision (B), Recall (C), and V-Measure (D) for diverse-mixture-2.
Figure 5. Statistical calculations (bin_evaluation.py) showing the Adjusted Rand Index (A), Precision (B), Recall (C), and V-Measure (D) for the strain-mixture.
Figure 6. Clustering of the infant gut metagenome by BinSanity, CONCOCT, GroopM, MaxBin, MetaBat, MyCC, Eren et al. (2015) and Sharon et al. (2013). The image was generated through Anvi’o.
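For readers unfamiliar with the scores named in the captions, here is a small standard-library sketch of precision, recall, and the Adjusted Rand Index for a binning result. The precision and recall formulas follow one common contig-count definition (best-matching genome per bin, best-matching bin per genome); whether bin_evaluation.py computes exactly this is an assumption, and the genome and bin labels are hypothetical.

```python
from collections import Counter
from math import comb

def precision_recall(truth, pred):
    """Contig-count precision and recall (assumed definition, see lead-in)."""
    table = Counter(zip(truth, pred))       # (genome, bin) -> contig count
    best_per_bin, best_per_genome = {}, {}
    for (genome, bin_id), count in table.items():
        best_per_bin[bin_id] = max(best_per_bin.get(bin_id, 0), count)
        best_per_genome[genome] = max(best_per_genome.get(genome, 0), count)
    n = len(truth)
    return sum(best_per_bin.values()) / n, sum(best_per_genome.values()) / n

def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index from the pair-counting contingency table."""
    table = Counter(zip(truth, pred))
    sum_cells = sum(comb(c, 2) for c in table.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(len(truth), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:               # degenerate case, e.g. one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical example: genome A is split across two bins, genome B is intact.
truth = ["A", "A", "A", "B", "B", "B"]
pred = [0, 0, 1, 2, 2, 2]
precision, recall = precision_recall(truth, pred)
ari = adjusted_rand_index(truth, pred)
print(precision, recall, ari)
```

In this toy case every bin is pure, so precision is unaffected, while splitting genome A lowers recall and the Adjusted Rand Index.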

