Front Genet. 2019 Aug 20;10:753.
doi: 10.3389/fgene.2019.00753. eCollection 2019.

MetaTOR: A Computational Pipeline to Recover High-Quality Metagenomic Bins From Mammalian Gut Proximity-Ligation (meta3C) Libraries


Lyam Baudry et al. Front Genet.

Abstract

Characterizing the complete genomic structure of complex microbial communities would represent a key step toward understanding their diversity, dynamics, and evolution. Current metagenomic approaches pursuing this goal typically analyze millions of short DNA sequences extracted directly from the environment. New experimental and computational approaches are constantly sought to improve the analysis and interpretation of such data. We developed MetaTOR, an open-source computational solution that bins DNA contigs into individual genomes according to their 3D contact frequencies. Those contacts are quantified by chromosome conformation capture experiments (3C, Hi-C), also known as proximity-ligation approaches, applied to metagenomic samples (meta3C). MetaTOR was applied to 20 meta3C libraries of mouse gut microbiota. We quantified the program's ability to recover high-quality metagenome-assembled genomes (MAGs) from metagenomic assemblies generated directly from the meta3C libraries. Whereas nine high-quality MAGs are identified in the 148-Mb assembly generated from a single meta3C library, MetaTOR identifies 82 high-quality MAGs in the 763-Mb assembly generated from the 20 merged meta3C libraries, corresponding to nearly a third of the total assembly. Compared to the hybrid binning software MetaBAT or CONCOCT, MetaTOR recovered three times more high-quality MAGs. These results underline the potential of 3C-/Hi-C-based approaches in metagenomic projects.

Keywords: Hi-C; binning algorithm; gut microbiome; metagenome-assembled genomes; metagenomic analysis; metagenomics Hi-C; metagenomics binning.


Figures

Figure 1
MetaTOR pipeline. Schematic representation of the MetaTOR pipeline. (A) MetaTOR is initialized with an assembly and a set of 3C/Hi-C paired-end (PE) reads. (B) [Align] aligns, sorts, and merges reads to deliver a network of contig interactions. (C) [Partition] deconvolves this network using an iterative Louvain procedure, and (D) [Binning] retrieves CCs (FASTA file and corresponding sub-network) from the selected partition and evaluates them using CheckM. At this step, selected CCs can be processed recursively to split them further into sub-CCs. (F) [Annotation] is an optional step that uses HMMs to provide final annotations. (E) The final output of the pipeline is a set of annotated bins.
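The [Align] and [Partition] steps in this caption can be illustrated with a minimal Python sketch. It is not MetaTOR's code. It assumes that read pairs are already reduced to pairs of contig names, that the Louvain runs have already produced one contig-to-cluster mapping each, and that a CC is a group of contigs that fall into the same cluster in every run. The function names and input shapes are hypothetical.

```python
from collections import Counter, defaultdict


def contact_network(read_pairs):
    """Count read pairs that link two different contigs.

    read_pairs: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    This input shape is an assumption; a real pipeline would parse
    alignment files instead.
    """
    edges = Counter()
    for a, b in read_pairs:
        if a != b:  # pairs within one contig say nothing about binning
            edges[tuple(sorted((a, b)))] += 1
    return edges


def core_communities(partitions):
    """Group contigs that share a cluster in every partition.

    partitions: list of dicts mapping contig -> cluster label, one dict
    per Louvain run (labels do not need to match between runs).
    """
    by_signature = defaultdict(list)
    for contig in sorted(partitions[0]):
        # The tuple of labels across runs identifies the group.
        signature = tuple(p[contig] for p in partitions)
        by_signature[signature].append(contig)
    # Largest groups first; ties keep their insertion order.
    return sorted(by_signature.values(), key=len, reverse=True)
```

Two contigs end up in the same group only if no run ever separates them, so adding runs can split groups but never merge them.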
Figure 2
MetaTOR partitioning of a complex microbial community. (A) Evolution of the number of CCs, ordered by size categories, during 400 Louvain iterations for assembly n°3 (20 samples). Color represents the amount of DNA in a given CC. Blue: 10 to 100 kb. Red: 100 to 500 kb. Green: > 500 kb. (B) Contact matrix encompassing the 224 largest CCs ordered by size, after 100 Louvain iterations (1 pixel = 200 kb). Y-axis: cumulated DNA size. (C) Completion (red) and contamination (blue) of the 129 CCs containing more than 500 kb after 100 Louvain iterations. Dashed lines: thresholds used to process CCs through a recursive procedure (completion threshold: upper 70%; contamination threshold: upper 10%). (D) Contact map of a highly contaminated CC (CC #3—100% complete—1,400% contaminated) before (left) and after (right) the recursive procedure (10 iterations; 1 pixel: 20 kb). Left map: contigs are ordered by size. Right map: sub-CCs are ordered by size. (E) Completion and contamination of the 269 CCs and sub-CCs bigger than 500 kb defined after the whole procedure. Red: completion. Blue: contamination. (F) Completion (red) and contamination (blue) levels of the sub-CCs retrieved from the original CC #3 after recursive procedure (10 iterations).
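The thresholds in panel (C) can be read as a simple selection rule, sketched below in Python. The sketch assumes that a CC is sent to the recursive procedure only when it exceeds both thresholds (completion above 70% and contamination above 10%); the caption does not say whether the two conditions are combined with "and" or "or". The record type and function name are hypothetical, and the values would come from CheckM output.

```python
from dataclasses import dataclass


@dataclass
class CCQuality:
    """Quality estimates for one CC (hypothetical record type)."""
    name: str
    completion: float     # percent
    contamination: float  # percent; can exceed 100 for mixed CCs


def needs_recursion(cc, completion_threshold=70.0, contamination_threshold=10.0):
    """Return True when a CC looks complete but mixed.

    Assumption: both thresholds must be exceeded.
    """
    return (cc.completion > completion_threshold
            and cc.contamination > contamination_threshold)


ccs = [
    CCQuality("CC3", completion=100.0, contamination=1400.0),  # from panel (D)
    CCQuality("clean", completion=95.0, contamination=2.0),    # made-up values
    CCQuality("partial", completion=40.0, contamination=30.0), # made-up values
]
to_split = [cc.name for cc in ccs if needs_recursion(cc)]
```

Under this reading, only CC #3 from panel (D) is selected for splitting. The other two records are invented for illustration.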
Figure 3
Comparison of MetaTOR, MetaBAT, and CONCOCT. CheckM output comparison for the three binning methods applied to the three assemblies tested in this work. (A) Assembly 1 (one meta3C library). (B) Assembly 2 (eight libraries). (C) Assembly 3 (20 libraries). Box plots for completion (left) and contamination (middle) and a histogram of retrieved MAGs (right) are presented for the three binning methods. Only MAGs over 500 kb with less than 10% contamination are analyzed.
Figure 4
Statistics of reconstructed bins with low contamination. (A–B) Correlation between completion rate and N50 (A) or mean coverage (B) for bins with a contamination rate below 10%. Blue circles = MetaTOR bins. Purple diamonds = MetaBAT bins. (C–D) Box plots for N50 (C) and mean coverage (D) of retrieved bins with a contamination rate below 10% are presented for MetaTOR (blue circles) and MetaBAT (purple diamonds). A t-test shows a clear difference between the distributions of bin N50 for the two programs (C; p-value = 3.9 × 10^-7).

References

    1. Albertsen M., Hugenholtz P., Skarshewski A., Nielsen K. L., Tyson G. W., Nielsen P. H. (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31 (6), 533–538. doi: 10.1038/nbt.2579
    2. Alneberg J., Bjarnason B. S., de Bruijn I., Schirmer M., Quick J., Ijaz U. Z., et al. (2014). Binning metagenomic contigs by coverage and composition. Nat. Methods 11 (11), 1144–1146. doi: 10.1038/nmeth.3103
    3. Beitel C. W., Froenicke L., Lang J. M., Korf I. F., Michelmore R. W., Eisen J. A., et al. (2014). Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2, e415. doi: 10.7717/peerj.415
    4. Bengtsson-Palme J., Hartmann M., Eriksson K. M., Pal C., Thorell K., Larsson D. G. J., et al. (2015). METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol. Ecol. Resour. 15 (6), 1403–1414. doi: 10.1111/1755-0998.12399
    5. Blondel V. D., Guillaume J.-L., Lambiotte R., Lefebvre E. (2008). Fast unfolding of communities in large networks. J. Stat. Mech. Theory E (10), P10008. doi: 10.1088/1742-5468/2008/10/P10008