PeerJ. 2014 May 27:2:e415.
doi: 10.7717/peerj.415. eCollection 2014.

Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products

Christopher W Beitel et al. PeerJ.

Abstract

Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of "binning" the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provide a long-range signal of strain-specific genotypes, indicating that such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.

Keywords: Genome scaffolding; Haplotype phasing; Hi-C; Markov clustering; Metagenome assembly; Metagenomics; Microbial ecology; Plasmids; Strain differentiation; Synthetic microbial communities.

Figures

Figure 1. Hi-C insert distribution.
The distribution of genomic distances between Hi-C read pairs is shown for read pairs mapping to each chromosome. For each read pair the minimum path length on the circular chromosome was calculated and read pairs separated by less than 1000 bp were discarded. The 2.5 Mb range was divided into 100 bins of equal size and the number of read pairs in each bin was recorded for each chromosome. Bin values for each chromosome were normalized to sum to 1 and plotted.
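The distance calculation and binning described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the input format (a list of position pairs on one chromosome), and the choice to skip pairs at or beyond the 2.5 Mb range are all hypothetical.

```python
def circular_distance(pos_a, pos_b, chrom_len):
    """Minimum path length between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % chrom_len
    return min(d, chrom_len - d)


def insert_histogram(pairs, chrom_len, n_bins=100,
                     max_dist=2_500_000, min_dist=1000):
    """Bin read-pair distances into equal-width bins and normalize to sum to 1.

    pairs: iterable of (pos_a, pos_b) mapping positions on one chromosome.
    Pairs closer than min_dist are discarded, as in the caption. Pairs at or
    beyond max_dist are skipped here, which is an assumption of this sketch.
    """
    counts = [0] * n_bins
    bin_width = max_dist / n_bins
    for pos_a, pos_b in pairs:
        d = circular_distance(pos_a, pos_b, chrom_len)
        if d < min_dist or d >= max_dist:
            continue
        counts[int(d // bin_width)] += 1
    total = sum(counts)
    # Normalize so that the bin values for this chromosome sum to 1.
    return [c / total for c in counts] if total else counts
```

Using the shorter of the two arcs matters on a circular replicon: two reads near opposite ends of the linear reference sequence are in fact close neighbors.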
Figure 2. Metagenomic Hi-C associations.
The log-scaled, normalized number of Hi-C read pairs associating each genomic replicon in the synthetic community is shown as a heat map (see color scale, blue to yellow: low to high normalized, log-scaled association rates). Bur1: B. thailandensis chromosome 1; Bur2: B. thailandensis chromosome 2; Lac0: L. brevis chromosome; Lac1: L. brevis plasmid 1; Lac2: L. brevis plasmid 2; Ped: P. pentosaceus; K12: E. coli K12 DH10B; BL21: E. coli BL21.
Figure 3. Contigs associated by Hi-C reads.
A graph is drawn with nodes depicting contigs and edges depicting associations between contigs as indicated by aligned Hi-C read pairs, with the count thereof depicted by the weight of edges. Nodes are colored to reflect the species to which they belong (see legend) with node size reflecting contig size. Contigs below 5 kb and edges with weights less than 5 were excluded. Contig associations were normalized for variation in contig size.
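The filtering and normalization in this caption might look roughly like the sketch below. The caption does not state the normalization formula or whether the edge threshold applies before or after normalization, so dividing raw counts by the product of the two contig lengths, and thresholding on raw counts, are assumptions of this example. The function name and input dictionaries are hypothetical.

```python
def build_contig_graph(contig_lengths, pair_counts,
                       min_len=5000, min_weight=5):
    """Return {(contig_a, contig_b): normalized_weight} for retained edges.

    contig_lengths: {contig_name: length_in_bp}
    pair_counts:    {(contig_a, contig_b): number_of_Hi-C_read_pairs}
    """
    edges = {}
    for (a, b), count in pair_counts.items():
        if a == b:
            continue  # within-contig pairs do not link two contigs
        if contig_lengths[a] < min_len or contig_lengths[b] < min_len:
            continue  # exclude contigs below 5 kb
        if count < min_weight:
            continue  # exclude edges supported by fewer than 5 read pairs
        # Assumed normalization: divide by the product of contig lengths so
        # that large contigs do not dominate simply because of their size.
        edges[(a, b)] = count / (contig_lengths[a] * contig_lengths[b])
    return edges
```

The resulting edge dictionary can be passed to a graph tool of choice for layout or clustering.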
Figure 4. Hi-C contact maps for replicons of Lactobacillus brevis.
Contact maps show the number of Hi-C read pairs associating each region of the L. brevis genome. The L. brevis chromosome (Lac0, (A), Spearman rank correlation) and plasmids (Lac1, (B); Lac2, (C)) show enrichment for local associations (bright diagonal band). Interactions between Lac1 and Lac0 (D) and Lac2 and Lac0 (E) are shown. All except Lac0 are log-scaled. Circularity of Lac0 became apparent after transforming the data with the Spearman rank correlation (computed for each matrix element between the row and column sharing that element) in place of log transformation (A), as indicated by the high number of contacts between the ends of the sequence. In all plots, pixels are sized to represent interactions between blocks sized at 1% of the interacting genomes. The number of HindIII restriction sites in each region of sequence is shown as a histogram on the left and top of each panel.
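One way to read the Spearman transform mentioned in this caption is that element (i, j) of the transformed map is the rank correlation between row i and column j of the raw contact matrix. The pure-Python sketch below follows that reading. It is an interpretation, not the authors' implementation, and the helper names are hypothetical. Ties receive average ranks, and a constant row or column yields 0.0 by convention in this sketch.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman_transform(matrix):
    """Element (i, j) becomes the Spearman correlation of row i and column j."""
    n = len(matrix)
    row_ranks = [average_ranks(row) for row in matrix]
    col_ranks = [average_ranks([matrix[r][c] for r in range(n)])
                 for c in range(n)]
    return [[pearson(row_ranks[i], col_ranks[j]) for j in range(n)]
            for i in range(n)]
```

Because the transform compares whole contact profiles rather than single counts, two bins at opposite ends of a circular sequence can score highly when their profiles are similar, even if the raw count between them is modest.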
Figure 5. Relationship of distance to degree of separation in Hi-C and mate-pair variant graphs.
The length of paths between random pairs of SNP sites in a SNP graph constructed from both Hi-C and mate-pair libraries of varying sizes (left; 5 kb, 10 kb, 20 kb, 40 kb), smoothed using locally-weighted regression.
