Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Aug 2;20(1):153.
doi: 10.1186/s13059-019-1760-x.

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation

Affiliations

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation

Derek M Bickhart et al. Genome Biol. .

Abstract

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.

Keywords: Hi-C; Metagenome assembly; Metagenomics; PacBio; Virus-host association.

PubMed Disclaimer

Conflict of interest statement

CH is an employee of Pacific Biosciences. IL, MOP, and STS are employees of Phase Genomics. The other authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Assembly workflow and sampling bias estimates show GC% discrepancies in long-read vs short-read assemblies. Using the same sample from a cannulated cow, (a) we extracted DNA using a modified bead beating protocol that still preserved a large proportion of high molecular weight DNA strands. This DNA extraction was sequenced on a short-read sequencer (Illumina; dark green) and a long-read sequencer (PacBio RSII and Sequel; dark orange), with each sequence source assembled separately. Assessments of read- and contig-level GC% bias (b) revealed that a substantial proportion of sampled low GC DNA was not incorporated into either assembly. c Assembly contigs were annotated for likely superkingdoms of origin and were compared for overall contig lengths. The long-read assembly tended to have longer average contigs for each assembled superkingdom compared to the short-read assembly
Fig. 2
Fig. 2
Identification of high-quality bins in comparative assemblies highlights the need for dereplication of different binning methods. a Binning performed by Metabat (light blue) and Proximeta Hi-C binning (Hi-C; blue) revealed that the long-read assembly consistently had fewer, longer contigs per bin than a short-read assembly. b Bin set division into medium-quality draft (MQ) and high-quality draft (HQ) bins was based on DAS_Tool single-copy gene (SCG) redundancy and completeness. Assessment of SCG completeness and redundancy revealed 10 and 42 high-quality bins in the long-read (c) and short-read (d) assemblies, respectively. The Proximeta Hi-C binning method performed better in terms of SCG metrics in the long-read assembly. e Plots of all of identified bins in the long-read (triangle) and short-read (circle) assemblies revealed a wide range of chimeric bins containing high SCG redundancy. Bins highlighted in the blue rectangle correspond to the MQ bins identified by the DAS_tool algorithm while the red rectangle corresponds to the HQ bin set
Fig. 3
Fig. 3
Dataset novelty compared to other rumen metagenome assemblies. Chord diagrams showing the contig alignment overlap (by base pair) of the short-read (a) and long-read (b) contigs to the Hungate1000 and Stewart et al. [18] rumen microbial assemblies. The “Both” category consists of alignments of the short-read and long-read contigs that have alignments to both Stewart et al. [18] and the Hungate1000 datasets. c A dendrogram comparison of dataset sampling completeness compared to 16S V4 amplicon sequence data analysis. The outer rings of the dendrogram indicate the presence (blue) or absence (red) of the particular phylotype in each dataset. Datasets are represented in the following order (from the outer edge to the internal edge): (1) the short-read assembly contigs, (2) the long-read assembly contigs, and (3) 16S V4 amplicon sequence data. The internal dendrogram represents each phylum in a different color (see legend), with individual tiers corresponding to the different levels of taxonomic affiliation. The outermost edge of the dendrogram consists of the genus-level affiliation
Fig. 4
Fig. 4
Network analysis of long-read alignments and Hi-C intercontig links identifies hosts for assembled viral contigs. In order to identify putative hosts for viral contigs, PacBio read alignments (light blue edges) and Hi-C intercontig link alignments (dark blue edges) were counted between viral contigs (hexagons) and non-viral contigs (circles) in the long-read assembly (a) and the short-read assembly (b). Instances where both PacBio reads and Hi-C intercontig links supported a virus-host assignment are also labeled (red edges). The long-read assembly enabled the detection of more virus-host associations in addition to several cases where viral contigs may display cross-species infectivity. We identified several viral contigs that infect important species in the rumen, including those from the genus Sutterella, and several species that metabolize sulfur. In addition, we identified a candidate viral association with a novel genus of rumen microbes identified in this study
Fig. 5
Fig. 5
CRISPR array identification and ARG allele class counts were influenced by assembly quality. a The long-read assembly (dark orange) contigs had fewer identified CRISPR arrays than the short-read contigs (dark green); however, the CRISPR arrays with the largest count of spacers were overrepresented in the long-read assembly. b The long-read assembly had 13-fold higher antimicrobial resistance gene (ARG) alleles than the short-read assembly despite having 5-fold less sequence data coverage. The macrolide, lincosamide, and tetracycline ARG classes were particularly enriched in the long-read assembly compared to alleles identified in the short-read assembly

References

    1. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinforma Oxf Engl. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033. - DOI - PubMed
    1. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27(5):824–834. doi: 10.1101/gr.213959.116. - DOI - PMC - PubMed
    1. Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Droege J, et al. Critical assessment of metagenome interpretation − a benchmark of computational metagenomics software. bioRxiv. 2017:099127. 10.1101/099127 - PMC - PubMed
    1. Nagarajan N, Pop M. Sequence assembly demystified. Nat Rev Genet. 2013;14(3):157–167. doi: 10.1038/nrg3367. - DOI - PubMed
    1. Awad S, Irber L, Brown CT. Evaluating metagenome assembly on a simple defined community with many strain variants. bioRxiv. 2017;3:155358.

Publication types