Genome Biol. 2019 Aug 2;20(1):153.

doi: 10.1186/s13059-019-1760-x.

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation

Derek M Bickhart¹, Mick Watson², Sergey Koren³, Kevin Panke-Buisse¹, Laura M Cersosimo⁴, Maximilian O Press⁵, Curtis P Van Tassell⁶, Jo Ann S Van Kessel⁷, Bradd J Haley⁷, Seon Woo Kim⁷, Cheryl Heiner⁸, Garret Suen⁹, Kiranmayee Bakshy¹, Ivan Liachko⁵, Shawn T Sullivan⁵, Phillip R Myer¹⁰, Jay Ghurye¹¹, Mihai Pop¹¹, Paul J Weimer¹,⁹, Adam M Phillippy³, Timothy P L Smith¹²

Affiliations

¹ Cell Wall Biology and Utilization Laboratory, Dairy Forage Research Center, USDA, Madison, WI, 53706, USA.
² Division of Genetics and Genomics, The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, EH25 9RG, UK.
³ Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.
⁴ Department of Animal Sciences, University of Florida, Gainesville, FL, 32611, USA.
⁵ Phase Genomics Inc, Seattle, WA, 98109, USA.
⁶ Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA.
⁷ Environmental Microbial and Food Safety Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, 20705, USA.
⁸ Pacific Biosciences, Menlo Park, CA, USA.
⁹ Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, USA.
¹⁰ Department of Animal Science, University of Tennessee, Knoxville, TN, 37996, USA.
¹¹ Department of Computer Science, University of Maryland, College Park, MD, 20742, USA.
¹² USDA-ARS U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA. tim.smith2@usda.gov.

PMID: 31375138
PMCID: PMC6676630
DOI: 10.1186/s13059-019-1760-x

Derek M Bickhart et al. Genome Biol. 2019.

Abstract

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.

Keywords: Hi-C; Metagenome assembly; Metagenomics; PacBio; Virus-host association.

Conflict of interest statement

CH is an employee of Pacific Biosciences. IL, MOP, and STS are employees of Phase Genomics. The other authors declare that they have no competing interests.

Figures

**Fig. 1**
Assembly workflow and sampling bias estimates show GC% discrepancies in long-read vs short-read assemblies. (a) Using the same sample from a cannulated cow, we extracted DNA using a modified bead beating protocol that still preserved a large proportion of high molecular weight DNA strands. This DNA extraction was sequenced on a short-read sequencer (Illumina; dark green) and a long-read sequencer (PacBio RSII and Sequel; dark orange), with each sequence source assembled separately. (b) Assessments of read- and contig-level GC% bias revealed that a substantial proportion of sampled low GC DNA was not incorporated into either assembly. (c) Assembly contigs were annotated for likely superkingdoms of origin and were compared for overall contig lengths. The long-read assembly tended to have longer average contigs for each assembled superkingdom compared to the short-read assembly.

**Fig. 2**
Identification of high-quality bins in comparative assemblies highlights the need for dereplication of different binning methods. (a) Binning performed by Metabat (light blue) and Proximeta Hi-C binning (Hi-C; blue) revealed that the long-read assembly consistently had fewer, longer contigs per bin than the short-read assembly. (b) Bin set division into medium-quality draft (MQ) and high-quality draft (HQ) bins was based on DAS_Tool single-copy gene (SCG) redundancy and completeness. Assessment of SCG completeness and redundancy revealed 10 and 42 high-quality bins in the long-read (c) and short-read (d) assemblies, respectively. The Proximeta Hi-C binning method performed better in terms of SCG metrics in the long-read assembly. (e) Plots of all identified bins in the long-read (triangle) and short-read (circle) assemblies revealed a wide range of chimeric bins containing high SCG redundancy. Bins highlighted in the blue rectangle correspond to the MQ bins identified by the DAS_Tool algorithm, while the red rectangle corresponds to the HQ bin set.

**Fig. 3**
Dataset novelty compared to other rumen metagenome assemblies. Chord diagrams showing the contig alignment overlap (by base pair) of the short-read (a) and long-read (b) contigs to the Hungate1000 and Stewart et al. [18] rumen microbial assemblies. The “Both” category consists of alignments of the short-read and long-read contigs that have alignments to both Stewart et al. [18] and the Hungate1000 datasets. (c) A dendrogram comparison of dataset sampling completeness compared to 16S V4 amplicon sequence data analysis. The outer rings of the dendrogram indicate the presence (blue) or absence (red) of the particular phylotype in each dataset. Datasets are represented in the following order (from the outer edge to the internal edge): (1) the short-read assembly contigs, (2) the long-read assembly contigs, and (3) 16S V4 amplicon sequence data. The internal dendrogram represents each phylum in a different color (see legend), with individual tiers corresponding to the different levels of taxonomic affiliation. The outermost edge of the dendrogram consists of the genus-level affiliation.

**Fig. 4**
Network analysis of long-read alignments and Hi-C intercontig links identifies hosts for assembled viral contigs. To identify putative hosts for viral contigs, PacBio read alignments (light blue edges) and Hi-C intercontig link alignments (dark blue edges) were counted between viral contigs (hexagons) and non-viral contigs (circles) in the long-read assembly (a) and the short-read assembly (b). Instances where both PacBio reads and Hi-C intercontig links supported a virus-host assignment are also labeled (red edges). The long-read assembly enabled the detection of more virus-host associations in addition to several cases where viral contigs may display cross-species infectivity. We identified several viral contigs that infect important species in the rumen, including those from the genus *Sutterella*, and several species that metabolize sulfur. In addition, we identified a candidate viral association with a novel genus of rumen microbes identified in this study.

**Fig. 5**
CRISPR array identification and ARG allele class counts were influenced by assembly quality. (a) The long-read assembly (dark orange) contigs had fewer identified CRISPR arrays than the short-read contigs (dark green); however, the CRISPR arrays with the largest count of spacers were overrepresented in the long-read assembly. (b) The long-read assembly had 13-fold more antimicrobial resistance gene (ARG) alleles than the short-read assembly despite having 5-fold lower sequence data coverage. The macrolide, lincosamide, and tetracycline ARG classes were particularly enriched in the long-read assembly compared to alleles identified in the short-read assembly.

References

1. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinforma Oxf Engl. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033.
2. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27(5):824–834. doi: 10.1101/gr.213959.116.
3. Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Droege J, et al. Critical assessment of metagenome interpretation − a benchmark of computational metagenomics software. bioRxiv. 2017:099127. doi: 10.1101/099127.
4. Nagarajan N, Pop M. Sequence assembly demystified. Nat Rev Genet. 2013;14(3):157–167. doi: 10.1038/nrg3367.
5. Awad S, Irber L, Brown CT. Evaluating metagenome assembly on a simple defined community with many strain variants. bioRxiv. 2017;3:155358.