Virus Evol. 2022 Sep 16;8(2):veac087.
doi: 10.1093/ve/veac087. eCollection 2022.

Host prediction for disease-associated gastrointestinal cressdnaviruses

Cormac M Kinsella et al. Virus Evol. 2022.

Abstract

Metagenomic techniques have facilitated the discovery of thousands of viruses, yet because samples are often highly biodiverse, fundamental data on the specific cellular hosts are usually missing. Numerous gastrointestinal viruses linked to human or animal diseases are affected by this, preventing research into their medical or veterinary importance. Here, we developed a computational workflow for the prediction of viral hosts from complex metagenomic datasets. We applied it to seven lineages of gastrointestinal cressdnaviruses using 1,124 metagenomic datasets, predicting hosts of four lineages. The Redondoviridae, strongly associated with human gum disease (periodontitis), were predicted to infect Entamoeba gingivalis, an oral pathogen itself involved in periodontitis. The Kirkoviridae, originally linked to fatal equine disease, were predicted to infect a variety of parabasalid protists, including Dientamoeba fragilis in humans. Two viral lineages observed in human diarrhoeal disease (CRESSV1 and CRESSV19, i.e. pecoviruses and hudisaviruses) were predicted to infect Blastocystis spp. and Endolimax nana, respectively, protists responsible for millions of annual human infections. Our prediction approach is adaptable to any virus lineage and requires neither training datasets nor host genome assemblies. Two host predictions (for the Kirkoviridae and CRESSV1 lineages) could be independently confirmed as virus-host relationships using endogenous viral elements identified inside host genomes, while a further prediction (for the Redondoviridae) was strongly supported as a virus-host relationship using a case-control screening experiment of human oral plaques.
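
The abstract does not describe the internals of the workflow, so the following is not the authors' method. It is a minimal sketch of one generic way to look for a virus-host signal across many datasets, or in a screening experiment: tabulate in how many samples a virus lineage and a candidate host taxon are detected together, then ask how surprising that overlap would be if the two were independent (a one-sided Fisher's exact test). The function names, sample identifiers, and detections are all hypothetical.

```python
from math import comb


def cooccurrence_table(virus_positive, host_positive, all_samples):
    """Return (both, virus_only, host_only, neither) counts for sets of sample IDs."""
    virus = set(virus_positive) & set(all_samples)
    host = set(host_positive) & set(all_samples)
    both = len(virus & host)
    virus_only = len(virus - host)
    host_only = len(host - virus)
    neither = len(set(all_samples)) - both - virus_only - host_only
    return both, virus_only, host_only, neither


def fisher_one_sided(both, virus_only, host_only, neither):
    """One-sided Fisher's exact test for enrichment of co-detection.

    Returns the probability of seeing at least `both` co-detections
    if virus and host presence were independent (hypergeometric null).
    """
    n_virus = both + virus_only
    n_host = both + host_only
    total = both + virus_only + host_only + neither
    max_both = min(n_virus, n_host)
    tail = sum(
        comb(n_host, k) * comb(total - n_host, n_virus - k)
        for k in range(both, max_both + 1)
    )
    return tail / comb(total, n_virus)


# Hypothetical example: 10 samples, virus detected in 4, candidate host in 5.
samples = [f"s{i}" for i in range(1, 11)]
virus_hits = ["s1", "s2", "s3", "s4"]
host_hits = ["s1", "s2", "s3", "s4", "s5"]

table = cooccurrence_table(virus_hits, host_hits, samples)
p_value = fisher_one_sided(*table)
print(table, p_value)
```

A small p-value here only indicates that the two detections co-occur more often than expected by chance. On its own it cannot distinguish a host relationship from a shared environment, which is why independent evidence such as the endogenous viral elements mentioned above matters.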

Keywords: Redondoviridae; cressdnavirus; host identification; metagenomics; periodontitis; protist.

Figures

Figure 1.
Maximum likelihood phylogenetic tree of the Arfiviricetes, rooted at the midpoint. Scale bar denotes amino acid substitutions per site. Branch supports are given for each named lineage, with SH-aLRT scores on the left and ultrafast bootstrap scores on the right. All sequences found outside of collapsed nodes did not meet criteria for naming a lineage.
Figure 2.
Recombination within gastrointestinal cressdnavirus lineages. (A) Upper right: phylogenetic compatibility matrix (Robinson-Foulds distance) computed on an alignment of redondovirus genomes, lower left: LARD breakpoint matrix computed on the same alignment. (B–F) Rep and Cap protein tanglegrams for five cressdnavirus lineages. Dotted lines connect proteins encoded by the same genome. Branch colour denotes isolation source as listed in the key. Grey blocks denote groups linked by RDP4 detected recombination events, and different shades represent different recombination groups (Panel D only). Scale bars on individual phylograms are in amino acid substitutions per site. NHP: non-human primate.
Figure 3.
Distribution of gastrointestinal cressdnaviruses across seven sample cohorts. Colour represents normalised read count. Empty columns (viruses not found in any sample) and rows (samples containing no viruses) were removed prior to plotting. Members of the CRESSV16 lineage were not detected. Taxon silhouettes are from phylopic.org (Homo sapiens by T. Michael Keesey, Sus scrofa by Steven Traver). Sample cohorts and viral reference genomes used are reported in Supplementary Tables S2 and S3.
Figure 4.
EVEs in protist genomes support host inferences. (A) Clustered Rep-like EVEs from Blastocystis spp. assemblies. Connections represent significant BLASTp alignments between EVEs, with shade corresponding to level of significance (maximum/worst e-value = 1e-10). Four EVEs identified by Liu et al. (2011) were clustered alongside all thirty-seven Rep-like EVEs detected here. (B) Regions of interest from a phylogeny of Rep-like EVEs and representatives of cressdnavirus lineages (see also Supplementary Fig. S4). Scale bar represents amino acid substitutions per site. (C) Nucmer alignment dotplot between EVE-containing scaffolds from two Histomonas meleagridis genome assemblies. Colour denotes alignment percentage similarity. For the list of aligned scaffolds, see Supplementary Table S8.
