Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences
- PMID: 27899557
- PMCID: PMC5224470
- DOI: 10.1093/nar/gkw1002
Abstract
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure $d_2^*$ at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, $d_2^*$ host prediction accuracies with thresholding and consensus methods (genus level: 64%) exceeded those of previous Euclidean distance ONF (32%) or homology-based (22–62%) methods. When applied to metagenomically assembled marine SUP05 viruses and the human gut virus crAssphage, $d_2^*$-based predictions partially overlapped with the previously inferred hosts of these viruses (some predictions agreed, some differed). The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The $d_2^*$ ONF method will greatly improve the characterization of novel, metagenomic viruses.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
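The procedure summarized in the abstract (score each candidate host by a background-subtracted k-mer dissimilarity, optionally reject weak matches, then vote among the closest hosts) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices here are assumptions that the abstract does not specify: the background is a simple i.i.d. model estimated from each sequence's own base frequencies, k-mers are counted on both strands, the cosine-style normalization is one common form of $d_2^*$, the threshold is applied to the single best score, and the consensus is a plain majority vote. The names `kmer_counts`, `centered_counts`, `d2star`, and `predict_host` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_counts(seq, k):
    """Count k-mers on both strands, skipping words with non-ACGT letters."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    counts = Counter()
    for strand in (seq, reverse_complement):
        for i in range(len(strand) - k + 1):
            word = strand[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def centered_counts(seq, k):
    """Map each k-mer to (observed - expected, expected).

    Assumption: expected counts come from an i.i.d. background estimated
    from the sequence's own base frequencies; words whose expected count
    is zero are left out.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    bases = kmer_counts(seq, 1)
    n_bases = sum(bases.values())
    freq = {b: bases[b] / n_bases for b in "ACGT"}
    out = {}
    for letters in product("ACGT", repeat=k):
        prob = 1.0
        for b in letters:
            prob *= freq[b]
        expected = total * prob
        if expected > 0:
            word = "".join(letters)
            out[word] = (counts[word] - expected, expected)
    return out


def d2star(seq_x, seq_y, k=6):
    """Cosine-style d2* dissimilarity in [0, 1]; 0 means identical patterns."""
    x, y = centered_counts(seq_x, k), centered_counts(seq_y, k)
    num = norm_x = norm_y = 0.0
    for word in x.keys() & y.keys():
        (x_c, x_e), (y_c, y_e) = x[word], y[word]
        scale = sqrt(x_e * y_e)
        num += x_c * y_c / scale
        norm_x += x_c * x_c / scale
        norm_y += y_c * y_c / scale
    if norm_x == 0 or norm_y == 0:
        return 0.5  # assumption: no usable signal is treated as neutral
    return 0.5 * (1 - num / (sqrt(norm_x) * sqrt(norm_y)))


def predict_host(virus_seq, hosts, k=6, max_dissim=None, top_n=30):
    """Return the consensus taxon of the top_n closest hosts, or None.

    hosts is a list of (taxon, sequence) pairs. If max_dissim is given and
    even the best score exceeds it, no prediction is made (thresholding).
    """
    scored = sorted((d2star(virus_seq, seq, k), taxon) for taxon, seq in hosts)
    if not scored or (max_dissim is not None and scored[0][0] > max_dissim):
        return None
    votes = Counter(taxon for _, taxon in scored[:top_n])
    return votes.most_common(1)[0][0]
```

Calling `predict_host(virus_seq, hosts, k=6, max_dissim=t, top_n=30)` mirrors the settings named in the abstract, where `t` is a cutoff that would have to be chosen empirically. The sketch recomputes host k-mer counts on every call, so a search against tens of thousands of genomes would need them precomputed and cached.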