Science. 2018 Nov 2;362(6414):577-580.
doi: 10.1126/science.aap9072.

Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes

Simon A Babayan et al. Science. 2018.

Abstract

Identifying the animal origins of RNA viruses requires years of field and laboratory studies that stall responses to emerging infectious diseases. Using large genomic and ecological datasets, we demonstrate that animal reservoirs and the existence and identity of arthropod vectors can be predicted directly from viral genome sequences via machine learning. We illustrate the ability of these models to predict the epidemiology of diverse viruses across most human-infective families of single-stranded RNA viruses, including 69 viruses with previously elusive or never-investigated reservoirs or vectors. Models such as these, which capitalize on the proliferation of low-cost genomic sequencing, can narrow the time lag between virus discovery and targeted research, surveillance, and management.

Conflict of interest statement

Competing interests: All authors declare that they have no competing interests.

Figures

Fig. 1. Distribution and hierarchical clustering of reservoir host and arthropod vector associations across viral taxonomic groups.
(A) Barplots show the number of viruses in the dataset from each reservoir host and vector class and the number of orphan viruses in each viral group. The order Artiodactyla (even-toed ungulates) includes the Bovidae, Camelidae, Suidae, Antilocapridae, and Giraffidae families. Galloanserae (ducks, fowl) and Neoaves (most other modern birds) are superorders within the class Aves (birds). (B,C) Dendrograms of 437 viruses with known reservoir hosts and 98 viruses with known arthropod vectors, estimated by hierarchically clustering 4229 genomic biases calculated from viral genomes. Colors of tip symbols indicate reservoir or vector associations. Branch colors show viral taxonomic groups. Branch lengths are log(n+1) transformed for visualization. (B) Trait models with true viral taxonomic group associations were favored over those with randomly shuffled viral groups (ΔAIC = -1690.6) but also clustered significantly by reservoir (ΔAIC = -540.7). (C) Arboviruses clustered by both viral taxonomy (ΔAIC = -238.1) and vector group (ΔAIC = -61.5). ΔAIC values are from models comparing true associations to the mean AIC from 500 tip trait randomizations.
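
The ΔAIC comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes ΔAIC is the AIC of the model fitted to the true associations minus the mean AIC across the randomized fits, which is consistent with the negative values reported for favored models. The function name `delta_aic` and the example numbers are hypothetical.

```python
from statistics import mean


def delta_aic(aic_true, aic_randomized):
    """AIC of the true-association model minus the mean AIC of randomized fits.

    Assumed sign convention: a negative result means the true associations
    fit better than the average tip-trait randomization.
    """
    if not aic_randomized:
        raise ValueError("need at least one randomized AIC value")
    return aic_true - mean(aic_randomized)


# Hypothetical AIC values; the caption's analysis used 500 randomizations.
example = delta_aic(100.0, [110.0, 120.0, 130.0])
print(example)
```

With these made-up inputs the true model's AIC is 20 units below the randomized mean, so the call returns a negative value.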
Fig. 2. Accurate genomic prediction of viral ecology using machine learning.
(A) Heatmap showing the proportion of accurate (diagonal) and misclassified (off-diagonal) predictions within each reservoir host class, averaged across GBMs trained and optimized on different subsets of 372 viruses. Row numbers indicate the number of viruses per reservoir in each validation set (N = 65 viruses). (B) The distributions of per-reservoir accuracies in single validation sets (colored points and lines are median and SD) and after bagging (white points). Black points show the best single model. (C) Cumulative bagged accuracy across GBMs using PN and SelGen traits in isolation and in combination. The x-axis shows the rank of the true reservoir (i.e., 1 = true reservoir was the top prediction; 2 = true reservoir was the second-ranked prediction and so on). The y-axis shows accuracy when considering increasing numbers of predictions as plausible. The asterisk indicates significantly higher accuracy in the combined model (χ² test: p < 0.05). Cumulative null model accuracy was estimated by training GBMs on 50 randomly generated traits that were simulated from normal distributions ranging from 0 to 2 and randomly assigned to viruses. (D,E) Heatmaps showing the average proportion of accurate predictions of arthropod-borne status and vector identity (N = 80 and 46 viruses per validation set, respectively). (F) Distributions of per-vector accuracies as in B. (G) Cumulative bagged accuracy in vector prediction across models as in C.
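
The cumulative accuracy curve in panels C and G can be illustrated with a short sketch. It assumes each virus has one predicted probability per class and that accuracy at rank k is the fraction of viruses whose true class falls within the top k predictions. The helper names and the class probabilities are hypothetical and are not taken from the paper.

```python
def rank_of_true(class_probs, true_class):
    """Return the 1-based rank of the true class among the predictions."""
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    return ranked.index(true_class) + 1


def cumulative_accuracy(ranks, max_rank):
    """Fraction of viruses whose true class is within the top k, for k = 1..max_rank."""
    n = len(ranks)
    return [sum(r <= k for r in ranks) / n for k in range(1, max_rank + 1)]


# Hypothetical predictions for one virus across three reservoir classes.
probs = {"bat": 0.5, "rodent": 0.375, "primate": 0.125}
print(rank_of_true(probs, "rodent"))

# Hypothetical ranks of the true reservoir for four validation viruses.
print(cumulative_accuracy([1, 2, 1, 3], max_rank=3))
```

Reading the second result left to right gives the y-axis values at x = 1, 2, 3: the curve can only stay flat or rise as more predictions are counted as plausible.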
Fig. 3. Reservoir hosts and arthropod vectors of orphan viruses predicted from their genome sequences.
(A) Predicted reservoirs for 36 viruses that emerged from unknown sources. (B) Predicted reservoirs for 31 viruses discovered by active surveillance of wildlife or blood-feeding arthropods. (C) Predictions of arthropod-borne status for 17 viruses (left of dashed line) and vector identities (last 4 columns, when applicable). Color gradients show the BPS for each class from the top 25% of models in each set of GBMs. Figs. S14–S16 show the full probability distributions of predictions.
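
One way to picture a per-class score drawn from the top 25% of models is sketched below. The caption does not define BPS, so this rests on two assumptions: models are ranked by validation accuracy, and the score for each class is the mean predicted probability across the retained models. The function `bagged_strength` and all numbers are hypothetical.

```python
import math


def bagged_strength(models, top_fraction=0.25):
    """Average class probabilities over the best-performing fraction of models.

    `models` is a list of (validation_accuracy, {class: probability}) pairs.
    Both the accuracy-based ranking and the simple mean are assumptions.
    """
    if not models:
        raise ValueError("need at least one model")
    keep = max(1, math.ceil(len(models) * top_fraction))
    best = sorted(models, key=lambda m: m[0], reverse=True)[:keep]
    classes = best[0][1].keys()
    return {c: sum(p[c] for _, p in best) / keep for c in classes}


# Eight hypothetical models; only the two most accurate are retained.
models = [
    (0.90, {"bat": 0.75, "rodent": 0.25}),
    (0.85, {"bat": 0.25, "rodent": 0.75}),
] + [(0.5, {"bat": 0.0, "rodent": 1.0})] * 6
print(bagged_strength(models))
```

Here the six weaker models are discarded before averaging, so their confident "rodent" predictions do not affect the score.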

Comment in

  • Sources of human viruses.
    Woolhouse M. Science. 2018 Nov 2;362(6414):524-525. doi: 10.1126/science.aav4265. PMID: 30385562. No abstract available.


Publication types