Sci Rep. 2022 Mar 17;12(1):4576.
doi: 10.1038/s41598-022-08350-6.

SARS-CoV-2 host prediction based on virus-host genetic features

Irina Yuri Kawashima et al. Sci Rep. 2022.

Abstract

The genetic diversity of coronaviruses gives them different biological abilities, such as infecting different cells and/or organisms, causing a wide spectrum of clinical manifestations, dispersing by different routes, and transmitting within a specific host. In recent decades, different coronaviruses have emerged that are highly adapted to humans and cause serious diseases, while their host of origin remains unknown. Viral genome information is particularly important for recognizing patterns linked to the biological characteristics of these viruses, such as specificity in the host-parasite relationship. Here, building on an earlier computational tool, Seq2Hosts, we developed a novel approach that uses new variables obtained from the codon frequencies of coronavirus spike sequences, the Relative Synonymous Codon Usage (RSCU), to shed new light on the molecular mechanisms involved in the host specificity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Using RSCU values obtained from nucleotide sequences available before the SARS-CoV-2 pandemic, we assessed whether it is possible to identify the hosts that can be infected by this newly emerging species, which was first identified infecting humans in 2019 in Wuhan, China. According to the model, trained and validated using sequences available before the pandemic, bats are the most likely natural host of SARS-CoV-2, as previously suggested by other studies that searched for the host of origin of the virus.
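To make the RSCU feature concrete, the following is a minimal sketch of the standard RSCU definition: for each amino acid, the observed count of a codon divided by the count expected if all synonymous codons were used equally. It is not taken from the authors' pipeline. The function name `rscu`, the use of the standard genetic code, the exclusion of stop codons and single-codon amino acids (Met, Trp), and the choice to return 0.0 for amino acids absent from the sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(sequence):
    """Return a dict mapping each synonymous codon to its RSCU value.

    Hypothetical helper: assumes an in-frame coding sequence and skips
    codons containing ambiguous bases.
    """
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(
        seq[i:i + 3] for i in range(0, usable, 3) if seq[i:i + 3] in CODON_TABLE
    )

    # Group codons into synonymous families, one family per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # assumption: stops, Met and Trp carry no usage signal
        total = sum(counts[c] for c in family)
        for codon in family:
            # observed count / (family total / family size)
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


example = rscu("TTTTTTTTC")  # three Phe codons: TTT, TTT, TTC
print(round(example["TTT"], 3), round(example["TTC"], 3))  # 1.333 0.667
```

Under these assumptions each sequence yields a vector of 59 values, which could then serve as input to a dimensionality reduction or a classifier. A value above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often.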

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Characterization of the training dataset (Dataset 1). (A) Phylogenetic characterization estimated by maximum likelihood, showing all coronavirus genera. The tree was generated with IQ-TREE 1.5.5, available at http://www.iqtree.org, and visualized with FigTree 1.4.4, available at http://tree.bio.ed.ac.uk/software/figtree/; (B) two-dimensional PCA reduction with prototypes according to the different coronavirus genera; and (C) two-dimensional PCA reduction with prototypes according to the different primary hosts. The different colours in (B) and (C) represent each genus or host class in the training dataset (Dataset 1). For some hosts, and even for some genera, the points form concentrated clouds, while for others, such as bats, the samples are scattered across different positions of the graph.
Figure 2
Classifier performance. (A) Cumulative explained variance for different numbers of principal components (PCs); and (B) accuracy for each host using different numbers of PCs. From 20 PCs onwards, there is no further significant increase in accuracy. The confusion matrix used to obtain this diagram can be found in Supplementary Table S2.
Figure 3
PCA reduction, all datasets. Dataset-1, training dataset; Dataset-2, testing dataset; Dataset-3, SARS-CoV-2; Dataset-4, Bat Coronavirus, HCoV and Pangolin Coronavirus. Although the Dataset-4 sequences are phylogenetically close to the Dataset-3 SARS-CoV-2 sequences, these sequences do not cluster together as a single group when RSCU is used as the feature. Both the Dataset-3 and Dataset-4 sequences were classified by our model as closer to bat coronaviruses than to human coronaviruses.
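The cumulative explained variance plotted in Figure 2A can be sketched as follows, assuming the variance captured by each principal component is already known (for example, the eigenvalues of the covariance matrix of the feature vectors). The function names, the example variances, and the 0.9 threshold are hypothetical and are not values from the study, which reports only that accuracy stops improving from 20 PCs onwards.

```python
from itertools import accumulate


def cumulative_explained_variance(component_variances):
    """Fraction of total variance explained by the first k components, for each k."""
    total = sum(component_variances)
    return [running / total for running in accumulate(component_variances)]


def components_needed(component_variances, threshold):
    """Smallest number of leading components whose cumulative fraction reaches threshold."""
    fractions = cumulative_explained_variance(component_variances)
    for k, fraction in enumerate(fractions, start=1):
        if fraction >= threshold:
            return k
    return len(component_variances)


# Hypothetical per-component variances, sorted from largest to smallest.
variances = [4.0, 3.0, 2.0, 1.0]
print(cumulative_explained_variance(variances))  # [0.4, 0.7, 0.9, 1.0]
print(components_needed(variances, 0.9))  # 3
```

A curve of this kind flattens once additional components add little variance. It is a separate criterion from the per-host accuracy in Figure 2B, so the two plots need not suggest the same number of components.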

