2021 Apr 20;11(8):e3999.
doi: 10.21769/BioProtoc.3999.

Computational Analysis and Phylogenetic Clustering of SARS-CoV-2 Genomes

Bani Jolly et al. Bio Protoc.

Abstract

COVID-19, the disease caused by the novel SARS-CoV-2 coronavirus, originated as an isolated outbreak in the Hubei province of China but soon created a global pandemic and is now a major threat to healthcare systems worldwide. Following the rapid human-to-human transmission of the infection, institutes around the world have made efforts to generate genome sequence data for the virus. With thousands of genome sequences for SARS-CoV-2 now available in the public domain, it is possible to analyze the sequences and gain a deeper understanding of the disease, its origin, and its epidemiology. Phylogenetic analysis is a potentially powerful tool for tracking the transmission pattern of the virus with a view to aiding identification of potential interventions. Toward this goal, we have created a comprehensive protocol for the analysis and phylogenetic clustering of SARS-CoV-2 genomes using Nextstrain, a powerful open-source tool for the real-time interactive visualization of genome sequencing data. Approaches to focus the phylogenetic clustering analysis on a particular region of interest are detailed in this protocol.

Keywords: COVID-19; Coronavirus; Genomes; Phylogenetic analysis; SARS-CoV-2.

Figures

Figure 1. The different steps described in this protocol and the Augur modules used in each of the analysis steps
Figure 2. Sample record for the hCoV-19/India/1-27/2020 SARS-CoV-2 strain in the sequences.fasta format
Figure 3. Summary screenshot of the clades.tsv file provided by Nextstrain for SARS-CoV-2 genomes
Figure 4. Summary screenshot of the lat_longs.tsv file required by Nextstrain for visualizing geographic traits
Figure 5. Summary screenshot of the colors.tsv file created for visualizing sequence quality
Figure 6. Screenshot of the visualization produced by Nextstrain for the COVID_global and COVID_india datasets
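
The abstract mentions focusing the analysis on a particular region of interest, and the Figure 2 caption shows a strain name of the form hCoV-19/India/1-27/2020. As a rough illustration only, the sketch below selects FASTA records by the second "/"-separated field of the strain name. It is not the protocol's method, which relies on Nextstrain and its Augur modules. The function names, the second strain name, and the sequence strings are hypothetical placeholders, and the sketch assumes the header contains nothing but the strain name.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def region_of(header: str) -> str:
    """Return the second '/'-separated field of a strain name, or ''."""
    parts = header.split("/")
    return parts[1] if len(parts) > 1 else ""


def filter_by_region(text: str, region: str) -> List[Tuple[str, str]]:
    """Keep records whose region field matches, ignoring case."""
    wanted = region.lower()
    return [(h, s) for h, s in read_fasta(text) if region_of(h).lower() == wanted]


# Placeholder data: the first strain name comes from the Figure 2 caption,
# the second is hypothetical, and both sequences are made up.
example = """>hCoV-19/India/1-27/2020
ACGTACGT
ACGT
>hCoV-19/Wuhan/EXAMPLE-1/2019
TTGGCCAA
"""

selected = filter_by_region(example, "India")
print(selected)
```

Real sequences.fasta headers may carry extra fields, so the field position used by `region_of` would need checking against the actual data before relying on it.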
