PLoS One. 2025 Jun 9;20(6):e0323970.
doi: 10.1371/journal.pone.0323970. eCollection 2025.

Variant evolution graph: Can we infer how SARS-CoV-2 variants are evolving?

Badhan Das et al. PLoS One.

Abstract

The SARS-CoV-2 virus has undergone extensive mutations over time, resulting in considerable genetic diversity among circulating strains. This diversity directly affects important viral characteristics, such as transmissibility and disease severity. During a viral outbreak, the rapid mutation rate produces a large cloud of variants, referred to as a viral quasispecies. However, many variants are lost due to the bottleneck of transmission and survival. Advances in next-generation sequencing have enabled continuous and cost-effective monitoring of viral genomes, but constructing reliable phylogenetic trees from the vast collection of sequences in GISAID (the Global Initiative on Sharing All Influenza Data) presents significant challenges. We introduce a novel graph-based framework inspired by quasispecies theory, the Variant Evolution Graph (VEG), to model viral evolution. Unlike traditional phylogenetic trees, VEG accommodates multiple ancestors for each variant and maps all possible evolutionary pathways. The strongly connected subgraphs in the VEG reveal critical evolutionary patterns, including recombination events, mutation hotspots, and intra-host viral evolution, providing deeper insights into viral adaptation and spread. We also derive the Disease Transmission Network (DTN) from the VEG, which supports the inference of transmission pathways and super-spreaders among hosts. We have applied our method to genomic data sets from five arbitrarily selected countries: Somalia, Bhutan, Hungary, Iran, and Nepal. Our study compares three methods for computing the mutational distances used to build the VEG (sourmash, pyani, and edit distance) with the phylogenetic approach based on Maximum Likelihood (ML). ML is the slowest of the four because it requires multiple sequence alignment and probabilistic inference. In contrast, sourmash is the fastest, followed by the edit distance approach, while pyani takes more time due to its BLAST-based computations. This comparison highlights the computational efficiency of VEG, making it a scalable alternative for analyzing large viral data sets.
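
To make the idea of a graph with multiple ancestors concrete, here is a minimal Python sketch. It is not the authors' Algorithm 1 or 2, which the abstract does not spell out. It assumes that a variant's candidate parents are the genomes collected on or before its own collection date, and that every candidate at the minimum edit distance is kept as a parent. The function names `edit_distance` and `build_veg` and the toy sequences are hypothetical.

```python
from datetime import date


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def build_veg(genomes):
    """Map each variant to its list of closest earlier-or-same-day variants.

    genomes: dict of name -> (sequence, collection_date).
    ASSUMPTION: the parent rule (minimum edit distance among genomes that
    are not collected later) is illustrative, not taken from the paper.
    """
    parents = {}
    for child, (seq_c, day_c) in genomes.items():
        candidates = {
            name: edit_distance(seq_p, seq_c)
            for name, (seq_p, day_p) in genomes.items()
            if name != child and day_p <= day_c
        }
        if not candidates:
            parents[child] = []  # earliest genome: no inferred ancestor
            continue
        best = min(candidates.values())
        # Keep every tied candidate, so a variant can have several ancestors.
        parents[child] = sorted(n for n, d in candidates.items() if d == best)
    return parents


# Toy data: four short made-up sequences with collection dates.
genomes = {
    "A": ("ACGTACGT", date(2021, 1, 1)),
    "B": ("ACGTACGA", date(2021, 1, 5)),
    "C": ("ACGAACGT", date(2021, 1, 5)),
    "D": ("ACGAACGA", date(2021, 1, 9)),
}
veg = build_veg(genomes)
```

In this toy example, `D` is one substitution away from both `B` and `C`, so it receives two parents, which a tree cannot represent. Because same-day genomes are allowed as candidates in both directions, this sketch can also produce cycles; whether the paper's strongly connected subgraphs arise the same way is not stated in the abstract.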

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Pipeline for generating edit-distance matrix using pairwise edit distances between the variant genomes.
Fig 2. The whole pipeline of building VEG.
The edit distance computation in this pipeline is separately shown in Fig 1.
Fig 3. A sample set of six genomes.
(a) The set of six variant genomes, A, B, C, D, E, and F, and their corresponding collection dates. (b) The workflow of Algorithm 1 on the distance matrix, M.
Fig 4. The workflow of Algorithm 2 with the example in Fig 3.
Fig 5. Count of genomes filtered based on the percentage of Ns, τ, in the genomes.
The x-axis shows the threshold values and the y-axis shows the count of filtered genomes. The plots are of (a) Somalia, (b) Bhutan, (c) Iran, and (d) Nepal data sets.
Fig 6. Average count of Ns in the genome sequences vs the coding regions in five data sets.
Fig 7. (a) VEG_S, (b) VEG_E, and (c) VEG_P of the Bhutan data set (graph viewed using Cytoscape 3.10.2).
Fig 8. A maximum likelihood phylogenetic tree of the Bhutan data set.
Fig 9. Venn diagrams showing parent-child relationships among the VEGs derived from sourmash, pyani, and edit distance.
(a) Bhutan, (b) Hungary, (c) Nepal, and (d) Iran data sets.
Fig 10. The DTN is inferred from the VEG of the Bhutan data set (edit distance).
Here, the nodes are the hosts, and the edges represent the direction and day differences of the inferred transmissions.
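
One way to picture a host-level network of this kind is the sketch below, which collapses variant-level edges into host-level edges labelled with day differences. It is an illustration under assumptions, not the paper's derivation: it assumes each genome is tagged with a single host identifier, skips edges whose two genomes share a host (treating them as intra-host evolution), and keeps the smallest day difference per host pair. The names `derive_dtn`, `out_degree`, and the sample identifiers are hypothetical.

```python
from collections import Counter
from datetime import date


def derive_dtn(veg_edges, host_of, collected_on):
    """Collapse parent->child genome edges into host->host edges.

    Returns a dict mapping (source_host, target_host) to the smallest
    difference in collection days seen for that pair.
    ASSUMPTION: one host per genome; same-host edges are ignored.
    """
    dtn = {}
    for parent, child in veg_edges:
        src, dst = host_of[parent], host_of[child]
        if src == dst:
            continue  # treated here as intra-host evolution, not transmission
        days = (collected_on[child] - collected_on[parent]).days
        key = (src, dst)
        if key not in dtn or days < dtn[key]:
            dtn[key] = days
    return dtn


def out_degree(dtn):
    """Count distinct hosts each host is inferred to have infected."""
    return Counter(src for src, _ in dtn)


# Toy data: five made-up genomes from three hosts.
veg_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g4"), ("g2", "g5")]
host_of = {"g1": "H1", "g2": "H1", "g3": "H2", "g4": "H2", "g5": "H3"}
collected_on = {
    "g1": date(2021, 1, 1),
    "g2": date(2021, 1, 3),
    "g3": date(2021, 1, 8),
    "g4": date(2021, 1, 10),
    "g5": date(2021, 1, 4),
}
dtn = derive_dtn(veg_edges, host_of, collected_on)
spreaders = out_degree(dtn)
```

A host with a high out-degree in such a network would be a candidate super-spreader; in the toy data, `H1` has inferred edges to both `H2` and `H3`.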
