Biomolecules. 2022 Dec 28;13(1):63.
doi: 10.3390/biom13010063.

A Computer Simulation of SARS-CoV-2 Mutation Spectra for Empirical Data Characterization and Analysis



Ming Xiao et al. Biomolecules.

Abstract

Computing mutation spectra and simulating intra-host mutation processes from sequencing data are important not only for understanding the genetic mechanisms of SARS-CoV-2 but also for epidemic prediction and for vaccine and drug design. However, current intra-host mutation analysis algorithms are inaccurate, and existing simulation methods cannot quickly and precisely predict new SARS-CoV-2 variants arising from the accumulation of mutations. This study therefore proposes an accurate, strand-specific method for computing SARS-CoV-2 intra-host mutation spectra, develops a fast and efficient intra-host mutation simulation method based on these spectra, and establishes an online analysis and visualization platform. Our main results are as follows: (1) SARS-CoV-2 intra-host mutation spectra vary significantly across lineages, with the major mutations being G>A, G>C, and G>U on the positive-sense strand and C>U, C>G, and C>A on the negative-sense strand; (2) our mutation simulation shows that the simulated sequence starts to deviate from the base content percentage of Alpha-CoV/Delta-CoV after approximately 620 mutation steps; (3) 2019-NCSS provides an easy-to-use online platform with visualization for SARS-CoV-2 analysis and mutation simulation.

Keywords: SARS-CoV-2; bioinformatics; computational biology; mutation simulation; mutation spectra.


Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
The workflow of the study.
Figure 2
Pseudocode of the nucleobase filtering algorithm with a dynamic threshold. D is the dictionary that stores quality_score:base_number pairs at each site; V is the set of bases retained at each site after filtering; Q is the base quality score corresponding to left or right; N is the number of bases corresponding to Q_left or Q_right. We apply the algorithm to the mapped sequencing reads at each site of the reference genome to obtain high-quality sequence data for mutation spectra analysis.
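The caption names the data structures D and V but does not state how the threshold is chosen. Purely as an illustration, the Python sketch below builds D for a single site and keeps the bases at or above a per-site threshold. The threshold rule (the count-weighted mean quality at that site) and the function names are hypothetical assumptions, not the authors' algorithm.

```python
from collections import Counter


def dynamic_threshold(d: dict[int, int]) -> float:
    """Hypothetical per-site rule: count-weighted mean of the quality scores in D."""
    total = sum(d.values())
    return sum(q * n for q, n in d.items()) / total


def filter_site(observations: list[tuple[str, int]]) -> list[str]:
    """Return V, the bases kept at one reference site.

    observations: (base, quality_score) pairs from the mapped reads covering the site.
    """
    if not observations:
        return []
    # D: quality_score -> number of bases observed with that score at this site
    d = Counter(q for _, q in observations)
    threshold = dynamic_threshold(d)
    # V: bases whose quality reaches this site's (assumed) dynamic threshold
    return [base for base, q in observations if q >= threshold]


print(filter_site([("A", 30), ("A", 35), ("G", 10), ("A", 32)]))  # ['A', 'A', 'A']
```

Because the threshold is recomputed from each site's own quality distribution, a site covered mostly by high-quality reads discards its low-quality outliers (here the G at quality 10) without applying one global cutoff to the whole genome.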
Figure 3
Flowchart of the SARS-CoV-2 mutation simulation.
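The flowchart itself is not reproduced here. The Python sketch below shows one generic way a spectrum-driven simulation can work. At each step it draws a substitution type with probability proportional to its spectrum weight, then applies that substitution at a random site that currently carries the reference base. The function name, the (ref, alt) spectrum format, and the choice to skip a step when no eligible site exists are assumptions for illustration. This is not the 2019-NCSS implementation.

```python
import random


def simulate(seq: str, spectrum: dict[tuple[str, str], float], steps: int, seed: int = 0) -> str:
    """Apply `steps` point substitutions drawn from a mutation spectrum.

    spectrum maps (ref_base, alt_base) to a relative weight, e.g. {("G", "A"): 0.5, ...}.
    """
    rng = random.Random(seed)
    bases = list(seq)
    types = list(spectrum)
    weights = [spectrum[t] for t in types]
    for _ in range(steps):
        # 1. Draw a mutation type in proportion to its spectrum weight.
        ref, alt = rng.choices(types, weights=weights, k=1)[0]
        # 2. Find the sites that currently carry the reference base.
        sites = [i for i, b in enumerate(bases) if b == ref]
        if not sites:
            continue  # assumption: a step with no eligible site leaves the sequence unchanged
        # 3. Substitute the base at one randomly chosen eligible site.
        bases[rng.choice(sites)] = alt
    return "".join(bases)


print(simulate("GGGG", {("G", "A"): 1.0}, steps=4))  # AAAA
```

The weights in the usage line are placeholders, not spectrum values from the study. A fixed seed makes a run reproducible, which helps when comparing base content trajectories across repeated simulations.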
Figure 4
Comparison of the low-quality data filtering results of five approaches, measured by the average base quality within each 300-base window. (A) Average base quality at each site of the SARS-CoV-2 positive-sense reference strand. (B) Overall distribution of the average base quality across sites of the SARS-CoV-2 positive-sense reference strand. (C) Average base quality at each site of the SARS-CoV-2 negative-sense reference strand. (D) Overall distribution of the average base quality across sites of the SARS-CoV-2 negative-sense reference strand. *** p ≤ 0.001.
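Averaging per-site quality over 300-base windows can be sketched in a few lines of Python. The caption does not say whether the windows overlap, so this sketch assumes consecutive non-overlapping windows, with the final window allowed to be shorter; the input is assumed to be a list of per-site average base qualities along one reference strand.

```python
def window_means(values: list[float], window: int = 300) -> list[float]:
    """Mean of `values` in consecutive non-overlapping windows; the last window may be shorter."""
    means = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


print(window_means([1, 2, 3, 4, 5], window=2))  # [1.5, 3.5, 5.0]
```

Running the same function on the output of each filtering approach gives one curve per approach, comparable window by window.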
Figure 5
Intra-host mutation spectra of SARS-CoV-2. The size of each colored sector in the pie chart indicates the proportion of the corresponding base mutation type within the intra-host mutation spectrum.
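The sector sizes in such a pie chart are simply the fraction of each substitution type among all observed mutations. The Python sketch below computes those fractions from a list of (reference base, observed base) calls for one strand; the call format and the function name are assumptions for illustration.

```python
from collections import Counter


def mutation_spectrum(calls: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of each substitution type, e.g. {"G>A": 0.5, ...}, most frequent first."""
    counts = Counter(f"{ref}>{alt}" for ref, alt in calls)
    total = sum(counts.values())
    return {mut: n / total for mut, n in counts.most_common()}


calls = [("G", "A"), ("G", "A"), ("C", "U"), ("G", "C")]
print(mutation_spectrum(calls))  # {'G>A': 0.5, 'C>U': 0.25, 'G>C': 0.25}
```

Computing the spectrum separately for calls on the positive-sense and negative-sense strands yields strand-specific spectra of the kind summarized in the abstract.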
Figure 6
Variation of base content percentages during the mutation simulation. (A) Percentages of the four bases during the mutation simulation. The horizontal axis represents the cumulative number of mutations and the vertical axis represents the base percentage. Red, blue, dark green, and orange lines represent the percentages of A, U, C, and G, respectively. (B) Percentages of “AG” and “AU” content during the mutation simulation. Brown dotted lines mark the number of mutation steps after which the “AG” and “AU” content percentages deviate from the base content percentages of Gamma-CoV, Alpha-CoV/Delta-CoV, and Beta-CoV, respectively. Purple and light green lines represent the percentages of “AG” and “AU” content, respectively.
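The curves in this figure come from recomputing base composition after each simulated mutation. The Python sketch below computes the four base percentages and a combined two-base percentage. It assumes that “AG” content means the combined percentage of A and G (and “AU” that of A and U); the caption does not define these terms explicitly, and the function names are hypothetical.

```python
def base_content(seq: str) -> dict[str, float]:
    """Percentage of A, U, C and G in an RNA sequence."""
    n = len(seq)
    return {b: 100.0 * seq.count(b) / n for b in "AUCG"}


def combined_content(seq: str, pair: str) -> float:
    """Combined percentage of the bases in `pair`; "AG" -> %A + %G (assumed meaning)."""
    content = base_content(seq)
    return sum(content[b] for b in pair)


print(base_content("AAUG"))            # {'A': 50.0, 'U': 25.0, 'C': 0.0, 'G': 25.0}
print(combined_content("AAUG", "AG"))  # 75.0
```

Recording these values after every step of a simulation run produces trajectories like those in panels A and B. The step at which a trajectory leaves a reference genus's base content range would then mark a deviation point.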
Figure 7
Distribution of newly generated stop codons during the mutation simulation. The horizontal axis represents the locations of stop codons on SARS-CoV-2 gene segments, and the vertical axis represents the average number of these stop codons within each 300-base window. Pink, orange, blue, and purple lines represent, respectively, the maximum number of stop codons during mutation (Method S3) and the number of stop codons when the base content percentage starts to deviate from that of Gamma-CoV, Alpha-CoV/Delta-CoV, and Beta-CoV.
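Counting newly generated stop codons amounts to comparing stop-codon positions before and after mutation and binning the new ones by window. The Python sketch below assumes an in-frame scan starting at the first base of a gene segment and 300-base windows; the function names are hypothetical. Averaging the per-window counts over repeated simulation runs would be a separate step on top of this.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_codon_positions(seq: str) -> list[int]:
    """Start positions of in-frame stop codons, reading from position 0 in steps of 3."""
    return [i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def new_stop_codons(original: str, mutated: str) -> list[int]:
    """Stop-codon positions present in the mutated segment but not in the original."""
    return sorted(set(stop_codon_positions(mutated)) - set(stop_codon_positions(original)))


def counts_per_window(positions: list[int], length: int, window: int = 300) -> list[int]:
    """Number of positions falling in each consecutive window of `window` bases."""
    counts = [0] * ((length + window - 1) // window)
    for p in positions:
        counts[p // window] += 1
    return counts


print(new_stop_codons("AUGCAAUAA", "AUGUAAUAA"))  # [3]
```

In the usage line, the C>U change at position 3 turns the in-frame codon CAA into UAA, so a single new stop codon appears at position 3.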
Figure 8
Sequence periodicity analysis using the power spectrum. (A) Overall power spectrum of all stop codons (UAA, UAG, and UGA); (B) overall power spectrum of the stop codon UAA; (C) overall power spectrum of the stop codon UAG; (D) overall power spectrum of the stop codon UGA. Because the 3-nt frequency peaks corresponding to stop codons are prominent in the overall power spectra (A–D), panels (E–H) zoom in on each spectrum to investigate periodic patterns other than the 3-nt period. The horizontal axis, log10(f), represents the length of the periodic sequences (f), and the vertical axis represents the power density of the corresponding spectrum. For each stop codon, purple, green, and yellow lines represent the power spectrum when the maximum number of stop codons during mutation (Method S3) starts to deviate from the base content percentage of Alpha-CoV/Delta-CoV, Beta-CoV, and Gamma-CoV, respectively.
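A common way to obtain such a power spectrum is to turn the sequence into a binary indicator signal (1 where a stop codon starts, 0 elsewhere) and take the squared magnitude of its discrete Fourier transform. The Python sketch below does this with a naive DFT so that it needs only the standard library. The indicator construction and the function names are assumptions for illustration, not necessarily the exact pipeline used in the study.

```python
import cmath


def indicator(seq: str, motifs: set[str]) -> list[float]:
    """1.0 where one of `motifs` (3-nt codons) starts at that position, else 0.0."""
    return [1.0 if seq[i:i + 3] in motifs else 0.0 for i in range(len(seq))]


def power_spectrum(x: list[float]) -> list[float]:
    """|DFT|^2 of the mean-centered signal at frequencies k/N, k = 0..N//2 (naive O(N^2) DFT)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        abs(sum(c * cmath.exp(-2j * cmath.pi * k * t / n) for t, c in enumerate(centered))) ** 2
        for k in range(n // 2 + 1)
    ]


def dominant_period(seq: str, motifs: set[str]) -> float:
    """Period (in nt) of the strongest non-zero frequency peak, i.e. N / k."""
    spectrum = power_spectrum(indicator(seq, motifs))
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return len(seq) / k


print(dominant_period("UAA" * 100, {"UAA"}))  # 3.0
```

A signal in which UAA recurs every 3 nt puts its peak at frequency 1/3, that is, a period of 3 nt, which matches the dominant 3-nt peaks described in the caption. For a full genome of roughly 30 kb, `numpy.fft.rfft` would replace the quadratic loop.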
Figure 9
The webpage of 2019-NCSS. (A) The “mutation spectra analysis” module. (B) The “mutation simulation” module.
