Commun Biol. 2021 Feb 15;4(1):228.
doi: 10.1038/s42003-021-01754-6.

Analysis of SARS-CoV-2 mutations in the United States suggests presence of four substrains and novel variants

Rui Wang et al. Commun Biol.

Abstract

SARS-CoV-2 has been mutating since it was first sequenced in early January 2020. Here, we analyze 45,494 complete SARS-CoV-2 genome sequences from around the world to understand their mutations. Among them, 12,754 sequences are from the United States. Our analysis suggests the presence of four substrains and eleven top mutations in the United States. These eleven top mutations belong to 3 disconnected groups. The first and second groups, consisting of 5 and 8 concurrent mutations, are prevailing, while the other group, with three concurrent mutations, is gradually fading out. Moreover, we reveal that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. One of the top mutations, 27964C > T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we find that two of the four SARS-CoV-2 substrains in the United States have potentially become more infectious.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Pie chart plot of four clusters in the United States as of 11 September 2020.
The blue, red, yellow, and green colors represent clusters A, B, C, and D, respectively. The base color of each state is decided by its dominant cluster. No color is assigned to a state if we cannot find its complete genome sequences at the GISAID database.
Fig. 2. The evolution and the gender distribution of the top 11 missense mutation ratios.
The blue lines illustrate the evolution of the top 11 missense mutation ratios (the y-axis on the left) computed as the number of genome sequences having a given mutation over the total number of genome sequences. The red lines represent the evolution of the total number of genome sequences (the y-axis on the right). The bar plot is the gender distribution of the ratio of the number of samples having top 11 missense mutations over the total number of samples having age and/or gender labels. Red bars represent the female ratios and the blue bars represent the male ratios in the United States.
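The mutation ratio described in this caption (sequences carrying a given mutation divided by all sequences) can be illustrated with a minimal Python sketch. The function name, the input format (one set of mutation labels per genome sequence), and the toy data are assumptions made for illustration; they are not the authors' pipeline or data.

```python
from collections import Counter


def mutation_ratios(samples):
    """Return {mutation: fraction of sequences that carry it}.

    `samples` is a list of sets; each set holds the mutation labels
    found in one genome sequence (hypothetical input format).
    """
    total = len(samples)
    if total == 0:
        return {}
    # Count each mutation at most once per sequence.
    counts = Counter(m for muts in samples for m in set(muts))
    return {m: n / total for m, n in counts.items()}


# Toy data for illustration only; not taken from GISAID or the paper.
toy_samples = [
    {"D614G", "25563G>T"},
    {"D614G"},
    {"27964C>T"},
    set(),  # a sequence with none of the tracked mutations
]
ratios = mutation_ratios(toy_samples)
```

With this toy input, two of the four sequences carry D614G, so its ratio is 0.5, while the other two mutations each appear in one sequence.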
Fig. 3. The 3D structure and network analysis of SARS-CoV-2 NSP12 protein.
a The 3D structure of SARS-CoV-2 NSP12 protein. The mutated residue is marked with colored balls. b The difference of the FRI rigidity index between the network with wild type and the network with mutant type. c The difference of the subgraph centrality between the network with wild type and the network with mutant type.
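As a rough illustration of a subgraph centrality difference between two networks, the sketch below uses the standard definition of subgraph centrality, the diagonal of the matrix exponential of the adjacency matrix, approximated by a truncated power series. The two toy graphs, the function name, and the sign convention (mutant minus wild type) are assumptions; the page does not say how the authors build their residue networks or which direction the difference is taken.

```python
def subgraph_centrality(adj, terms=30):
    """Approximate diag(exp(A)) for adjacency matrix `adj` (list of lists)."""
    n = len(adj)
    # `power` holds A^k / k!, starting from the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    centrality = [1.0] * n  # contribution of the k = 0 term
    for k in range(1, terms):
        power = [
            [sum(power[i][m] * adj[m][j] for m in range(n)) / k for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            centrality[i] += power[i][i]
    return centrality


# Hypothetical "wild type" network: a path 0-1-2-3.
wild = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical "mutant" network: the same path plus an extra contact 0-2.
mutant = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Assumed sign convention: mutant minus wild type, per node.
difference = [
    m - w for m, w in zip(subgraph_centrality(mutant), subgraph_centrality(wild))
]
```

Adding an edge can only add closed walks, so every entry of `difference` is positive in this toy case; real residue networks can lose contacts as well as gain them, so the sign can go either way.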
Fig. 4. The 3D structure and network analysis plot of SARS-CoV-2 S protein.
a Illustration of S-protein and ACE2 interaction. The RBD is displayed in green, the ACE2 is given in red, and mutation D614G is highlighted in red. b The difference of FRI rigidity index between the network with wild type and the network with mutant type. c The difference of the subgraph centrality between the network with wild type and the network with mutant type.
Fig. 5. The 3D structure and network analysis plot of SARS-CoV-2 ORF3a protein.
a The 3D structure of SARS-CoV-2 ORF3a protein. b The visualization of SARS-CoV-2 ORF3a proteoform. The high-frequency mutation 25563G>T-(Q57H) on ORF3a is marked in color. The red color represents the wild type and the yellow represents the wild type. c The difference of FRI rigidity index between the network with wild type and the network with mutant type. d The difference of the subgraph centrality between the network with wild type and the network with mutant type. e Sequence alignments for the ORF3a protein of SARS-CoV-2, SARS-CoV, bat coronavirus RaTG13, bat coronavirus CoVZC45, and bat coronavirus BM48-31. Detailed numbering is given according to SARS-CoV-2. One high-frequency mutation, 25563G>T-(Q57H), is located on the ORF3a protein. Here, the red rectangle marks the Q57H position and its neighborhood.
Fig. 6. The time evolution of 264 SARS-CoV-2 S protein mutations.
The red lines represent the RBD mutations that strengthen the infectivity of SARS-CoV-2 (i.e., ΔΔG is positive), the blue lines represent the RBD mutations that weaken the infectivity of SARS-CoV-2 (i.e., ΔΔG is negative), and the green lines are for S protein mutations that are away from the RBD. The mutation with the highest frequency is D614G.
Fig. 7. Overall binding free energy changes ΔΔG (kcal/mol) on the receptor-binding domain (RBD).
The blue color region marks the binding free energy changes on the receptor-binding motif (RBM). The height of each bar indicates the predicted ΔΔG. The color indicates the occurrence frequency in the GISAID genome dataset.
Fig. 8. Binding free energy changes ΔΔG (kcal/mol) induced by mutations (figure on the left) and mutations on the SARS-CoV-2 S protein RBD (figure on the right) in four clusters.
a Cluster A, b Cluster B, c Cluster C, and d Cluster D.
