iScience. 2021 Feb 19;24(2):102116.
doi: 10.1016/j.isci.2021.102116. Epub 2021 Jan 28.

Mutational signatures and heterogeneous host response revealed via large-scale characterization of SARS-CoV-2 genomic diversity

Alex Graudenzi et al. iScience.

Abstract

To dissect the mechanisms underlying the accumulation of variants in the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genome, we present a large-scale analysis of intra-host genomic diversity. This analysis reveals that most samples exhibit heterogeneous genomic architectures, owing to the interplay between host-related mutational processes and transmission dynamics. The decomposition of minor-variant profiles unveils three non-overlapping mutational signatures related to nucleotide substitutions and likely driven by Apolipoprotein B mRNA Editing Catalytic polypeptide-like (APOBEC), Reactive Oxygen Species (ROS), and Adenosine Deaminase Acting on RNA (ADAR), highlighting heterogeneous host responses to SARS-CoV-2 infections. A corrected-for-signatures dN/dS analysis demonstrates that these mutational processes are subject to purifying selection, with important exceptions: several mutations appear to transition toward clonality, defining new clonal genotypes that increase the overall genomic diversity. Furthermore, the phylogenomic analysis shows the presence of homoplasies and supports the hypothesis that minor variants are transmitted. This study paves the way for the integrated analysis of intra-host genomic diversity and clinical outcomes of SARS-CoV-2 infections.

Keywords: Bioinformatics; Genetics; Phylogenetics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Graphical abstract
Figure 1
Mutational landscape of 1133 SARS-CoV-2 samples – data set #1 (NCBI BioProject: PRJNA645906) (A) Scatter plot displaying the number of clonal (VF >90%) and minor (VF >5% and ≤90%) variants for the 1133 samples of data set #1 (node size proportional to the number of samples). (B and C) Box plots showing the distribution of the number of (B) clonal and (C) minor variants, obtained by grouping samples according to collection date (weeks, 2020; Mann-Kendall trend test p value also shown); n is the number of samples in each group. (D) Bar plot showing the proportion of sites of the SARS-CoV-2 genome that are either non-mutated, mutated with a unique SNV, or mutated with multiple SNVs. (E) Stacked bar plots showing the proportion of SNVs detected as always clonal, mixed, or always minor. (F) Ratio of synonymous (S), non-synonymous (NS), and non-coding (NC) mutations for each category. (G) Violin plots showing the VF distribution of all SNVs (n is the number of samples, k the number of distinct SNVs, m the number of non-zero entries of the VF matrix). (H) Graphical representation of an example data set.
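The clonal/minor thresholds of panel (A) and the three SNV categories of panel (E) can be sketched as follows. This is a minimal illustration assuming a hypothetical in-memory VF matrix keyed by sample and SNV identifier; the function names and the toy frequencies are invented for the example and do not come from the authors' pipeline.

```python
from typing import Dict, List

# Thresholds from the Figure 1 legend: clonal if VF > 90%, minor if 5% < VF <= 90%.
CLONAL_MIN = 0.90
MINOR_MIN = 0.05


def call_variant(vf: float) -> str:
    """Label one variant frequency as 'clonal', 'minor', or 'filtered' (VF <= 5%)."""
    if vf > CLONAL_MIN:
        return "clonal"
    if vf > MINOR_MIN:
        return "minor"
    return "filtered"


def categorize_snvs(vf_matrix: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Assign each SNV to 'always clonal', 'always minor', or 'mixed'.

    vf_matrix maps sample ID -> {SNV ID: VF}; SNVs absent from a sample are ignored.
    """
    calls: Dict[str, List[str]] = {}
    for sample_vfs in vf_matrix.values():
        for snv, vf in sample_vfs.items():
            label = call_variant(vf)
            if label != "filtered":
                calls.setdefault(snv, []).append(label)

    categories: Dict[str, str] = {}
    for snv, labels in calls.items():
        kinds = set(labels)
        if kinds == {"clonal"}:
            categories[snv] = "always clonal"
        elif kinds == {"minor"}:
            categories[snv] = "always minor"
        else:
            categories[snv] = "mixed"
    return categories


# Toy VF matrix (made-up frequencies, for illustration only).
toy = {
    "s1": {"g.23403A>G": 0.99, "g.8782T>C": 0.40},
    "s2": {"g.23403A>G": 0.97, "g.8782T>C": 0.95, "g.11083G>T": 0.10},
}
print(categorize_snvs(toy))
# {'g.23403A>G': 'always clonal', 'g.8782T>C': 'mixed', 'g.11083G>T': 'always minor'}
```

An SNV is "mixed" as soon as it is called clonal in at least one sample and minor in another, which is the reading of panel (E) assumed here.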
Figure 2
Characterization of SNVs detected on the SARS-CoV-2 genome (A) Scatter plot showing the genome location and the VF of all SNVs detected in the data set, colored according to category. (B) Stacked bar plots showing the normalized substitution proportion of all SNVs detected in at least one sample of the data set, with respect to all 12 possible nucleotide substitutions, grouped by variant type.
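Panel (B) tallies SNVs over the 12 possible nucleotide substitutions. A minimal sketch of that tally follows, together with a helper that collapses the 12 types onto the 6 substitution classes used for the signatures in Figure 3. Two assumptions are made: "normalized" is taken to mean fractions summing to one (the paper's normalization may differ, for example by base composition), and the 12-to-6 collapse follows the common pyrimidine-reference convention. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# The 12 possible single-nucleotide substitutions (ref>alt with ref != alt).
SUBSTITUTIONS = [f"{r}>{a}" for r in BASES for a in BASES if r != a]


def substitution_proportions(snvs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of SNVs in each of the 12 substitution types.

    snvs holds one (ref, alt) pair per distinct SNV; fractions sum to 1.
    """
    counts = Counter(f"{ref.upper()}>{alt.upper()}" for ref, alt in snvs)
    total = sum(counts[s] for s in SUBSTITUTIONS)
    if total == 0:
        return {s: 0.0 for s in SUBSTITUTIONS}
    return {s: counts[s] / total for s in SUBSTITUTIONS}


def to_pyrimidine_class(sub: str) -> str:
    """Collapse a substitution onto one of C>A, C>G, C>T, T>A, T>C, T>G.

    Assumption: purine-reference substitutions are merged with their
    complement (e.g. G>T becomes C>A), as in the usual 6-class scheme.
    """
    ref, alt = sub.split(">")
    if ref in "CT":
        return sub
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


props = substitution_proportions([("C", "T"), ("C", "T"), ("G", "T"), ("A", "G")])
print(props["C>T"], to_pyrimidine_class("G>T"))  # 0.5 C>A
```

Grouping by variant type, as in panel (B), amounts to calling `substitution_proportions` once per category (always clonal, mixed, always minor).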
Figure 3
Mutational signatures of SARS-CoV-2 (A) Nucleotide class distribution in the SARS-CoV-2-ANC reference genome (Ramazzotti et al., 2021) and in the 3 SARS-CoV-2 mutational signatures retrieved via NMF on 6 substitution classes. (B) Heatmap showing the clustering of 150 samples with ≥6 always minor variants (13% of the data set), computed via k-means on the low-rank latent NMF matrix. The goodness of fit, in terms of median cosine similarity between observations and predictions, and the harmonic mean p value of the one-sided Mann-Whitney U test on bootstrap re-sampling are shown for all signatures (see Methods). (C) Pie chart showing the proportion of samples in the three signature-based clusters, plus a fourth cluster SC#4 including all samples with ≥1 and <6 always minor variants and the group of samples with 0 always minor SNVs. (D and E) Categorical normalized cumulative VF distribution of all SNVs detected in each signature-based cluster, with respect to (D) 6 substitution classes and (E) 96 trinucleotide contexts, compared to the theoretical distribution in the SARS-CoV-2-ANC reference genome (left).
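Panels (A) and (B) rest on non-negative matrix factorization (NMF) of a samples × substitution-class matrix. The sketch below is a minimal pure-Python illustration, not the authors' implementation: it assumes the standard Lee-Seung multiplicative updates, scores the fit by median cosine similarity as in panel (B), and, as a plainly simpler substitute for the k-means step on the latent matrix, assigns each sample to its largest exposure. The toy matrix is synthetic (built from two made-up signatures), and all function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize non-negative V (samples x classes) into W (samples x k) @ H (k x classes).

    Lee-Seung multiplicative updates for the Frobenius-norm objective; each row
    of H (a signature) is then rescaled to sum to 1 and W is rescaled to compensate.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    sums = [sum(row) or 1.0 for row in h]
    h = [[x / s for x in row] for row, s in zip(h, sums)]
    w = [[w[i][j] * sums[j] for j in range(k)] for i in range(n)]
    return w, h


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def median_cosine_fit(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Median cosine similarity between observed rows and their reconstructions."""
    return statistics.median(cosine(o, p) for o, p in zip(v, matmul(w, h)))


def dominant_signature(w: Matrix) -> List[int]:
    """Simpler stand-in for the paper's k-means: index of each sample's largest exposure."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Synthetic data: 6 samples mixing two made-up signatures over 6 substitution classes.
H0 = [[0.6, 0.1, 0.1, 0.1, 0.05, 0.05], [0.05, 0.05, 0.1, 0.1, 0.1, 0.6]]
W0 = [[10, 0], [0, 10], [5, 5], [8, 2], [1, 9], [3, 7]]
V = matmul(W0, H0)
w, h = nmf(V, k=2)
fit = median_cosine_fit(V, w, h)
labels = dominant_signature(w)
print(round(fit, 4), labels)
```

Rescaling each signature to sum to one makes exposures comparable across signatures, which is what the argmax assignment relies on.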
Figure 4
Characterization of signature-based clusters of SARS-CoV-2 samples (A) Distribution of the number of clonal variants for the 4 signature-based clusters described in the text. (B) Distribution of the number of minor variants for the 4 signature-based clusters. (C) Violin plots showing the VF distribution for each signature-based cluster (n is the number of samples, k the number of distinct SNVs, m the number of non-zero entries of the VF matrix). (D) Average proportion of substitution classes of always minor variants for all samples in the 4 signature-based clusters, grouped and sorted by the number of minor SNVs (e.g., position 10 of the x axis shows the average proportion of substitution classes for all samples with 10 minor SNVs). (E) Corrected-for-signatures dN/dS ratio, computed on a 300-base sliding window for each signature-based cluster by normalizing the ratio on the cluster's substitution distribution (see Methods). The superimposed dotted line shows the mutational density in each window (rightmost y axis).
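Panel (E) plots dN/dS in 300-base windows together with the mutational density. The sketch below computes only the uncorrected ratio per window, dN/dS = (N_obs / N_sites) / (S_obs / S_sites); the paper's correction for signatures, which normalizes on each cluster's substitution distribution, is not reproduced. It assumes per-position nonsynonymous and synonymous site counts are already available (deriving them, e.g. by a Nei-Gojobori-style count, is outside the sketch), and the function name and toy inputs are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def window_stats(
    mutations: Sequence[Tuple[int, bool]],
    n_sites: Sequence[float],
    s_sites: Sequence[float],
    window: int = 300,
    step: int = 1,
) -> List[Tuple[int, float, Optional[float]]]:
    """Return (window_start, mutational_density, dN/dS) for each window.

    mutations: (0-based position, is_nonsynonymous) for each coding SNV.
    n_sites / s_sites: per-position nonsynonymous / synonymous site counts.
    dN/dS is None when a window has no synonymous changes or no sites.
    """
    results = []
    for start in range(0, len(n_sites) - window + 1, step):
        end = start + window
        in_window = [nonsyn for pos, nonsyn in mutations if start <= pos < end]
        n_obs = sum(in_window)  # each True counts as 1
        s_obs = len(in_window) - n_obs
        n_tot, s_tot = sum(n_sites[start:end]), sum(s_sites[start:end])
        ratio = None
        if n_tot > 0 and s_tot > 0 and s_obs > 0:
            ratio = (n_obs / n_tot) / (s_obs / s_tot)
        results.append((start, len(in_window) / window, ratio))
    return results


# Toy 600-base region with uniform site counts (3/4 nonsynonymous, 1/4 synonymous).
toy_muts = [(10, True), (20, True), (30, False), (400, True)]
stats = window_stats(toy_muts, [0.75] * 600, [0.25] * 600, window=300, step=300)
print(stats[0])
```

With `step=1` the windows overlap and slide one base at a time; the toy call uses non-overlapping windows only to keep the output short.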
Figure 5
Phylogenomic model of 1133 SARS-CoV-2 samples via VERSO – data set #1 (NCBI BioProject: PRJNA645906) (A) Phylogenetic tree returned by VERSO (Ramazzotti et al., 2021), considering 28 clonal variants (VF >0.90) detected in at least 3% of the 1133 samples of the data set. Colors mark the 23 distinct clades identified by VERSO, which are associated with corrected clonal genotypes. Genotype labels are consistent with Ramazzotti et al. (2021), whereas Supplementary File S3 maps them to the lineage nomenclature proposed in Rambaut et al. (2020). Samples with identical corrected clonal genotypes are grouped in polytomies (visualization via FigTree (Rambaut, 2009)). The black sample represents the SARS-CoV-2-ANC reference genome. (B) Heatmap showing the composition of the 23 corrected clonal genotypes returned by VERSO. Clonal SNVs are annotated with their mapping on ORFs, their synonymous (S), nonsynonymous (NS), or non-coding (NC) status, and the related amino acid substitutions. Variants g.8782T>C (ORF1ab, synonymous) and g.28144C>T (ORF8, p.84S>L) are colored in blue, variant g.23403A>G (S, p.614D>G) in red, and homoplastic variant g.11083G>T (ORF1ab, p.3606L>F) in green. (C) Heatmaps displaying the count of minor variants with respect to the number of clades and samples in which they are found, grouped by signature-based cluster (e.g., at row 3 and column 5, the color represents the number of SNVs found in 3 clades and 5 samples). (D) Violin plots showing the VF distribution of all minor variants with respect to the number of clades in which they are found (the first violin plot refers to variants privately detected in single samples). n is the number of samples, k the number of distinct SNVs, m the number of non-zero entries of the VF matrix. (E) Pie chart showing the proportion of minor variants privately detected in single samples, detected in multiple samples of the same clade, and detected in multiple samples of independent clades.
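Panels (C) and (E) depend only on which samples carry each minor variant and which clade each sample belongs to. A minimal sketch, assuming both mappings are already given (the clade assignment itself comes from VERSO and is not reproduced here); the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_minor_variants(
    variant_samples: Dict[str, Set[str]], sample_clade: Dict[str, str]
) -> Dict[str, str]:
    """Label each minor variant as private, same clade, or independent clades (panel E)."""
    labels = {}
    for variant, samples in variant_samples.items():
        clades = {sample_clade[s] for s in samples}
        if len(samples) == 1:
            labels[variant] = "private"
        elif len(clades) == 1:
            labels[variant] = "same clade"
        else:
            labels[variant] = "independent clades"
    return labels


def clade_sample_counts(
    variant_samples: Dict[str, Set[str]], sample_clade: Dict[str, str]
) -> Counter:
    """Number of minor variants per (n_clades, n_samples) cell, as in panel (C)."""
    return Counter(
        (len({sample_clade[s] for s in samples}), len(samples))
        for samples in variant_samples.values()
    )


# Toy example: three samples in two hypothetical clades.
clade_of = {"s1": "A", "s2": "A", "s3": "B"}
found_in = {"v1": {"s1"}, "v2": {"s1", "s2"}, "v3": {"s2", "s3"}}
print(classify_minor_variants(found_in, clade_of))
# {'v1': 'private', 'v2': 'same clade', 'v3': 'independent clades'}
```

A variant found in several independent clades is the pattern the legend associates with homoplasy or transmission of minor variants.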
Figure 6
Validation on data sets #2–5 (NCBI BioProject: PRJNA636748, PRJNA633948, PRJNA647529, and PRJNA625551) Heatmap showing the clustering of 141, 23, 17, and 14 samples of data sets #2–5 with ≥6 always minor variants, computed via k-means on the low-rank latent NMF matrix on the three signatures discovered on data set #1 (see Methods). The goodness of fit, in terms of median cosine similarity between observations and predictions, and the harmonic mean p value of the one-sided Mann-Whitney U test on bootstrap re-sampling are shown for all signatures (see Methods).
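For the validation, new samples are projected onto the three signatures discovered on data set #1 instead of re-running a full factorization. A minimal sketch of that projection step, assuming non-negative least squares solved by multiplicative updates with the signatures held fixed (the paper's exact solver is not stated here), followed by the cosine similarity used for goodness of fit and the unweighted harmonic mean p value. The signatures, the sample, and all function names are hypothetical; the subsequent k-means clustering and the Mann-Whitney bootstrap are omitted.

```python
import math
from typing import List, Sequence


def fit_exposures(counts: Sequence[float], signatures: List[List[float]],
                  iters: int = 1000, eps: float = 1e-12) -> List[float]:
    """Non-negative exposures e minimizing ||counts - e @ signatures||^2.

    The signatures (k x classes) stay fixed; only e is updated, with the
    multiplicative rule e_a <- e_a * (S v)_a / (S S^T e)_a.
    """
    k, m = len(signatures), len(counts)
    sst = [[sum(signatures[a][j] * signatures[b][j] for j in range(m)) for b in range(k)]
           for a in range(k)]
    sv = [sum(signatures[a][j] * counts[j] for j in range(m)) for a in range(k)]
    e = [1.0] * k
    for _ in range(iters):
        denom = [sum(sst[a][b] * e[b] for b in range(k)) for a in range(k)]
        e = [e[a] * sv[a] / (denom[a] + eps) for a in range(k)]
    return e


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def harmonic_mean_p(pvalues: Sequence[float]) -> float:
    """Unweighted harmonic mean of p values: L / sum(1 / p_i)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


# Two made-up signatures over the 6 substitution classes, and one new sample.
sigs = [[0.6, 0.1, 0.1, 0.1, 0.05, 0.05], [0.05, 0.05, 0.1, 0.1, 0.1, 0.6]]
sample = [3 * x + 7 * y for x, y in zip(*sigs)]
exposures = fit_exposures(sample, sigs)
recon = [sum(e * s[j] for e, s in zip(exposures, sigs)) for j in range(6)]
print([round(x, 3) for x in exposures], round(cosine(sample, recon), 6))
# [3.0, 7.0] 1.0
```

Holding the signatures fixed keeps the latent dimensions identical across data sets, so clusters found on data sets #2–5 can be compared directly with those of data set #1.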

References

    1. Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A., Børresen-Dale A.L. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421.
    2. Alexandrov L.B., Kim J., Haradhvala N.J., Huang M.N., Ng A.W.T., Wu Y., Boot A., Covington K.R., Gordenin D.A., Bergstrom E.N. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101.
    3. Andersen K.G., Rambaut A., Lipkin W.I., Holmes E.C., Garry R.F. The proximal origin of SARS-CoV-2. Nat. Med. 2020;26:450–452.
    4. Bandelt H.J., Quintana-Murci L., Salas A., Macaulay V. The fingerprint of phantom mutations in mitochondrial DNA data. Am. J. Hum. Genet. 2002;71:1150–1160.
    5. Barik S., Das S., Vikalo H. QSdpR: viral quasispecies reconstruction via correlation clustering. Genomics. 2018;110:375–381.