J Virol. 2025 May 20;99(5):e0225024.
doi: 10.1128/jvi.02250-24. Epub 2025 Apr 24.

SARS-CoV-2 biological clones are genetically heterogeneous and include clade-discordant residues

Ana Isabel de Ávila et al. J Virol.

Abstract

Defective genomes are part of SARS-CoV-2 quasispecies. High-resolution, ultra-deep sequencing of bulk RNA from viral populations does not distinguish RNA mutations, insertions, and deletions in viable genomes from those in defective genomes. To quantify SARS-CoV-2 infectious variant progeny, virus from four individual plaques (biological clones) of a preparation of isolate USA-WA1/2020, formed on Vero E6 cell monolayers, was subjected to further biological cloning to yield 9 second-generation and 15 third-generation sub-clones. Consensus genomic sequences of the biological clones and sub-clones included an average of 2.8 variations per viable genome, relative to the consensus sequence of the parental USA-WA1/2020 virus. This value is 6.5-fold lower than the estimates for biological clones of other RNA viruses such as bacteriophage Qβ, foot-and-mouth disease virus, or hepatitis C virus in cell culture. The mutant spectrum complexity of the nsp12 (polymerase)- and spike (S)-coding regions was unique in the progeny of each of 10 third-generation sub-clones; they shared 2.4% of the total of 164 different mutations and deletions scored in the 3,719 genomic residues that were screened. The presence of minority out-of-frame deletions revealed the ease of defective genome production from an individual infectious genome. Several low-frequency point mutations and deletions were clade-discordant in that they were not typical of USA-WA1/2020 but served to define the consensus sequences of future SARS-CoV-2 clades. The implications of viable genome heterogeneity, and of the generation of complex mutant spectra from individual genomes, for SARS-CoV-2 adaptability and COVID-19 control are discussed.

Importance: Sequencing of biological clones is a means to identify mutations, insertions, and deletions located in viable genomes. This distinction is particularly important for viral populations, such as those of SARS-CoV-2, that contain large proportions of defective genomes. By sequencing biological clones and sub-clones, we quantified the heterogeneity of the viable complement of USA-WA1/2020 to be lower than that exhibited by other RNA viruses. This difference may be due to a reduced mutation rate, to limited tolerance of the large coronavirus genome to incorporate mutations and deletions and remain functional, or to a combination of both influences. The presence of clade-discordant residues in the progeny of individual biological sub-clones suggests limitations in the occupation of sequence space by SARS-CoV-2. However, the complex and unique mutant spectra that are rapidly generated from individual genomes suggest an aptness to confront selective constraints.
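The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is a reader's illustration only, not a calculation taken from the paper: the variable names are hypothetical, and the two derived values (the average implied for other RNA viruses and the approximate number of shared mutations and deletions) are inferences from the rounded numbers in the abstract.

```python
# Values quoted in the abstract.
variations_per_viable_genome = 2.8   # average per viable SARS-CoV-2 genome
fold_lower = 6.5                     # "6.5-fold lower" than other RNA viruses
shared_fraction = 0.024              # 2.4% shared among sub-clone progeny
total_mutations_and_deletions = 164  # different mutations and deletions scored

# Assumption: "6.5-fold lower" means the other-virus estimate is 6.5 times larger.
implied_other_virus_average = variations_per_viable_genome * fold_lower

# Assumption: the shared percentage applies directly to the total of 164.
implied_shared_count = shared_fraction * total_mutations_and_deletions

print(f"Implied average for other RNA viruses: {implied_other_virus_average:.1f}")
print(f"Implied number of shared mutations/deletions: {implied_shared_count:.1f}")
```

Because both inputs are rounded in the abstract, the derived values should be read as approximate.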

Keywords: RNA virus; defective viral genome; deletion; diversity indices; mutant spectrum; point mutation; population complexity; quasispecies; ultra-deep sequencing; viral clade.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Schematic representation of the isolation of biological clones and sub-clones from SARS-CoV-2 USA-WA1/2020. The initial virus (circle on the left) was prepared by infection of Vero E6 cells of the virus received from BEI Resources. It was treated with deoxycholate (DC), diluted, and plated on Vero E6 cell monolayers to obtain biological clones c10, c11, c12, and c13 (filled squares). The process was repeated to obtain 9 second-generation and 15 third-generation sub-clones (indicated with horizontal arrows and filled squares and with arrows at the bottom). Virus from an aliquot of each of the third-generation sub-clones was grown in Vero E6 cells (represented by the elongated triangle and “g” following the sub-clone name; last row on the right of the scheme). RNA from the third-generation derivatives of clones c10 and c11 was analyzed by ultra-deep sequencing (green elongated triangles). Procedures are detailed in Materials and Methods.
Fig 2
Alignment of consensus genome sequences of biological clones and sub-clones. (A) A scheme of the SARS-CoV-2 genome with indication of viral proteins and the 5′ and 3′ untranslated regions (UTR) depicted at the top. Below, the genome of the USA-WA1/2020 (used as reference) and of the initial biological clones is indicated with horizontal lines. The virus clone is given at the left of each line. (B and C) Same as A but with second- and third-generation sub-clones. NR means non-resolved sequences; they involve 35 nucleotides located between residues 23480 and 23628. (D) Same as C but for the third-generation sub-clones further grown in Vero E6 cells (name of sub-clone followed by g). Mutations are indicated with vertical lines and deletions by rectangles. The mutated residues (with the amino acid substitution for non-synonymous mutations indicated in parentheses) and the deletions (Del) are given at the bottom of each of the alignments in A, B, C, and D (with T representing U in the RNA); residue numbering is that of reference sequence Wuhan-Hu-1 (NC_045512.2). For A, B, and C, genomes with a unique or repeated sequence, considering all clones and sub-clones, are distinguished at the right of each line, as follows: U, unique genomic sequence; R1, a sequence repeated in three genomes; R2, a sequence repeated in eight genomes; R3, a sequence repeated in seven genomes; R4, a sequence repeated in two genomes. Note that genomes in blocks C and D differ only in a synonymous mutation (T2911G). The procedure used for the determination of consensus sequences is explained in Materials and Methods.
Fig 3
Heat map of point mutations and deletions identified in the mutant spectrum of USA-WA1/2020 and 10 biological sub-clones. Values of mutation frequency are color coded as shown in the top box. The genomic region analyzed is indicated in the filled rectangle at the top of each panel group. The nsp12 (polymerase) region analyzed spans residues 14534–16054, and the spike (S)-coding region analyzed spans residues 21448–23645. The name of the virus or biological sub-clone is given at the left of each line within a panel. Only positions where mutations or deletions have been found are included in the map. Deletions are indicated with triangles. Residues are numbered according to reference sequence Wuhan-Hu-1 (NC_045512.2). Clade-discordant mutations and deletions are indicated inside gray rectangles. In them, amino acid substitutions are given in parentheses following the corresponding point mutation. Del and Del aa indicate deletions of nucleotides and amino acids, respectively; the asterisk in Omicron BA.1 means that in that viral lineage, the amino acids deleted are 142–144. The clade-discordant residues are also indicated in Table S6 (https://saco.csic.es/s/kYsz6A4sbzssZRp), where all mutations and deletions found in the mutant spectra and in the consensus sequences are listed. Procedures for ultra-deep sequencing and controls for reliability of mutation and deletion detection are explained in Materials and Methods.
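The amplicon coordinates in this caption can be checked against the 3,719 screened residues mentioned in the abstract. The following is a minimal sketch, assuming the quoted coordinates are inclusive at both ends; the names `AMPLICONS` and `amplicon_length` are hypothetical and introduced here only for illustration.

```python
# Amplicon coordinates quoted in the Fig 3 caption (Wuhan-Hu-1 numbering).
# Assumption: both the start and the end residue are included in the amplicon.
AMPLICONS = {
    "nsp12": (14534, 16054),
    "S": (21448, 23645),
}


def amplicon_length(start, end):
    """Number of residues in an inclusive coordinate range."""
    return end - start + 1


lengths = {name: amplicon_length(*span) for name, span in AMPLICONS.items()}
total_screened = sum(lengths.values())

for name, length in lengths.items():
    print(f"{name}: {length} residues")
print(f"Total screened: {total_screened} residues")
```

Under the inclusive-range assumption, the two amplicons together account for the number of screened residues given in the abstract.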
Fig 4
Distribution of mutation types in the mutant spectrum of nsp12 (polymerase) and S amplicons of USA-WA1/2020 and biological sub-clones. Viruses are identified by a color code depicted in the upper box. Mutations are counted relative to the consensus sequence of the corresponding amplicon. Mutation types are displayed in abscissa, and the percentage of each mutation type is given in ordinate. The location of the amplicons in the SARS-CoV-2 genome and the ultra-deep sequencing procedure are detailed in Materials and Methods.
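A distribution of this kind can be tabulated from a list of point mutations. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes mutations are written in the `T2911G` style used in the Fig 2 caption (reference base, position, new base, with T standing for U), the function name `mutation_type_percentages` is hypothetical, and every entry in the example list other than `T2911G` is made up for demonstration.

```python
import re
from collections import Counter

# Assumed notation: reference base, residue number, mutant base (e.g. "T2911G").
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def mutation_type_percentages(mutations):
    """Return the percentage of each substitution type (e.g. 'C>T')."""
    counts = Counter()
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation format: {mutation!r}")
        ref, _position, alt = match.groups()
        if ref == alt:
            raise ValueError(f"Not a substitution: {mutation!r}")
        counts[f"{ref}>{alt}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Hypothetical input: only T2911G appears in the source text.
example_mutations = ["T2911G", "C14540T", "C21500T", "A23000G"]
percentages = mutation_type_percentages(example_mutations)
for kind, percent in sorted(percentages.items()):
    print(f"{kind}: {percent:.1f}%")
```

Deletions are left out of this sketch because the caption does not say how they are counted among the mutation types.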
Fig 5
Diversity indices of the mutant spectrum of SARS-CoV-2 USA-WA1/2020, its biological sub-clones, laboratory populations, and nasopharyngeal isolates of the virus. The top filled boxes indicate the genomic region analyzed. The diversity index is given in ordinate (Hpl, number of haplotypes; Hs, Shannon entropy; Mfmax, maximum mutation frequency; π, nucleotide diversity) (63). The name of the virus, sub-clone, or laboratory population is written in the abscissa: Ten sub-clones: aggregate of values of the 10 sub-clones. No drug: laboratory population passaged in Vero E6 cells in the absence of drug. Rib: laboratory population passaged in Vero E6 cells in the presence of ribavirin. Rdv: laboratory population passaged in Vero E6 cells in the presence of remdesivir. Rib + Rdv: laboratory population passaged in Vero E6 cells in the presence of combinations of ribavirin and remdesivir. Laboratory populations are described in references , . Mild, moderate, and exitus COVID-19 correspond to the virus from nasopharyngeal isolates of patients with different disease severity described in reference . Bars represent the median for the nsp12 (polymerase) and spike (S) amplicons; triangles are the values for each sample and amplicon. Statistical differences have been calculated using the Kruskal-Wallis test, followed by Dunn’s multiple comparison test (*, P < 0.05; **, P < 0.01; ***, P < 0.001). Additional diversity indices for the same samples are given in Fig. S4, and the numerical values for each sample and amplicon are compiled in Tables S4 and S5 in https://saco.csic.es/s/kYsz6A4sbzssZRp or in the references quoted therein. Amplicon residues and procedures are described in Materials and Methods.
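The four indices named in this caption can be computed from a table of haplotypes and their read counts. The sketch below uses common textbook-style definitions, which are assumptions here and may differ in detail from the definitions in the reference the caption cites: Hs is taken as the Shannon entropy with natural logarithms, Mfmax as the read-weighted fraction of sites that differ from the dominant haplotype, and nucleotide diversity as the sample-corrected mean pairwise difference per site. The function names and the example haplotypes are hypothetical.

```python
import math
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("Sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def diversity_indices(haplotypes):
    """haplotypes: dict mapping aligned sequence -> read count."""
    total_reads = sum(haplotypes.values())
    length = len(next(iter(haplotypes)))
    freqs = {seq: n / total_reads for seq, n in haplotypes.items()}

    hpl = len(freqs)                                   # number of haplotypes
    hs = -sum(p * math.log(p) for p in freqs.values())  # Shannon entropy

    # Assumed Mfmax: mean per-site divergence of reads from the dominant haplotype.
    dominant = max(haplotypes, key=haplotypes.get)
    mf_max = sum(p * hamming(seq, dominant) for seq, p in freqs.items()) / length

    # Assumed nucleotide diversity: mean pairwise differences per site,
    # with the N / (N - 1) sample-size correction.
    pairwise = sum(
        2 * freqs[a] * freqs[b] * hamming(a, b) for a, b in combinations(freqs, 2)
    )
    pi = (total_reads / (total_reads - 1)) * pairwise / length if total_reads > 1 else 0.0

    return {"Hpl": hpl, "Hs": hs, "Mfmax": mf_max, "pi": pi}


# Hypothetical toy data: two haplotypes of four residues, ten reads in total.
indices = diversity_indices({"ACGT": 8, "ACGA": 2})
for name, value in indices.items():
    print(f"{name}: {value}")
```

Real amplicon data would involve thousands of reads and an error-filtering step before haplotypes are counted; that step is outside the scope of this sketch.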
