Viruses. 2022 Jun 29;14(7):1434.
doi: 10.3390/v14071434.

Contrasting Epidemiology and Population Genetics of COVID-19 Infections Defined by Multilocus Genotypes in SARS-CoV-2 Genomes Sampled Globally

Felicia Hui Min Chan et al. Viruses.

Abstract

Since its emergence in 2019, SARS-CoV-2 has spread and evolved globally, with newly emerged variants of concern (VOCs) accounting for more than 500 million COVID-19 cases and 6 million deaths. Continuous surveillance utilizing simple genetic tools is needed to measure the viral epidemiological diversity, risk of infection, and distribution among different demographics in different geographical regions. To help address this need, we developed a proof-of-concept multilocus genotyping tool and demonstrated its utility to monitor viral populations sampled in 2020 and 2021 across six continents. We sampled globally 22,164 SARS-CoV-2 genomes from GISAID (inclusion criteria: available clinical and demographic data). They comprised two study populations, “2020 genomes” (N = 5959) sampled from December 2019 to September 2020 and “2021 genomes” (N = 16,205) sampled from 15 January to 15 March 2021. All genomes were aligned to the SARS-CoV-2 reference genome and amino acid polymorphisms were called with quality filtering. Thereafter, 74 codons (loci) in 14 genes including orf1ab polygene (N = 9), orf3a, orf8, nucleocapsid (N), matrix (M), and spike (S) met the 0.01 minimum allele frequency criterion and were selected to construct multilocus genotypes (MLGs) for the genomes. At these loci, 137 mutant/variant amino acids (alleles) were detected, with eight VOC-defining variant alleles, namely N (K203 and R204), orf1ab (I265, F3606, and L4715), orf3a H57, orf8 S84, and S G614, being predominant globally with >35% prevalence. Their persistence and selection were associated with peaks in the viral transmission and COVID-19 incidence between 2020 and 2021. Epidemiologically, older patients (≥20 years) compared to younger patients (<20 years) had a higher risk of being infected with these variants, but this association was dependent on the continent of origin.
In the global population, the discriminant analysis of principal components (DAPC) showed contrasting patterns of genetic clustering, with three (Africa, Asia, and North America) and two (North and South America) continental clusters being observed for the 2020 and 2021 global populations, respectively. Within each continent, the MLG repertoires (range 40−199) sampled in 2020 and 2021 were genetically differentiated, with ≤4 MLGs per repertoire accounting for the majority of genomes sampled. These data suggested that the majority of SARS-CoV-2 infections in 2020 and 2021 were caused by genetically distinct variants that likely adapted to local populations. Indeed, four GISAID clade-defined VOCs, GRY (Alpha), GH (Beta), GR (Gamma), and G/GK (Delta), were differentiated by their MLG signatures, demonstrating the versatility of the MLG tool for variant identification. Results from this proof-of-concept multilocus genotyping demonstrate its utility for SARS-CoV-2 genomic surveillance and for monitoring its spatiotemporal epidemiology and evolution, particularly in response to control interventions including COVID-19 vaccines and chemotherapies.
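The locus selection and MLG construction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`select_loci`, `build_mlgs`), the locus labels, and the four toy genomes are hypothetical, and it assumes each genome has already been reduced to one amino acid call per codon. Only the 0.01 minimum allele frequency threshold comes from the abstract.

```python
from collections import Counter

def select_loci(genomes, reference, min_freq=0.01):
    """Keep loci where some non-reference amino acid reaches min_freq.

    genomes: list of dicts mapping locus name -> amino acid (hypothetical format).
    reference: dict mapping locus name -> reference amino acid.
    """
    n = len(genomes)
    selected = []
    for locus, ref_aa in reference.items():
        counts = Counter(g[locus] for g in genomes)
        variant_freqs = [c / n for aa, c in counts.items() if aa != ref_aa]
        if variant_freqs and max(variant_freqs) >= min_freq:
            selected.append(locus)
    return sorted(selected)

def build_mlgs(genomes, loci):
    """Count multilocus genotypes: one tuple of alleles per genome, in locus order."""
    return Counter(tuple(g[locus] for locus in loci) for g in genomes)

# Toy data for illustration only; not taken from the study.
reference = {"S:614": "D", "orf8:84": "L", "orf3a:57": "Q"}
genomes = [
    {"S:614": "G", "orf8:84": "L", "orf3a:57": "Q"},
    {"S:614": "G", "orf8:84": "L", "orf3a:57": "H"},
    {"S:614": "D", "orf8:84": "S", "orf3a:57": "Q"},
    {"S:614": "G", "orf8:84": "L", "orf3a:57": "Q"},
]

loci = select_loci(genomes, reference)
mlgs = build_mlgs(genomes, loci)
for genotype, count in mlgs.most_common():
    print(dict(zip(loci, genotype)), count)
```

Representing each MLG as a tuple keeps it hashable, so repertoire sizes and the share of genomes carried by the predominant MLGs fall out of a single `Counter`.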

Keywords: COVID-19; SARS-CoV-2; epidemiology; evolution; genetics; linkage; multilocus; mutation and transmission.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Longitudinal prevalence of eight SARS-CoV-2 mutant alleles and reported new cases of COVID-19 in 2020. The prevalence data for the alleles and newly confirmed cases (WHO report 2020) are reported for December 2019 to September 2020. The frequency of the eight mutant alleles is shown with the line graph. Four mutant alleles, orf1ab L4715, S G614, N K203, and N R204, were associated with sharp peaks in the number of new COVID-19 cases (bar graph).
Figure 2
Epidemiology of infection with variant-specific alleles at eight informative loci in SARS-CoV-2 genomes sampled in 2020. The likelihood of patients harbouring an infection that carried a mutant allele at the eight informative loci was compared among patients of different age groups, genders, and geographical regions, and among patients with different forms of COVID-19. The adjusted odds ratio (OR) with the p-value is depicted with orange, blue, and grey colours corresponding to OR > 1 and p-value < 0.05; OR < 1 and p-value < 0.05; and OR < 1 or OR > 1 with p-value > 0.05, respectively. A dash (–) indicates no data, i.e., the OR was not estimated because the sample size was less than five.
Figure 3
Multilocus LD, genetic relatedness, and differentiation of MLGs in the 2020 study population. (A). Rarefaction curve of the unique MLGs identified in each geographical region. There was no sign of levelling-off, i.e., plateauing in the curves for all six continents, indicating that not all unique MLGs in each continent were detected. (B). Pairwise LD estimates among genes in the 2020 study population. The r̄d ranges from 0 (no LD) to 1 (complete LD). The values in the coloured heatmap indicate the p-value associated with the pairwise r̄d estimates. The strongest LD signal (r̄d ≥ 0.200, p-value < 0.001) was observed between NSP12 and ORF8, NSP12 and S, ORF8 and S, and ORF3a and NSP2. (C). Network analysis to visualise the relatedness among the 472 unique MLGs detected in the 2020 study population. The minimum spanning tree identified 11 clonal complexes including eight global complexes (GC1–8) and three continent-specific complexes, Asia (AC1–2) and Oceania (OC1). (D). Population membership probability assignments of MLGs in each continent. This probability, expressed as a percentage, predicted how closely related MLGs were to each other with respect to the reported continent of origin. Admixture populations were prominent in all six continents. (E). DAPC analysis identified one global cluster (MLGs from all continents, central axis of PCA plot) and three continental clusters, Africa, Asia, and North America. (F). Genetic differentiation (Nei GST) of MLGs in the 2020 study population. GST values ranging from 0 to 0.09, 0.1 to 0.19, and ≥0.2 indicate little, moderate, and great genetic differentiation, respectively.
Figure 4
Multilocus genetic differentiation of 2020 and 2021 MLG repertoires. For the 2020 and 2021 viral populations that were sampled, 472 and 445 unique MLGs, respectively, were detected. (A). Among the 2021 MLGs, the PCA showed considerable genetic differentiation with respect to the nine GISAID clades: G, GH, GK, GR, GRY, GV, L, O, and S. (B). Clades GH (ORF3a Q57H), GK (S T478K), GR (N G203R) and GV (S A222V), which arose from the G clade, formed separate clusters. VOCs including the Alpha, Beta, Gamma, and Delta variants, associated with the GRY, GH, GR, and G/GK clades, respectively, were responsible for large waves of SARS-CoV-2 transmission and COVID-19 cases in most parts of the world in 2021. (C–G). Predominant MLGs (represented by the coloured peaks) accounted for the majority of genomes sampled in the 2020 and 2021 viral populations; within each continent, there was great genetic differentiation (GST ≥ 0.333) between the 2020 and 2021 MLG repertoires.
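To make the GST categories in the Figure 3F and Figure 4 captions concrete, the sketch below computes a basic form of Nei's GST, (HT − HS) / HT, treating each MLG as an allele at a single haploid locus. This is an assumption made for illustration: the captions do not state which estimator or software was used, and the function names and toy samples are hypothetical. Only the thresholds for little, moderate, and great differentiation are taken from the Figure 3F caption.

```python
from collections import Counter

def allele_freqs(sample):
    """Relative frequency of each allele (here, each MLG label) in one sample."""
    n = len(sample)
    return {allele: count / n for allele, count in Counter(sample).items()}

def nei_gst(populations):
    """Basic Nei GST = (HT - HS) / HT with equally weighted populations.

    No sample-size correction is applied; this is an illustrative simplification.
    """
    freqs = [allele_freqs(p) for p in populations]
    alleles = set().union(*freqs)
    k = len(populations)
    # HS: mean within-population gene diversity.
    hs = sum(1 - sum(f.get(a, 0.0) ** 2 for a in alleles) for f in freqs) / k
    # HT: gene diversity computed from the mean allele frequencies.
    mean = {a: sum(f.get(a, 0.0) for f in freqs) / k for a in alleles}
    ht = 1 - sum(p ** 2 for p in mean.values())
    return 0.0 if ht == 0 else (ht - hs) / ht

def classify(gst):
    """Categories from the Figure 3F caption; values below 0.1 are treated as 'little'."""
    if gst >= 0.2:
        return "great"
    if gst >= 0.1:
        return "moderate"
    return "little"

# Toy MLG labels for one continent in two sampling periods (illustrative only).
sample_2020 = ["MLG1", "MLG1", "MLG1", "MLG2"]
sample_2021 = ["MLG3", "MLG3", "MLG4", "MLG4"]
gst = nei_gst([sample_2020, sample_2021])
print(round(gst, 3), classify(gst))
```

With no MLG shared between the two toy samples, the statistic falls in the "great" category, the same qualitative pattern the Figure 4 caption reports for the 2020 and 2021 repertoires within each continent.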

