Proc Natl Acad Sci U S A. 2022 May 24;119(21):e2123000119.
doi: 10.1073/pnas.2123000119. Epub 2022 May 17.

Impact of natural selection on global patterns of genetic variation and association with clinical phenotypes at genes involved in SARS-CoV-2 infection

Chao Zhang et al. Proc Natl Acad Sci U S A.

Abstract

Human genomic diversity has been shaped by both ancient and ongoing challenges from viruses. The current coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has had a devastating impact on population health. However, genetic diversity and evolutionary forces impacting host genes related to SARS-CoV-2 infection are not well understood. We investigated global patterns of genetic variation and signatures of natural selection at host genes relevant to SARS-CoV-2 infection (angiotensin converting enzyme 2 [ACE2], transmembrane protease serine 2 [TMPRSS2], dipeptidyl peptidase 4 [DPP4], and lymphocyte antigen 6 complex locus E [LY6E]). We analyzed data from 2,012 ethnically diverse Africans and 15,977 individuals of European and African ancestry with electronic health records and integrated with global data from the 1000 Genomes Project. At ACE2, we identified 41 nonsynonymous variants that were rare in most populations, several of which impact protein function. However, three nonsynonymous variants (rs138390800, rs147311723, and rs145437639) were common among central African hunter-gatherers from Cameroon (minor allele frequency 0.083 to 0.164) and are on haplotypes that exhibit signatures of positive selection. We identify signatures of selection impacting variation at regulatory regions influencing ACE2 expression in multiple African populations. At TMPRSS2, we identified 13 amino acid changes that are adaptive and specific to the human lineage compared with the chimpanzee genome. Genetic variants that are targets of natural selection are associated with clinical phenotypes common in patients with COVID-19. Our study provides insights into global variation at host genes related to SARS-CoV-2 infection, which have been shaped by natural selection in some populations, possibly due to prior viral infections.

Keywords: African diversity; SARS-CoV-2/COVID-19; genetic variation; natural selection; phenotype association.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Genetic variation at ACE2. (A) Location of coding variants and their MAF at ACE2 identified from the pooled dataset. (B) MAF of coding variants in diverse global ethnic groups. (C) The geographic distribution of the MAF for rs138390800 at ACE2 in diverse global ethnic groups is highlighted. Each pie denotes frequencies of alleles in the corresponding population. (D) Locations of identified nonsynonymous variants within the secondary structure of the ACE2 protein. (E) Six regulatory eQTLs located in an upstream enhancer of ACE2. RNA Pol2 ChIA-PET data and DNase-seq data of the large intestine, small intestine, lung, kidney, and heart are from ENCODE (68). (F) Haplotype frequencies of the six eQTLs in global populations. The six eQTLs were ordered by their genomic position on the chromosome (i.e., with the following order: rs4830977, rs4830978, rs5936010, rs4830979, rs4830980, and rs5934263). Haplotypes with frequency <0.01 are not shown.
Fig. 2.
Natural selection signatures at ACE2 in the Cameroon CAHG populations. (A) Haplotypes over 150 kb flanking ACE2 in CAHG populations. The x axis denotes the genetic variant position, and the y axis represents haplotypes. Each haplotype (one horizontal line) is composed of the genetic variants (columns). Red dots indicate the derived allele, while green dots indicate the ancestral allele. Haplotypes surrounded by the upper left vertical black lines suggest that these haplotypes carry derived allele(s) of the labeled variant near the corresponding black line. For example, the first black line denotes all the haplotypes that have the derived allele at rs138390800 (dark red line). Haplotypes carrying rs138390800, rs147311723, rs145437639, and rs186029035 show more homozygosity than other haplotypes; 1, 2, 3, and 4 along the top of the plot denote positions for rs147311723, rs186029035, rs145437639, and rs138390800, respectively. (B) EHH of rs138390800, rs186029035, and rs147311723 (rs145437639 is in strong LD with rs147311723) at ACE2 in CAHG populations.
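The EHH curves in panel B summarize how quickly haplotype identity decays with distance from a core variant. As a rough illustration only, the Python sketch below computes EHH under one common definition: among haplotypes carrying the core allele, the fraction of haplotype pairs that are identical from the core site out to a chosen marker. The function name `ehh`, the toy haplotypes, and the 0/1 coding (1 = derived allele) are assumptions made for this example; none of them are data or code from the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_idx, core_allele, target_idx):
    """Extended haplotype homozygosity between a core site and a target site.

    haplotypes  : sequence of equal-length strings (hypothetical toy input)
    core_idx    : index of the core variant
    core_allele : allele at the core site that defines the carrier set
    target_idx  : index of the marker out to which identity is required
    """
    # Keep only haplotypes that carry the chosen allele at the core site.
    carriers = [h for h in haplotypes if h[core_idx] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carrier haplotypes")

    # Group carriers by their allele string over the core-to-target interval.
    lo, hi = sorted((core_idx, target_idx))
    groups = Counter(h[lo:hi + 1] for h in carriers)

    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy data (assumed): five haplotypes, four biallelic sites, core at site 0.
toy_haplotypes = ["1110", "1110", "1100", "0101", "0011"]
ehh_near = ehh(toy_haplotypes, core_idx=0, core_allele="1", target_idx=1)
ehh_far = ehh(toy_haplotypes, core_idx=0, core_allele="1", target_idx=2)
print(ehh_near, ehh_far)
```

In this toy set the three carrier haplotypes are identical out to site 1 and split into two groups at site 2, so EHH falls as the interval widens. Statistics such as iHS build on integrals of these decay curves for the derived and ancestral alleles.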
Fig. 3.
Natural selection signatures at the upstream region of ACE2 in African populations. (A) iHS signals at the upstream region of ACE2 (chrX:15650000-15720000) in African populations. Each dot represents a SNP. Red dots denote SNPs that are significant (|iHS| > 2). The gray solid lines denote the gene body region of ACE2. Putatively causal tag SNPs are annotated in the plots. (B) Haplotype network over 150 kb flanking ACE2 in diverse ethnic populations. The network was constructed with SNPs that showed iHS signals in all populations and overlapped with DNase regions or eQTLs. The four functional candidates identified in Cameroon CAHG were also included in the networks. Each pie represents a haplotype, each color represents a geographical population, and the size of the pie is proportional to that haplotype frequency. The dashed line denotes the boundary of clade 1 and clade 2. Black ovals denote haplotypes containing the corresponding variants. (C) Haplotypes containing variants rs5936010, rs5934263, rs4830984, and rs4830986 are highlighted. Red pies denote haplotypes containing the derived allele of the corresponding variants, while green pies denote haplotypes containing the ancestral allele of the corresponding variants.
Fig. 4.
Genetic variation at TMPRSS2. (A) Location of coding variants and their MAF at TMPRSS2 identified from the pooled dataset. (B) MAF of coding variants in diverse global ethnic groups. (C) The geographic distribution of the MAF for rs75603675 at TMPRSS2 in diverse global ethnic groups is highlighted. Each pie denotes frequencies of alleles in the corresponding population. (D) Two regulatory eQTLs located in the promoter region of the TMPRSS2 gene. DNase-seq data of the large intestine, small intestine, lung, kidney, and heart are from ENCODE (68).
Fig. 5.
Natural selection signatures at TMPRSS2. (A) The result of the MK test for TMPRSS2 in the pooled dataset. Nonsyn indicates nonsynonymous variants; Syn indicates synonymous variants. “Fixed” denotes variants that were fixed between the human and the chimpanzee; “Poly” represents polymorphic variants within human populations. The transcript ENST00000398585.7 was used for the calculation. (B) Illustration of locations of variants that are divergent between the human and chimpanzee lineages on the TMPRSS2 protein domains. Boxes denote the protein domains of TMPRSS2. Red lines represent nonsynonymous variants that occurred in the corresponding domains of TMPRSS2, with the amino acids and positions of the human and the chimpanzee annotated at the bottom of the lines. Blue lines denote synonymous variants. LDLRA, LDL receptor class A; SRCR, scavenger receptor cysteine-rich domain 2; TM, transmembrane domain.
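The MK (McDonald-Kreitman) test in panel A compares the ratio of nonsynonymous to synonymous changes among fixed differences with the same ratio among polymorphisms. The Python sketch below shows how such a 2×2 table is typically evaluated, using the neutrality index, the derived estimate alpha = 1 − NI, and a two-sided Fisher's exact test. The function name `mk_test` and the counts are placeholders assumed for illustration; they are not the TMPRSS2 counts reported in the figure, and the authors' exact procedure may differ.

```python
from math import comb


def mk_test(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """Return (neutrality index, alpha, two-sided Fisher exact P value)."""
    if min(fixed_nonsyn, fixed_syn, poly_syn) == 0:
        raise ValueError("neutrality index undefined when these counts are zero")

    # NI < 1 indicates an excess of fixed nonsynonymous changes,
    # which is the pattern expected under positive selection.
    ni = (poly_nonsyn / poly_syn) / (fixed_nonsyn / fixed_syn)
    alpha = 1.0 - ni

    # Fisher's exact test: hypergeometric distribution of the
    # fixed-nonsynonymous cell given the table margins.
    total = fixed_nonsyn + fixed_syn + poly_nonsyn + poly_syn
    n_fixed = fixed_nonsyn + fixed_syn
    n_nonsyn = fixed_nonsyn + poly_nonsyn

    def prob(x):
        return comb(n_nonsyn, x) * comb(total - n_nonsyn, n_fixed - x) / comb(total, n_fixed)

    x_min = max(0, n_fixed - (total - n_nonsyn))
    x_max = min(n_fixed, n_nonsyn)
    p_obs = prob(fixed_nonsyn)
    # Two-sided P: sum over all tables no more probable than the observed one.
    p_value = sum(prob(x) for x in range(x_min, x_max + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    return ni, alpha, min(p_value, 1.0)


# Placeholder counts (assumed, not taken from the paper).
ni, alpha, p_value = mk_test(fixed_nonsyn=7, fixed_syn=17, poly_nonsyn=2, poly_syn=42)
print(f"NI={ni:.3f}, alpha={alpha:.3f}, P={p_value:.4f}")
```

An NI well below 1 together with a small P value is the pattern usually read as evidence for adaptive amino acid fixation, which is the kind of signal the caption describes for TMPRSS2.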
Fig. 6.
Associations between genetic variations at four genes and clinical disease phenotypes. (A) Gene-based association result between coding variants at four genes and 12 disease classes. The disease classes are shown on the x axis, and the y axis represents the P values. (B) PheWAS plot of the eQTLs associated with four genes and ∼1,800 disease codes across 17 disease categories. The disease categories are shown on the x axis, and the y axis represents the −log10 of the P values. The colored dots represent eQTLs and the direction of effect of the association. The red dashed lines denote the 0.0001 cutoff, and the blue dashed lines represent the 0.001 cutoff.

References

    1. Yancy C. W., COVID-19 and African Americans. JAMA 323, 1891–1892 (2020).
    2. Alcendor D. J., Racial disparities-associated COVID-19 mortality among minority populations in the US. J. Clin. Med. 9, 2442 (2020).
    3. Hoffmann M., et al., SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell 181, 271–280.e8 (2020).
    4. Walls A. C., et al., Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292.e6 (2020).
    5. Cao Y., et al., Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations. Cell Discov. 6, 11 (2020).