Hum Genet. 2022 Jan;141(1):147-173.
doi: 10.1007/s00439-021-02397-7. Epub 2021 Dec 10.

Common, low-frequency, rare, and ultra-rare coding variants contribute to COVID-19 severity

Chiara Fallerini et al. Hum Genet. 2022 Jan.

Abstract

The combined impact of common and rare exonic variants in COVID-19 host genetics is currently insufficiently understood. Here, common and rare variants from whole-exome sequencing data of about 4000 SARS-CoV-2-positive individuals were used to define an interpretable machine-learning model for predicting COVID-19 severity. First, variants were converted into separate sets of Boolean features, depending on the absence or the presence of variants in each gene. An ensemble of LASSO logistic regression models was used to identify the most informative Boolean features with respect to the genetic bases of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score that offers a synthetic and interpretable index for describing the contribution of host genetics to COVID-19 severity, as demonstrated through testing in several independent cohorts. Selected features belong to ultra-rare, rare, low-frequency, and common variants, including those in linkage disequilibrium with known GWAS loci. Notably, around one quarter of the selected genes are sex-specific. Pathway analysis of the selected genes associated with COVID-19 severity reflected the multi-organ nature of the disease. The proposed model might provide useful information for developing diagnostics and therapeutics, while also being able to guide bedside disease management.
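The conversion of variants into Boolean features can be illustrated with a minimal sketch. It assumes that each variant call has already been annotated with its gene and frequency class. The input records, patient identifiers, and the `binarize` helper are hypothetical and are not taken from the study's pipeline, which also handles combinations of common variants that this sketch ignores.

```python
from collections import defaultdict

# Hypothetical input: one record per variant call, already annotated with the
# gene and a frequency class. All identifiers are invented for illustration.
calls = [
    ("patient_1", "TLR7", "ultra_rare"),
    ("patient_1", "CFTR", "rare"),
    ("patient_2", "CFTR", "rare"),
    ("patient_2", "CFTR", "rare"),  # a second variant in the same gene
    ("patient_3", "ACE2", "ultra_rare"),
]


def binarize(calls):
    """Return (features, matrix): 1 if a patient carries at least one variant
    of the given frequency class in the given gene, otherwise 0."""
    patients = sorted({patient for patient, _, _ in calls})
    features = sorted({(gene, cls) for _, gene, cls in calls})
    carried = defaultdict(set)
    for patient, gene, cls in calls:
        carried[patient].add((gene, cls))
    matrix = {
        patient: [1 if feature in carried[patient] else 0 for feature in features]
        for patient in patients
    }
    return features, matrix


features, matrix = binarize(calls)
for patient, row in matrix.items():
    print(patient, row)
```

Two variants in the same gene and frequency class collapse into a single 1, which is what makes the resulting features presence/absence indicators rather than variant counts.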

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Fig. 1
Feature selection and gene discovery. A Whole-exome sequencing (WES) data stored in the Genetic Data Repository of the GEN-COVID Multicenter Study (GCGDR), obtained from biospecimens of 1780 SARS-CoV-2 PCR-positive subjects of European ancestry with differing disease severity, were used as the training set. B Clinical severity classification into severe and mild cases was performed by Ordered Logistic Regression (OLR) starting from the WHO grading and patient age classifications. C WES data were binarized into 0 or 1 depending on the absence (0) or the presence (1) of variants (or, for common polymorphisms only, of a combination of two or more variants) in each gene. D LASSO logistic regression feature selection on multiple train-test splits of the cohort leads to the identification of the final set of features contributing to the clinical variability of COVID-19 (E). From the initial 163,099 cumulative features (divided into 36,540 ultra-rare, 23,470 rare, 13,056 low-frequency, and 90,033 common features) in 12 Boolean representations, 7249 features were selected as contributing to COVID-19 clinical variability; they are reported in Supplementary Tables 3–6. The total number of genes contributing to COVID-19 clinical variability was 4260 in males and 4360 in females, 75% of which were shared by both sexes.
Fig. 2
Biological impact of ultra-rare, rare, low-frequency, and common features. Examples of ultra-rare (A), rare (B), low-frequency (C), and common (D) features are illustrated in panels A–D. The complete list of features is presented in Supplementary Tables 3–6. [symbol] = contributing to COVID-19 severity; [symbol] = contributing to COVID-19 mildness. Pink faces = contributing in females only; blue faces = contributing in males only; pink/blue faces = contributing in both sexes. In parentheses: AD = autosomal dominant inheritance; AR = autosomal recessive inheritance; XL = X-linked recessive inheritance. A Ultra-rare mutations in the RNA sensors TLR7 and TLR3 and in TICAM1 (encoding the TRIF protein), already reported in association with XL, AR, and AD inheritance (Zhang et al. ; Van der Made et al. ; Fallerini et al. ; Solanich et al. 2021), impair interferon (IFN) production in innate immune system cells. Mutations in TLR8, as well as in the signal transducer IRAK1, also impair interferon production. The location of TLR7/8 and IRAK1 on the X chromosome, together with escape from X-inactivation, is responsible for opposite effects in males and females. Mutations in RNASEL impair the antiviral effect of the gene. In lung epithelial cells, ultra-rare variants in ACE2 (on the X chromosome) exert protective effects, probably by reducing viral entry, while ultra-rare variants in ADAM17 might reduce the shedding of ACE2 and induce a severe outcome. The same is true for CFTR and SCNN1A (encoding the ENaCA protein and involved in a CFTR-related physiological pathway), and the lipid transporter ABCA3 (Baldassarri et al. 2021b). Mutations of ADAMTS13 in vessels reduce the cleavage of the multimeric von Willebrand Factor (VWF), leading to thrombosis. B Rare variants of the estrogen-regulated TLR5 are associated with severity in females. Rare variants of the CFTR-related SLC26A9 are associated with severity in both sexes. This ion transporter has three discrete physiological modes: nCl⁻-HCO₃⁻ exchanger, Cl⁻ channel, and Na⁺-anion cotransporter. Other examples of rare mutations associated with severity are found in the NK and T cell receptor FCRL6, the IFN signal transducer IRAK2, and the actin-depolymerizing MICAL2. C Low-frequency variants in another CFTR-related gene, SCNN1D (encoding the ENaCD protein), are associated with mildness, while rare variants in the following genes are associated with severity: SPMA6 (cargo protein), PEX1 (vesicle formation), and NOD2 (CARD15; inflammatory protein). D A number of coding polymorphisms, indicated with an asterisk, are in LD with genomic SNPs already associated with COVID-19 (the complete list is presented in Supplementary Table 11) (Severe Covid-19 GWAS Group ; Pairo-Castineira et al. 2020). In some cases, such as SFTDP, the genomic SNP is the coding polymorphism itself. Of note are the genes of surfactant proteins associated with severe disease, namely the SFTDP gene encoding the SP-D protein and the SFTPA1 gene encoding the SP-A protein; the signal transducer gene PPP1R15A, encoding the GADD34 protein; and OAS1 and OAS3, related to RNA clearance by RNASEL (reported in panel A as having ultra-rare mutations). Also to be included here are the already reported TLR3412 (Croci et al. 2021) and the already reported SELP603, related to thrombosis (Fallerini et al. 2021a). Note: OAS1 haplotype A = c.1039-1G>A (Wickenhagen et al. 2021), p.(Gly162Ser), p.(Ala352Thr), p.(Arg361Thr), p.(Gly397Arg), p.(Thr358Profs*26). OAS1 haplotype B = haplotype without the variant combination in haplotype A.
Fig. 3
Integrated PolyGenic Score definition. A The model is based on the comparison of Boolean features of severity versus Boolean features of mildness. B Graphic representation of the IPGS formula used for this model. C Principle for the calibration of the different weighting factors, based on the separation of severe and mild cases. D The weighting factors obtained for low-frequency, rare, and ultra-rare variants, with F = 1 for common variants. Common variants are indicated as common haplotypes since they are intended as combinations of coding variants within a single gene (see Fig. 1C and the Material and methods section).
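A score of this kind can be sketched as a weighted difference between the number of severity features and the number of mildness features a patient carries, taken per frequency class. The functional form below is an assumption inferred from the caption, not the published formula. Only the weight of 1 for common variants comes from the caption; the other weights are placeholder values, not the calibrated factors of panel D.

```python
# Weighting factors per frequency class. Only "common": 1 is stated in the
# caption; the remaining values are hypothetical placeholders.
weights = {"common": 1, "low_frequency": 2, "rare": 3, "ultra_rare": 4}


def ipgs(severity_counts, mildness_counts, weights):
    """Weighted sum, over frequency classes, of
    (number of severity features) - (number of mildness features)."""
    score = 0.0
    for cls, factor in weights.items():
        difference = severity_counts.get(cls, 0) - mildness_counts.get(cls, 0)
        score += factor * difference
    return score


# Hypothetical feature counts for a single patient.
severity_counts = {"common": 10, "low_frequency": 2, "rare": 1, "ultra_rare": 1}
mildness_counts = {"common": 8, "low_frequency": 1, "rare": 0, "ultra_rare": 0}

print(ipgs(severity_counts, mildness_counts, weights))  # 1*2 + 2*1 + 3*1 + 4*1 = 11.0
```

Under this assumed form, a positive score means severity features outweigh mildness features, and rarer classes move the score more per feature when their factor exceeds 1.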
Fig. 4
Pathway enrichment analysis of the genes associated with disease severity/mildness. A Workflow of the analysis. Genes corresponding to Boolean features found to be associated at least once were ranked based on a composite score and subjected to Gene Set Enrichment Analysis. Two separate ranked gene lists for females (7317 genes, weight range 3 × 10⁻⁵ to 561) and males (7325 genes, weight range 7 × 10⁻⁵ to 452) were used. The list of significant pathways was analysed and presented as a similarity network. B Similarity network of the pathways with a significant enrichment in both females and males (p < 0.01). The size of the circles is proportional to the pathway size. Significance above threshold is indicated by the red color. C Similarity network of the pathways with a significant enrichment in either females (red left half of the circles) or males (red right half of the circles) (p < 0.005). D Heatmaps of the genes belonging to a selection of pathways of interest. The color gradient represents the weight of each gene, calculated as described in the methods. Note the high ranking of TLR genes (TLR5, TLR8, TLR3 and TLR7) in the pathway of Response to Mechanical Stimulus, of the CFTR gene in Recognition for Clathrin-mediated endocytosis, and of the RNASEL, TYK2, OAS1 and OAS3 genes in Interferon alpha–beta signaling. Note also the presence of the relevant pathway of Exhaust vs Memory CD8 T cell Up, which also includes the TLR7 gene.
Fig. 5
Model predictivity. A The post-Mendelian model was trained using a sample of 466 patients from the GEN-COVID cohort n.2 and a Swedish cohort (comprising cases only) and tested with three additional European cohorts from the UK, Germany and Canada. B A logistic regression model was used for severity prediction. Severity was defined mainly on the basis of hospitalization versus no hospitalization. Hospitalized cases without respiratory support were included in controls. TN = true negative; TP = true positive; FN = false negative; FP = false positive. C When the IPGS is added to age and sex as a regressor, the performance of the model improves: accuracy + 1%, precision + 1%, sensitivity + 2%, specificity + 1%. These increases are statistically significant (p value < 0.05 for accuracy, precision, sensitivity and specificity) with respect to the null distribution obtained by randomizing the IPGS. The performance metrics of the model built with IPGS alone are all above random guessing. In addition, on the right, the distributions of the IPGS for severe and non-severe patients are reported. D In the three tested cohorts, when the IPGS is added to age and sex as a regressor, all performance metrics increase: accuracy by up to + 2%, precision by up to + 1%, sensitivity by up to + 3%, and specificity by up to + 2%. We conclude that IPGS is able to improve prediction of clinical outcome in addition to the well-established powerful factors of age and sex. E The univariate logistic regression models, fitted on the cohort including both train and test sets, confirmed that the IPGS is associated with severity with an odds ratio (OR) of 2.32, while age (continuous in decades) and sex have ORs of 1.89 and 2.99, respectively.
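The four performance measures named in this caption follow directly from the confusion-matrix counts (TP, TN, FP, FN). The sketch below uses their standard definitions. The counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),    # share of predicted severe that are severe
        "sensitivity": tp / (tp + fn),  # share of severe cases that are detected
        "specificity": tn / (tn + fp),  # share of non-severe correctly excluded
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these values between a model fitted on age and sex alone and one that also includes the IPGS gives percentage-point differences of the kind reported in panels C and D.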
Fig. 6
Clinical interpretability of IPGS. Panel A shows the GEN-COVID cohort dendrogram and heatmaps of the probabilities of severity based on the 3 different models: sex-age alone, IPGS alone and the combined model. At the extreme ends of the dendrogram (left and right), the probabilities of severity based on sex-age alone and on IPGS alone are highly discordant (different colors). Selected examples corresponding to the arrows are illustrated in panels B-G. In each panel, the IPGS score, the probabilities of severity and key features useful for bedside clinical management are shown. B) Male patient, in the 46–50 age range, treated with CPAP ventilation, tocilizumab, enoxaparin, hydroxychloroquine and lopinavir/ritonavir; no comorbidities except for asthma have been reported. The patient presented a rare TLR7 mutation that leads to an impaired production of interferon gamma (Van der Made et al. 2020). C) Male patient, in the 51–55 age range, treated with invasive mechanical ventilation, steroids and enoxaparin. His comorbidities included obesity, anxiety, hypertension and cerebral ischemia. He was found to be homozygous for the SELP rs6127 (p.Asp603Asn). Homozygosity for asparagine at position 603 of Selectin P makes this endothelial protein more prone to clot formation and male patients more prone to COVID-19 thrombosis (Croci et al. 2021). Hence the rationale for considering anti-Selectin P antibodies, a drug already approved for vascular events in sickle cell anemia, as a putative adjuvant therapy in the management of similar cases. D) Male patient, in the 51–55 age range, treated with CPAP ventilation, tocilizumab, steroids, enoxaparin, hydroxychloroquine and lopinavir/ritonavir; no comorbidities except for diabetes. He was found to have androgen receptor polyQ repeats > 23. The regular function of the androgen receptor is correlated with a beneficial immunomodulatory effect in those male patients in whom the increase in testosterone levels may overcome the receptor resistance. The rationale is to consider giving testosterone to those male subjects who cannot, on their own, raise their levels enough to overcome the receptor resistance due to a poly-glutamine stretch longer than 23 repeats (Daga et al. 2021). E) Female patient, in the 31–35 age range, treated with CPAP ventilation, steroids, enoxaparin and azithromycin; no comorbidities except for hypothyroidism. She was a carrier of an ultra-rare mutation in ADAMTS13. Impaired function of ADAMTS13 leads to reduced cleavage of von Willebrand factor (vWF) and enhanced clot formation. The effect is enhanced in females and responsible for SARS-CoV-2 related thrombosis. Anti-vWF immunoglobulins would be a putative therapeutic option to consider in similar cases. F-G) Examples of low IPGS and related key features. F) Male patient, in the 81–85 age range, treated with low-flow oxygen. No information regarding pharmacological therapy during hospitalization is available. Among comorbidities: diabetes mellitus, congestive heart failure and bowel cancer and steroids. He presented an ultra-rare mutation in ACE2. G) Male patient, in the 86–90 age range, treated with low-flow oxygen, steroid, enoxaparin and ceftriaxone plus azithromycin. Among his comorbidities: colon diverticulosis with constipation?, benign prostatic hyperplasia?, anxious-depressive syndrome, sideropenic anemia. He was a carrier of an ultra-rare mutation in AGTR2.

References

    1. Baldassarri M, Picchiotti N, Fava F, et al. Shorter androgen receptor polyQ alleles protect against life-threatening COVID-19 disease in European males. EBioMedicine. 2021;65:103246. doi: 10.1016/j.ebiom.2021.103246.
    2. Baldassarri M, Fava F, Fallerini C, et al. Severe COVID-19 in hospitalized carriers of single CFTR pathogenic variants. J Person Med. 2021;11(6):558. doi: 10.3390/jpm11060558.
    3. Bayati A, Kumar R, Francis V, McPherson PS. SARS-CoV-2 infects cells after viral entry via clathrin-mediated endocytosis. J Biol Chem. 2021;296:100306. doi: 10.1016/j.jbc.2021.100306.
    4. Benetti E, Giliberti A, Emiliozzi A, et al. Clinical and molecular characterization of COVID-19 hospitalized patients. PLoS ONE. 2020;15(11):e0242534. doi: 10.1371/journal.pone.0242534.
    5. Benetti E, Tita R, Spiga O, et al. ACE2 gene variants may underlie interindividual variability and susceptibility to COVID-19 in the Italian population. Eur J Hum Genet. 2020;28(11):1602–1614. doi: 10.1038/s41431-020-0691-z.