Front Genet. 2024 May 22:15:1362469.
doi: 10.3389/fgene.2024.1362469. eCollection 2024.

Host genetics and COVID-19 severity: increasing the accuracy of latest severity scores by Boolean quantum features

Gabriele Martelloni et al. Front Genet. 2024.

Abstract

The impact of common and rare variants in COVID-19 host genetics has been widely studied. In particular, Fallerini et al. (Human Genetics, 2022, 141, 147-173) used common and rare variants to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the Boolean features most informative about the genetic bases of severity. The Boolean features selected by these logistic models were combined into an Integrated PolyGenic Score (IPGS), which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS leads to an accuracy of 55%-60% on different cohorts and, after a logistic regression with both IPGS and age as inputs, to an accuracy of 75%. The goal of this paper is to improve on these results by using not only the most informative Boolean features but also information on the host organs involved in the disease. In this study, we generalize the IPGS by adding a statistical weight for each organ, through the transformation of Boolean features into "Boolean quantum features," inspired by quantum mechanics. The organ coefficients were set by applying the PyGAD genetic algorithm, after which we defined two new integrated polygenic scores (IPGSph1 and IPGSph2). By applying a logistic regression with IPGS, IPGSph2 (or, equivalently, IPGSph1), and age as inputs, we reached an accuracy of 84%-86%, improving the results previously shown in Fallerini et al. (Human Genetics, 2022, 141, 147-173) by about 10 percentage points.
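
The pipeline summarized above (Boolean gene-level features, a score built from selected features, then a logistic model that also takes age) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names, the split into severity and protective features, the signed-count score, and the logistic coefficients are hypothetical. It is not the published IPGS definition and uses no fitted values.

```python
import math


def boolean_features(variants_by_gene, genes):
    """Map each gene to 1 if the patient carries at least one variant in it, else 0."""
    return {g: int(len(variants_by_gene.get(g, [])) > 0) for g in genes}


def toy_score(features, severity_genes, protective_genes):
    """Hypothetical stand-in for a polygenic score:
    count of severity-associated features minus count of protective ones."""
    risk = sum(features[g] for g in severity_genes)
    protect = sum(features[g] for g in protective_genes)
    return risk - protect


def severity_probability(score, age, w_score, w_age, bias):
    """Logistic model with the score and age as inputs (coefficients are assumed)."""
    z = w_score * score + w_age * age + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient: variants observed in GENE_A and GENE_C only.
genes = ["GENE_A", "GENE_B", "GENE_C"]
patient = {"GENE_A": ["var1"], "GENE_C": ["var2", "var3"]}

features = boolean_features(patient, genes)
score = toy_score(features, severity_genes=["GENE_A", "GENE_B"], protective_genes=["GENE_C"])
prob = severity_probability(score, age=60, w_score=0.8, w_age=0.05, bias=-3.0)
print(features, score, prob)
```

In practice the coefficients would be fitted to labeled cohorts, and feature selection would come from a sparse model such as LASSO logistic regression, as the abstract describes.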

Keywords: COVID-19; genetic algorithm; genetic science modeling; host genetics; integrated polygenic score; logistic regression.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Flow chart of a genetic algorithm.
FIGURE 2. PyGAD lifecycle.
FIGURE 3. Confusion matrices from logistic regression for the male (A, B) and female (C, D) samples. (A) Input: age, IPGS, and IPGSph1; accuracy = 86.1%. (B) Input: age, IPGS, and IPGSph2; accuracy = 86.4%. (C) Input: age, IPGS, and IPGSph1; accuracy = 83.7%. (D) Input: age, IPGS, and IPGSph2; accuracy = 83.4%.
FIGURE 4. Comparison between the results obtained from a logistic regression taking as input age or age + IPGS1 with shuffled variants, for the male sample (A, C) and the female sample (B, D).
FIGURE 5. Comparison between AGE + SEX, AGE + SEX + IPGS, and AGE + SEX + IPGS + IPGSph2 as inputs to the logistic regression on the total (female + male) dataset.
FIGURE 6. Comparison between IPGS, IPGSph1, and IPGSph2 for the female (A) and male (B) samples.
FIGURE 7. Comparison between IPGSph1 and IPGSph2. Confusion matrices obtained from the logistic regression of the single scores with age and IPGS for the male (panels (A), (B)) and female (panels (C), (D)) datasets. Panels (A) and (C) show results for IPGSph1, and panels (B) and (D) for IPGSph2.
FIGURE 8. Confusion matrices obtained from the logistic regression of the single scores with age, IPGS, and IPGSph1 for the male (panel (A)) and female (panel (B)) patient datasets.
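
Figures 1 and 2 refer to a genetic algorithm and its PyGAD lifecycle. As a rough illustration of the general technique only, the sketch below runs a small genetic algorithm in plain Python (ranking, parent selection, single-point crossover, Gaussian mutation, and elitism). It does not use PyGAD's API and does not reproduce the authors' configuration. The fitness function, the target weights, and all parameter values are hypothetical.

```python
import random

# Hypothetical target: three "organ weights" the search should recover.
TARGET = [0.2, 0.5, 0.3]


def toy_fitness(weights):
    """Higher is better: negative squared distance to the assumed target."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def evolve(fitness, n_genes, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm; requires n_genes >= 2 for single-point crossover."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]   # keep the fitter half as parents
        children = [ranked[0][:]]           # elitism: carry the best solution over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutate each gene with probability mutation_rate
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history


best, history = evolve(toy_fitness, n_genes=3)
print(best, toy_fitness(best))
```

Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next. In a real application the fitness would be a task-specific quantity, such as a classification metric, rather than a distance to a known target.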

References

    1. Agosto A., Giudici P. (2020). A Poisson autoregressive model to understand covid-19 contagion dynamics. Risks 8, 77. doi: 10.3390/risks8030077
    1. Baldassarri M., Fava F., Fallerini C., Daga S., Benetti E., Zguro K., et al. (2021a). Severe covid-19 in hospitalized carriers of single cftr pathogenic variants. J. personalized Med. 11, 558. doi: 10.3390/jpm11060558
    1. Baldassarri M., Picchiotti N., Fava F., Fallerini C., Benetti E., Daga S., et al. (2021b). Shorter androgen receptor polyq alleles protect against life-threatening covid-19 disease in european males. EBioMedicine 65, 103246. doi: 10.1016/j.ebiom.2021.103246
    1. Ballow M., Haga C. L. (2021). Why do some people develop serious covid-19 disease after infection, while others only exhibit mild symptoms? J. Allergy Clin. Immunol. Pract. 9, 1442–1448. doi: 10.1016/j.jaip.2021.01.012
    1. Benetti E., Giliberti A., Emiliozzi A., Valentino F., Bergantini L., Fallerini C., et al. (2020a). Clinical and molecular characterization of covid-19 hospitalized patients. Plos one 15, e0242534. doi: 10.1371/journal.pone.0242534