[Preprint]. 2024 Dec 5:2024.12.04.24318493.
doi: 10.1101/2024.12.04.24318493.

Genome-wide Machine Learning Analysis of Anosmia and Ageusia with COVID-19


Lucas Pietan et al. medRxiv.

Abstract

The COVID-19 pandemic has caused substantial worldwide disruption to health, the economy, and society. Its symptoms include loss of smell (anosmia) and loss of taste (ageusia), which can result in prolonged sensory impairment. Establishing the host genetic etiology of anosmia and ageusia in COVID-19 will aid the overall understanding of the sensorineural aspect of the disease and could contribute to possible treatments or cures. Using human genome sequencing data from the University of Iowa (UI) COVID-19 cohort (N=187) and the National Institutes of Health All of Us (AoU) Research Program COVID-19 cohort (N=947), we investigated the genetics of anosmia and/or ageusia by employing feature selection techniques to construct a novel variant and gene prioritization pipeline, using machine learning methods to classify patients. Models were assessed using a permutation-based variable importance (PVI) strategy for final prioritization of candidate variants and genes. For the UI cohort, the highest area under the receiver operating characteristic curve (AUROC) on the held-out test set was 0.735 for the variant analysis and 0.798 for the gene analysis; for the AoU cohort, it was 0.687 for the variant analysis. Our analysis prioritized several novel and known candidate host genetic factors involved in immune response, neuronal signaling, and calcium signaling, supporting previously proposed hypotheses for anosmia/ageusia in COVID-19.
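The abstract does not spell out how PVI was computed. One common formulation, assumed here, takes a fitted model and its held-out data, shuffles one feature at a time, and uses the mean drop in AUROC as that feature's importance. The sketch below illustrates that idea only; the names `auroc`, `permutation_importance` and `predict`, the simulated allele-dosage matrix, and the one-variant scoring function are all hypothetical and are not the authors' pipeline or data.

```python
import numpy as np


def auroc(y, scores):
    """AUROC as the probability that a random case outscores a random control (ties count 1/2)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]  # every case-control pair
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUROC when each feature column is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = auroc(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to the outcome but keeps its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - auroc(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return baseline, importances


# Hypothetical toy "held-out" data: 200 people x 5 variants coded as allele dosages 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 5)).astype(float)
y = (X[:, 0] + rng.normal(0, 0.5, size=200) > 1).astype(int)  # only variant 0 matters


def predict(M):
    # Hypothetical stand-in for a fitted model: scores people by variant 0 alone.
    return M[:, 0]


baseline, imp = permutation_importance(predict, X, y)
top = np.argsort(imp)[::-1][:5]  # top-5 ranking, analogous to the PVI figure panels
print(f"baseline AUROC = {baseline:.3f}")
for j in top:
    print(f"variant_{j}: mean AUROC drop = {imp[j]:.3f}")
```

Because the toy model ignores variants 1 to 4, shuffling them leaves its AUROC unchanged and their importance is exactly zero, while shuffling variant 0 pushes AUROC toward 0.5. Measuring the drop on held-out rather than training data keeps the ranking from rewarding features the model merely overfit.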


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1.
Top-performing whole genome sequencing machine learning pipelines for the variant and gene analyses. "Random I.D." denotes randomly selected intermediate datasets.
Figure 2.
Permutation-based variable importance (PVI): top 5 features for the top-performing datasets and models in the UI cohort loss-of-smell analysis. All features included in each model were assessed. (A) Variable importance for the extreme gradient boosted tree (XGBTree) model, the top-performing model for the variant analysis. (B) Variable importance for the Elastic Net model, the top-performing model for the gene analysis according to the accuracy and area under the receiver operating characteristic (AUROC) curve metrics. (C) Variable importance for the support vector machine with polynomial kernel function (SVM-P) model, the top-performing model for the gene analysis according to the Brier score.
Figure 3.
Permutation-based variable importance: top 5 features for the top-performing datasets and models in the AoU cohort variant analysis. All features included in each model were assessed. (A) Variable importance for the Lasso model, the top-performing model for the variant analysis according to the accuracy metric. (B) Variable importance for the Elastic Net model (accuracy). (C) Variable importance for the random forest (RF) model (Brier score). (D) Variable importance for the SVM-RB model (Brier score and AUROC).
Figure 4.
Summary of the hypothesized involvement of prioritized candidate genes and biological processes in the development of anosmia and ageusia with COVID-19. Created in BioRender. Pietan, L. (2025) https://BioRender.com/h04s965
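Figures 2 and 3 name different top-performing models depending on the metric (accuracy, AUROC, or Brier score). A minimal illustration on made-up predicted probabilities shows how this can happen: AUROC rewards correct ranking and accuracy rewards correct thresholding, whereas the Brier score (mean squared error of the predicted probabilities, lower is better) also rewards calibration. Models "A" and "B", their probabilities, and the 0.5 accuracy threshold are hypothetical and do not come from the study.

```python
import numpy as np


def auroc(y, p):
    """Probability that a random case is scored above a random control (ties count 1/2)."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]  # every case-control pair
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def accuracy(y, p, threshold=0.5):
    """Fraction of correct labels after thresholding predicted probabilities."""
    return float(np.mean((np.asarray(p) >= threshold).astype(int) == np.asarray(y)))


def brier(y, p):
    """Mean squared error between predicted probability and 0/1 outcome (lower is better)."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y)) ** 2))


# Hypothetical outcomes and predicted probabilities from two made-up models.
y = [0, 0, 1, 1]
models = {
    "A": [0.40, 0.45, 0.55, 0.60],  # perfect ranking, but probabilities close to 0.5
    "B": [0.05, 0.60, 0.55, 0.95],  # one mis-ranked pair, but confident where correct
}

results = {name: (auroc(y, p), accuracy(y, p), brier(y, p)) for name, p in models.items()}
for name, (auc, acc, bs) in results.items():
    print(f"model {name}: AUROC={auc:.3f} accuracy={acc:.3f} Brier={bs:.4f}")
```

Here model A scores 1.0 on both AUROC and accuracy against 0.75 for model B, yet model B has the lower (better) Brier score (about 0.142 versus 0.181), so the "best" model depends on which metric is used to choose it.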


