Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method
- PMID: 35743837
- PMCID: PMC9225528
- DOI: 10.3390/life12060806
Abstract
SARS-CoV-2 shows great evolutionary capacity, with frequent genomic variation arising during transmission. Evolved variants often show resistance to earlier vaccines and can lead to poor clinical status in patients. Mutations in the SARS-CoV-2 genome affect both structural and nonstructural proteins, and mutations in some of these proteins, such as the spike protein, have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information for virulent strains together with the clinical status of the infected patients, which varied in the severity of COVID-19 pneumonia. Important mutations in proteins and untranslated regions were then extracted using machine learning methods. First, mutations highly correlated with the clinical status of the patients were selected with Boruta and ranked by four algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), yielding four feature lists. Some of these mutations, such as D614G and V1176F, have been shown to be associated with viral infectivity. Moreover, previously unreported mutations, such as A320V and I164ILV of nsp14, were also identified, suggesting that they may play roles as well. We then applied the incremental feature selection method to each feature list to construct efficient classifiers that can be used directly to distinguish the clinical status of COVID-19 patients. In addition, four sets of quantitative decision rules were established, which offer a more intuitive view of the role of each mutation in differentiating the clinical status of COVID-19 patients. The identified key mutations linked to virologic properties will help to better understand the mechanisms of infection and will aid in the development of antiviral treatments.
Keywords: SARS-CoV-2; decision rules; feature selection; machine learning; mutation.
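The abstract describes incremental feature selection (IFS) only at a high level. The following is a minimal Python sketch, assuming the usual IFS formulation: given one ranked feature list, train and cross-validate a classifier on the top-1, top-2, ... features and keep the feature count with the best score. The nearest-centroid classifier, the accuracy metric, the 5-fold split, the function names, and the toy 0/1 mutation matrix are all hypothetical stand-ins chosen for illustration, not the classifiers, metrics, or data used in the study.

```python
import random


def nearest_centroid_fit(X, y):
    """Return the mean feature vector of each class (a deliberately simple stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        counts[label] += 1
        for j, v in enumerate(row):
            sums[label][j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid has the smallest squared Euclidean distance to the row."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cv_accuracy(X, y, folds=5, seed=0):
    """k-fold cross-validated accuracy; every sample is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, folds=5):
    """Score the top-1, top-2, ... features of a ranked list; return (best k, all scores)."""
    scores = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_sub = [[row[j] for j in cols] for row in X]
        scores.append(cv_accuracy(X_sub, y, folds))
    # max() returns the first maximum, so ties favour the smaller feature set.
    best_k = max(range(len(scores)), key=lambda i: scores[i]) + 1
    return best_k, scores


# Toy example: 60 hypothetical samples with 6 binary "mutation" features;
# the label depends only on feature 0, the rest are noise.
rng = random.Random(42)
X = [[rng.randint(0, 1) for _ in range(6)] for _ in range(60)]
y = ["severe" if row[0] == 1 else "mild" for row in X]
ranking = [0, 3, 1, 2, 4, 5]  # hypothetical ranking, e.g. as produced by LASSO or mRMR
best_k, scores = incremental_feature_selection(X, y, ranking)
print(best_k, scores)
```

In the study, this loop would run once per feature list (one for each of the four ranking algorithms), and the best-scoring prefix of each list would define the corresponding classifier.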
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity. Life (Basel). 2023 May 31;13(6):1304. doi: 10.3390/life13061304. PMID: 37374086. Free PMC article.
- Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes. Biology (Basel). 2023 Jul 2;12(7):947. doi: 10.3390/biology12070947. PMID: 37508378. Free PMC article.
- V367F Mutation in SARS-CoV-2 Spike RBD Emerging during the Early Transmission Phase Enhances Viral Infectivity through Increased Human ACE2 Receptor Binding Affinity. J Virol. 2021 Jul 26;95(16):e0061721. doi: 10.1128/JVI.00617-21. Epub 2021 Jul 26. PMID: 34105996. Free PMC article.
- Current Strategies of Antiviral Drug Discovery for COVID-19. Front Mol Biosci. 2021 May 13;8:671263. doi: 10.3389/fmolb.2021.671263. eCollection 2021. PMID: 34055887. Free PMC article. Review.
- The Development of SARS-CoV-2 Variants: The Gene Makes the Disease. J Dev Biol. 2021 Dec 15;9(4):58. doi: 10.3390/jdb9040058. PMID: 34940505. Free PMC article. Review.
Cited by
- Epigenetic age acceleration in surviving versus deceased COVID-19 patients with acute respiratory distress syndrome following hospitalization. Clin Epigenetics. 2023 Nov 28;15(1):186. doi: 10.1186/s13148-023-01597-4. PMID: 38017502. Free PMC article.
- Identification of key gene expression associated with quality of life after recovery from COVID-19. Med Biol Eng Comput. 2024 Apr;62(4):1031-1048. doi: 10.1007/s11517-023-02988-8. Epub 2023 Dec 21. PMID: 38123886.
- Benefits of Repeated SARS-CoV-2 Vaccination and Virus-induced Cross-neutralization Potential in Immunocompromised Transplant Patients and Healthy Individuals. Open Forum Infect Dis. 2024 Sep 9;11(10):ofae527. doi: 10.1093/ofid/ofae527. eCollection 2024 Oct. PMID: 39371367. Free PMC article.
- Multivariate analyses and machine learning link sex and age with antibody responses to SARS-CoV-2 and vaccination. iScience. 2024 Jul 10;27(8):110484. doi: 10.1016/j.isci.2024.110484. eCollection 2024 Aug 16. PMID: 39156648. Free PMC article.
- Prediction models for COVID-19 disease outcomes. Emerg Microbes Infect. 2024 Dec;13(1):2361791. doi: 10.1080/22221751.2024.2361791. Epub 2024 Jun 14. PMID: 38828796. Free PMC article.
Grants and funding
- XDA26040304/Strategic Priority Research Program of Chinese Academy of Sciences
- XDB38050200/Strategic Priority Research Program of Chinese Academy of Sciences
- 2018YFC0910403/National Key R&D Program of China
- 202002/Fund of the Key Laboratory of Tissue Microenvironment and Tumor of Chinese Academy of Sciences