Interdiscip Sci. 2022 Jun;14(2):504-519.
doi: 10.1007/s12539-021-00465-0. Epub 2021 Aug 6.

Enabling Artificial Intelligence for Genome Sequence Analysis of COVID-19 and Alike Viruses

Imran Ahmed et al. Interdiscip Sci. 2022 Jun.

Abstract

The recent COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread with unusual speed and lethality. It has infected millions of people and continues to have a devastating effect on the health and well-being of the global population. In this situation, genome sequence analysis and advanced artificial intelligence techniques may help researchers and medical experts understand the genetic variants of COVID-19 (SARS-CoV-2). Genome sequence analysis of COVID-19 is crucial to understanding the virus's origin, behavior, and structure, which might help in developing vaccines, antiviral drugs, and efficient preventive strategies. This paper introduces an artificial intelligence based system for genome sequence analysis of COVID-19 and similar viruses, e.g., SARS, Middle East respiratory syndrome (MERS), and Ebola. The system helps extract important information from the genome sequences of different viruses. We perform comparative data analysis by extracting basic information from COVID-19 and other genome sequences, including nucleotide composition and frequency, tri-nucleotide composition, amino acid counts, alignment between genome sequences, and their DNA similarity. We use different visualization methods to analyze these viruses' genome sequences and, finally, apply a machine learning classifier, the support vector machine (SVM), to classify the different genome sequences. The data set of virus genome sequences was obtained from a publicly accessible online data repository. The system achieves good classification results, with an accuracy of 97% for COVID-19, 96% for SARS, and 95% for MERS and Ebola genome sequences.
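The composition features named in the abstract (nucleotide frequency, tri-nucleotide counts, and amino acid counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the toy sequence are hypothetical, only the Python standard library is used, and the genome is assumed to be given in the DNA alphabet (A, C, G, T) and translated from position 0 with the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_frequency(seq):
    """Return the relative frequency of A, C, G and T, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGT"}


def trinucleotide_counts(seq):
    """Count overlapping tri-nucleotides (3-mers) made only of A, C, G, T."""
    seq = seq.upper()
    kmers = (seq[i:i + 3] for i in range(len(seq) - 2))
    return Counter(k for k in kmers if set(k) <= set("ACGT"))


def amino_acid_counts(seq):
    """Translate non-overlapping codons from position 0 and count amino acids."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(CODON_TABLE[c] for c in codons if c in CODON_TABLE)


if __name__ == "__main__":
    toy = "ATGAAATAG"  # hypothetical toy sequence, not real viral data
    print(nucleotide_frequency(toy))
    print(trinucleotide_counts(toy).most_common(3))
    print(amino_acid_counts(toy))
```

Feature vectors built this way (for example, the 64 tri-nucleotide counts per genome) are one plausible input to a classifier such as an SVM; the abstract does not state which features the authors actually passed to their classifier.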

Keywords: Artificial intelligence; COVID-19; Genome sequence analysis; Machine learning; SVM.

Figures

Fig. 1. Artificial intelligence based genome sequence analysis and classification of COVID-19 and alike viruses. The presented system first performs comparative data analysis and then uses a machine learning based classifier to classify genome sequences of different viruses
Fig. 2. Visualization of nucleotides in the DNA sequence of four types of viruses
Fig. 3. Nucleotide frequency in the DNA sequence of four types of viruses
Fig. 4. Tri-nucleotide frequency in the DNA
Fig. 5. Composition of amino acids
Fig. 6. Sequence of nucleotide triplets of all four types of genome sequences
Fig. 7. Coding sequences (CDS) in the different genome sequences
Fig. 8. Alignment similarity between genome sequences
Fig. 9. Dot plot showing differences between the genome sequences
Fig. 10. Accuracy, precision, recall, and F1 score of the classification method for the different genome sequences (COVID-19 and alike viruses)
Fig. 11. Classification results using SVM (TPR vs FPR)
