Sci Rep. 2023 Apr 5;13(1):5599.
doi: 10.1038/s41598-023-32268-2.

A comprehensive analysis of gene expression profiling data in COVID-19 patients for discovery of specific and differential blood biomarker signatures

Maryam Momeni et al. Sci Rep. 2023.

Abstract

COVID-19 is a newly recognized illness with a predominantly respiratory presentation. Although initial analyses have identified groups of candidate gene biomarkers for the diagnosis of COVID-19, they have yet to yield clinically applicable biomarkers, so disease-specific diagnostic biomarkers in biofluids, and biomarkers for differential diagnosis against other infectious diseases, are still needed. Such biomarkers could also increase knowledge of pathogenesis and help guide treatment. Eight transcriptomic profiles of COVID-19 infected versus control samples from peripheral blood (PB), lung tissue, nasopharyngeal swab and bronchoalveolar lavage fluid (BALF) were considered. In the first step, to find potential COVID-19 Specific Blood Differentially expressed genes (SpeBDs), we implemented a strategy based on finding pathways shared between peripheral blood and the most involved tissues in COVID-19 patients, and retained only the blood differentially expressed genes (DEGs) with a role in those shared pathways. In the second step, nine datasets covering three types of Influenza (H1N1, H3N2, and B) were used. Potential Differential Blood DEGs of COVID-19 versus Influenza (DifBDs) were found by extracting the DEGs involved in pathways enriched only by SpeBDs and not by Influenza DEGs. In the third step, a machine learning method (a wrapper feature selection approach supervised by four classifiers: k-NN, Random Forest, SVM, and Naïve Bayes) was used to narrow down the SpeBDs and DifBDs and find their most predictive combinations, yielding potential COVID-19 Specific Blood Biomarker Signatures (SpeBBSs) and COVID-19 versus Influenza Differential Blood Biomarker Signatures (DifBBSs), respectively. Models based on the SpeBBSs and DifBBSs and the corresponding algorithms were then built, and their performance was assessed on an external dataset. Among all the DEGs extracted from the PB dataset (from the pathways PB shares with BALF, Lung and Swab), 108 unique SpeBDs were obtained. Feature selection using Random Forest outperformed its counterparts and selected IGKC, IGLV3-16 and SRP9 among the SpeBDs as SpeBBSs. Validation of the model built on these genes with Random Forest on an external dataset gave 93.09% accuracy. Eighty-three pathways enriched by SpeBDs and not by any of the influenza strains were identified, including 87 DifBDs. Using feature selection with the Naïve Bayes classifier on the DifBDs, FMNL2, IGHV3-23, IGLV2-11 and RPL31 were selected as the most predictive DifBBSs. The model built on these genes with Naïve Bayes was validated on an external dataset with 87.2% accuracy. Our study identified several candidate blood biomarkers for a potential specific and differential diagnosis of COVID-19. The proposed biomarkers could be valuable targets for practical investigations to validate their potential.
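
As an illustration of what a wrapper feature selection approach does, the following is a minimal sketch and not the authors' pipeline. It assumes a greedy forward search scored by leave-one-out accuracy of a 1-nearest-neighbour classifier; the abstract does not state the search strategy, and the study itself used four classifiers. The gene names `GENE_A` and `GENE_B`, the expression values, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # hypothetical: gene name -> expression value


def loo_accuracy_1nn(samples: Sequence[Sample], labels: Sequence[int],
                     genes: Sequence[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `genes`."""
    correct = 0
    for i, query in enumerate(samples):
        best_dist, best_label = None, None
        for j, other in enumerate(samples):
            if i == j:
                continue  # the held-out sample may not vote for itself
            dist = sum((query[g] - other[g]) ** 2 for g in genes)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += int(best_label == labels[i])
    return correct / len(samples)


def forward_wrapper_selection(samples: Sequence[Sample], labels: Sequence[int],
                              candidates: Sequence[str]) -> Tuple[List[str], float]:
    """Greedily add the gene that most improves the classifier's score; stop when none does."""
    selected: List[str] = []
    best_score = 0.0
    remaining = list(candidates)
    while remaining:
        trials = [(loo_accuracy_1nn(samples, labels, selected + [g]), g) for g in remaining]
        top_score, top_gene = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_gene)
        remaining.remove(top_gene)
        best_score = top_score
    return selected, best_score


# Toy data (made up): GENE_A separates the classes, GENE_B does not.
labels = [1, 1, 1, 0, 0, 0]
samples = [
    {"GENE_A": 5.0, "GENE_B": 0.3},
    {"GENE_A": 5.2, "GENE_B": 2.9},
    {"GENE_A": 4.9, "GENE_B": 1.4},
    {"GENE_A": 1.0, "GENE_B": 0.2},
    {"GENE_A": 1.1, "GENE_B": 3.0},
    {"GENE_A": 0.8, "GENE_B": 1.5},
]
selected, score = forward_wrapper_selection(samples, labels, ["GENE_B", "GENE_A"])
print(selected, score)
```

The point of the wrapper design is that candidate gene subsets are scored by the classifier itself rather than by a classifier-independent statistic, which is why different classifiers can select different signatures.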

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
A workflow representing the main steps of the present study. Designed using the diagrams.net online tool, available at https://app.diagrams.net/.
Figure 2
Common pathways of PB with BALF, Lung, and Swab, their adjusted p-values from pathway enrichment analysis, and the list of SpeBDs extracted from them. The figure was generated using RStudio version 2022.12.0 and Adobe Illustrator version 24.2.1.
Figure 3
Extraction of SpeBDs from the PB DEGs of COVID-19 patients using the pathways common to PB and the three respiratory-system sources in COVID-19 patients (Swab, BALF, and Lung). The full list of SpeBDs is shown in this figure. Lung, lung tissue biopsy; Swab, nasopharyngeal swab; BALF, bronchoalveolar lavage fluid; PB, peripheral blood. The figure was created using Cytoscape version 3.8.2 and Adobe Illustrator version 24.2.1.
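
The filtering described in this caption can be sketched with plain set operations. This is a minimal illustration under stated assumptions: a pathway counts as "common" when it is enriched in PB and in at least one of the respiratory sources (the source does not say whether one or all three are required), and the pathway names, gene names, and the function `extract_shared_pathway_degs` are hypothetical.

```python
from typing import Dict, Set, Tuple


def extract_shared_pathway_degs(
    blood_pathways: Dict[str, Set[str]],   # pathway -> blood DEGs that enrich it
    tissue_pathways: Dict[str, Set[str]],  # tissue name -> pathways enriched in that tissue
) -> Tuple[Set[str], Set[str]]:
    """Return (common pathways, blood DEGs that take part in a common pathway)."""
    # Assumption: "common" means shared with at least one respiratory source.
    tissue_enriched: Set[str] = set().union(*tissue_pathways.values())
    common = set(blood_pathways) & tissue_enriched
    kept_genes: Set[str] = set()
    for pathway in common:
        kept_genes |= blood_pathways[pathway]
    return common, kept_genes


# Toy input (made up).
blood_pathways = {
    "pathway_1": {"G1", "G2"},
    "pathway_2": {"G3"},
    "pathway_3": {"G4"},
}
tissue_pathways = {
    "BALF": {"pathway_1"},
    "Lung": {"pathway_3", "pathway_9"},
    "Swab": set(),
}
common, spebd_like = extract_shared_pathway_degs(blood_pathways, tissue_pathways)
print(sorted(common), sorted(spebd_like))
```

Here `G3` is dropped because its only pathway is not enriched in any respiratory source, which is the filtering role the shared pathways play.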
Figure 4
(A) Venn diagram of the pathways enriched by SpeBDs, Influenza H1N1 PB DEGs, Influenza H3N2 PB DEGs, and Influenza B PB DEGs, constructed using an online tool available at https://bioinformatics.psb.ugent.be/webtools/Venn/. The red circle marks the pathways that were enriched by SpeBDs and not by any of the three Influenza types; these pathways are listed in part B. (B) The eighty-three pathways obtained from pathway enrichment analysis of SpeBDs that differed from those obtained by pathway enrichment analysis of Influenza H1N1, H3N2, and B DEGs. Panel B was created using RStudio version 2022.12.0 and Adobe Illustrator version 24.2.1.
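
The region marked in the Venn diagram corresponds to a set difference: pathways enriched by SpeBDs minus the union of pathways enriched by the three Influenza types. The sketch below illustrates that operation only. It assumes a gene is retained if it belongs to at least one COVID-19-only pathway, even when it also appears in a pathway shared with Influenza; the source does not settle this. All pathway names, gene names, and the function `covid_only_pathway_genes` are hypothetical.

```python
from typing import Dict, Set, Tuple


def covid_only_pathway_genes(
    spebd_pathways: Dict[str, Set[str]],      # pathway -> SpeBDs that enrich it
    influenza_pathways: Dict[str, Set[str]],  # influenza type -> pathways it enriches
) -> Tuple[Set[str], Set[str]]:
    """Return (pathways enriched only by SpeBDs, genes found in those pathways)."""
    flu_union: Set[str] = set().union(*influenza_pathways.values())
    unique_pathways = set(spebd_pathways) - flu_union
    genes: Set[str] = set().union(*(spebd_pathways[p] for p in unique_pathways))
    return unique_pathways, genes


# Toy input (made up).
spebd_pathways = {
    "pathway_a": {"G1", "G2"},
    "pathway_b": {"G2", "G3"},
    "pathway_c": {"G4"},
}
influenza_pathways = {
    "H1N1": {"pathway_b"},
    "H3N2": {"pathway_x"},
    "B": {"pathway_b", "pathway_y"},
}
unique_pathways, difbd_like = covid_only_pathway_genes(spebd_pathways, influenza_pathways)
print(sorted(unique_pathways), sorted(difbd_like))
```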
Figure 5
The ten-fold cross-validation results of the feature selection method in choosing SpeBBSs and the constructed machine learning models; (A) ROC curves representing classification ability of the feature selection method by the four classifiers on GSE166190 dataset (the feature selection dataset); (B) ROC curves representing classification powers of the constructed models based on the selected SpeBBSs and corresponding algorithms (the same algorithms that were applied in feature selection step) on Bibert et al.’s dataset A (the validation dataset). These ROC curves show ROC (red lines) at various threshold settings (blue lines). In the ROC curves, the x-axis shows 1-specificity, and the y-axis shows sensitivity. (C) Four measures indicating the classification power of the feature selection method by the four classifiers on GSE166190 dataset (the feature selection dataset); (D) Four measures indicating the power of constructed models based on the selected SpeBBSs and the corresponding algorithms (the same algorithms that were applied in feature selection step) on Bibert et al.’s dataset A (the validation dataset). FS: feature selection.
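
For readers unfamiliar with the axes named in this caption, the following minimal sketch shows how ROC points (1-specificity on the x-axis, sensitivity on the y-axis) are obtained by sweeping a decision threshold over classifier scores, and how the area under the curve follows from the trapezoidal rule. The labels and scores are made up for illustration and are not results from the study; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per threshold, strictest first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example (made up): 1 = case, 0 = control.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(round(auc(points), 4))
```

In this toy case 8 of the 9 case-control pairs are ranked correctly, so the area under the curve is 8/9.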
Figure 6
The ten-fold cross-validation results of the feature selection method in choosing DifBBSs and the constructed machine learning models; (A) ROC curves representing classification ability of the feature selection method by the four classifiers on GSE161731-B dataset (the feature selection dataset); (B) ROC curves representing classification powers of the constructed models based on the selected DifBBSs and corresponding algorithms (the same algorithms that were applied in feature selection step) on Bibert et al.’s dataset B (the validation dataset). These ROC curves show ROC (red lines) at various threshold settings (blue lines). In the ROC curves, the x-axis shows 1-specificity, and the y-axis shows sensitivity. (C) Four measures indicating the classification power of the feature selection method by the four classifiers on GSE161731-B dataset (the feature selection dataset); (D) Four measures indicating the power of constructed models based on the selected DifBBSs and the corresponding algorithms (the same algorithms that were applied in feature selection step) on Bibert et al.’s dataset B (the validation dataset). FS: feature selection.
