Int J Mol Sci. 2024 May 29;25(11):5920.
doi: 10.3390/ijms25115920.

Identification of Marker Genes in Infectious Diseases from ScRNA-seq Data Using Interpretable Machine Learning

Gustavo Sganzerla Martinez et al. Int J Mol Sci.

Abstract

A common result of infection is an abnormal immune response, which may be detrimental to the host. To control the infection, the immune system might undergo regulation, therefore producing an excess of either pro-inflammatory or anti-inflammatory pathways that can lead to widespread inflammation, tissue damage, and organ failure. A dysregulated immune response can manifest as changes in differentiated immune cell populations and concentrations of circulating biomarkers. To propose an early diagnostic system that enables differentiation and identifies the severity of immune-dysregulated syndromes, we built an artificial intelligence tool that uses input data from single-cell RNA sequencing. In our results, single-cell transcriptomics successfully distinguished between mild and severe sepsis and COVID-19 infections. Moreover, by interpreting the decision patterns of our classification system, we identified that different immune cells upregulating or downregulating the expression of the genes CD3, CD14, CD16, FOSB, S100A12, and TCRɣδ can accurately differentiate between different degrees of infection. Our research has identified genes of significance that effectively distinguish between infections, offering promising prospects as diagnostic markers and providing potential targets for therapeutic intervention.

Keywords: artificial intelligence; marker genes; sepsis; single-cell RNA sequencing.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
XGBoost classification and SHAP analysis of genes responsible for characterizing sepsis, septic shock, and moderate and severe COVID-19. In (A), we show the Area Under the Curve (AUC) score of an XGBoost multiclass classifier whose input is the expression level of immunological cells expressing 20 genes (i.e., FOSB, IGLC3, S100A9, S100A10, CD56, IFITM3, CD52, IL32, CD74, S100A12, HLA-A, CD24, CD8, CD14, CD19, TCRɣδ, CD3, TLR2, CD16, and CD4) mapped to a disease model (i.e., sepsis, septic shock, moderate COVID-19, and severe COVID-19). In (B–E), we show the SHAP explanation for each input gene in classifying sepsis, mild COVID-19, septic shock, and severe COVID-19. The results displayed in the SHAP plots consist of an input gene (left y-axis) as per the feature importance for a particular class, the expression of the gene (right y-axis), and how this level of expression contributed to assigning a SHAP value (x-axis). Expression levels whose SHAP value is positive are significant descriptors of a class (i.e., disease model). Finally, in (F), we quantified the raw expression value of a subset containing the first five genes that best described each class.
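To make the sign convention of the SHAP values concrete, the following is a minimal sketch that computes exact Shapley values by brute force for a single cell. It is not the authors' pipeline and does not use the SHAP library or XGBoost: the scoring function `toy_class_score`, its coefficients, the expression values, and the all-zero baseline are hypothetical choices made only for illustration. A positive value means that the observed expression of a gene pushes the score toward the class, and a negative value pushes it away.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample, enumerating every feature subset.

    Features outside a coalition are replaced by their baseline value.
    This is only feasible for a handful of features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical class score over three genes (coefficients are made up).
genes = ["CD14", "CD3", "S100A12"]


def toy_class_score(z):
    cd14, cd3, s100a12 = z
    return 2.0 * cd14 - 1.0 * cd3 + 0.5 * cd14 * s100a12


cell = [1.0, 2.0, 2.0]      # hypothetical expression levels for one cell
baseline = [0.0, 0.0, 0.0]  # assumed reference: no expression

phi = shapley_values(toy_class_score, cell, baseline)
for gene, contribution in zip(genes, phi):
    direction = "toward" if contribution > 0 else "away from"
    print(f"{gene}: {contribution:+.2f} (pushes {direction} the class)")
```

The contributions sum to the difference between the score for the cell and the score for the baseline, which is the property that lets per-gene values be read as a decomposition of a single prediction.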
Figure 2
The study compared hub gene expression in mild (A,B) and severe cases (C,D) of viral- and bacterial-induced sepsis, respectively. We processed the data as a Seurat object, analyzed it with BBrowserX (version V.25), and created the visualization in BioVinci from BioTuring (https://bioturing.com). We found that certain genes, like CD16:3G8|FCGR3A|AHS0053|PABO, were more highly expressed in mild COVID-19 and sepsis patients, while CD3:SK7|CD3E|AHS0033|PABO showed higher levels in septic shock and severe COVID-19 cases. S100A12 had low expression in mild COVID-19 cases and none in severe cases, making it a potential target in viral-induced diseases. TCR-GAMMA_DELTA:B1|TRD_TRG|AHS0015|PABO was upregulated in severe COVID-19 cases, particularly in dendritic cells, suggesting its role as a classifier in critically ill COVID-19 patients.
Figure 3
XGBoost classification of cell types and the interpretation of the classifier. In (A), we show the AUC plots of an XGBoost multiclass classifier using the transcripts of single cells to classify their corresponding type in a one-versus-all approach. The classes with an AUC equal to or higher than 0.8 were selected to have their model interpreted. In (B–I), we show the contribution of input features (i.e., transcripts) to the assignment of the B cell, dendritic, monocyte classical, monocyte nonclassical, natural killer, TCD4 memory, TCD8 naïve, and T gamma delta classes, respectively.
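The one-versus-all selection rule in this caption can be sketched as follows. The example assumes that a classifier has already produced one probability per cell and class; the labels, the probabilities, and the helper names `auc_one_vs_all` and `select_classes` are hypothetical and are not taken from the study. The AUC is computed with the pairwise rank identity rather than a library call, and only the classes that reach the 0.8 cut-off stated in the caption are kept for interpretation.

```python
def auc_one_vs_all(scores, labels, positive):
    """AUC of `positive` against all other classes.

    Uses the rank (Mann-Whitney) identity: the AUC is the fraction of
    positive/negative pairs in which the positive sample scores higher,
    with ties counted as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_classes(prob_by_class, labels, threshold=0.8):
    """Return per-class AUCs and the classes whose AUC is >= threshold."""
    aucs = {c: auc_one_vs_all(p, labels, c) for c, p in prob_by_class.items()}
    kept = [c for c, a in aucs.items() if a >= threshold]
    return aucs, kept


# Hypothetical true cell types and predicted class probabilities for six cells.
labels = ["B cell", "B cell", "monocyte", "monocyte", "NK", "NK"]
prob_by_class = {
    "B cell":   [0.90, 0.80, 0.10, 0.20, 0.30, 0.10],
    "monocyte": [0.05, 0.10, 0.70, 0.60, 0.20, 0.30],
    "NK":       [0.30, 0.10, 0.40, 0.20, 0.20, 0.50],
}

aucs, kept = select_classes(prob_by_class, labels)
print(aucs)
print(kept)
```

In this toy data the NK class falls below the cut-off, so it would be excluded from the interpretation step while the other two classes are retained.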
Figure 4
Gene ontology analysis of the genes CD3, S100A12, FOSB, CD14, and CD16. In (a), we show the biological processes mapped to the candidate genes. In (b), we show the molecular functions. The cellular components and enriched KEGG pathways are shown in (c,d), respectively. The results of the GO analysis are presented in the dot plots, and the GO terms were shortlisted based on the FDR values.
Figure 5
UMAP plots. A total of 19,002 cells from sepsis patients, 6268 cells from mild COVID-19 patients, 4276 cells from severe COVID-19 patients, and 8186 cells from septic shock patients expressing twenty immunologically associated gene transcripts (i.e., FOSB, IGLC3, S100A9, S100A10, CD56, IFITM3, CD52, IL32, CD74, S100A12, HLA-A, CD24, CD8, CD14, CD19, TCRɣδ, CD3, TLR2, CD16, and CD4) were mapped and shown in a two-dimensional plot. We separated different clinical information to elicit differentiated behavior in cells expressing transcripts, as seen in (A), which shows the cells under different medical conditions. (B) shows the cell types separated. (C) indicates whether the cell originated from a mild or severe patient. (D) conveys whether the patient’s infection is of a viral (mild and severe COVID-19) or non-viral (sepsis and septic shock) nature.
