Comput Struct Biotechnol J. 2023;21:1403-1413.
doi: 10.1016/j.csbj.2023.02.003. Epub 2023 Feb 9.

Transcriptomics secondary analysis of severe human infection with SARS-CoV-2 identifies gene expression changes and predicts three transcriptional biomarkers in leukocytes

Jeffrey Clancy et al. Comput Struct Biotechnol J. 2023.

Abstract

SARS-CoV-2 is the causative agent of COVID-19, which has greatly affected human health since it first emerged. Defining the human factors and biomarkers that differentiate severe SARS-CoV-2 infection from mild infection has become of increasing interest to clinicians. To help address this need, we retrieved 269 public RNA-seq human transcriptome samples from GEO that had qualitative disease severity metadata. We then subjected these samples to a robust RNA-seq data processing workflow to calculate gene expression in PBMCs, whole blood, and leukocytes, as well as to predict transcriptional biomarkers in PBMCs and leukocytes. This workflow used Salmon for read mapping, edgeR to identify significantly differentially expressed genes, and Camera for gene ontology enrichment. We then performed a random forest machine learning analysis on the read count data to identify the genes that best classified samples by COVID-19 severity phenotype. This approach produced a list of leukocyte genes ranked by their Gini values that includes TGFBI, TTYH2, and CD4, which are associated with both the immune response and inflammation. Our results show that these three genes can potentially classify samples with severe COVID-19 with an accuracy of ∼88% and an area under the receiver operating characteristic curve of 92.6%, indicating acceptable specificity and sensitivity. We expect that our findings can contribute to the development of improved diagnostics that may aid in identifying severe COVID-19 cases, guide clinical treatment, and reduce mortality rates.
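
As a rough illustration of the two quantities named in the abstract, the sketch below computes a Gini impurity decrease for a single threshold split on one gene and an area under the ROC curve from raw expression values. It is not the authors' pipeline: a random forest averages such impurity decreases over many trees and splits, whereas this shows only one split. The gene names, expression values, and thresholds are hypothetical toy data chosen for the example.

```python
from itertools import product


def gini_impurity(labels):
    """Gini impurity of binary labels (1 = severe, 0 = mild)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at `threshold` on one gene."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def roc_auc(scores, labels):
    """AUC as the fraction of (severe, mild) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: expression of two made-up genes in six samples.
labels = [1, 1, 1, 0, 0, 0]               # three severe, three mild
gene_a = [9.1, 8.7, 7.9, 3.2, 4.0, 8.0]   # mostly separates the classes
gene_b = [5.0, 2.1, 6.3, 5.5, 2.4, 6.0]   # carries little signal

# Rank genes by impurity decrease, largest first (thresholds are arbitrary).
ranking = sorted(
    {"gene_a": gini_decrease(gene_a, labels, 6.0),
     "gene_b": gini_decrease(gene_b, labels, 4.0)}.items(),
    key=lambda kv: kv[1],
    reverse=True,
)
auc_a = roc_auc(gene_a, labels)

print(ranking)
print(round(auc_a, 3))
```

In this toy example the gene whose split leaves purer groups ranks first, which mirrors the idea of ordering candidate biomarkers by their Gini values before checking how well they classify samples.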

Keywords: AUC, Area under the curve; Bioinformatics; Biomarkers; COVID-19; COVID-19, Coronavirus Disease of 2019; DEG, Differentially expressed gene; Data mining; GEO, Gene Expression Omnibus; GO, Gene Ontology; RNA; RNA-sequencing; ROC, Receiver-operator characteristic; SARS-CoV-2; SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2; Virus.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. Flow diagram to visualize the process used to filter relevant public RNA-seq records/studies and reports/samples that were included in the secondary analysis.
Fig. 2. Volcano plot of all differentially expressed genes in severe vs. mild human infection with SARS-CoV-2. The plot shows genes that are up- or down-regulated in blood samples collected from patients with severe or mild symptoms during infection with SARS-CoV-2. Genes are colored by statistically significant up-regulation (blue), significant down-regulation (red), or no significant change (green). The x-axis shows the log2 fold-change values, while the y-axis displays false-discovery rate-adjusted p-values to account for multiple hypothesis testing.
Fig. 3. Receiver-operator characteristic (ROC) curve constructed from all expressed genes in severe vs. mild human infection with SARS-CoV-2. A ROC curve constructed from all RNA-sequencing read quantification values achieved an area-under-the-curve (AUC) value of greater than 96%.
Fig. 4. Principal component analysis (PCA) of all samples based on available severity metadata and collected biomaterial. PCA was applied to all samples based on metadata for disease severity (A) or biomaterial type (B).
