Commun Biol. 2021 May 17;4(1):590.
doi: 10.1038/s42003-021-02095-0.

Genome-wide bioinformatic analyses predict key host and viral factors in SARS-CoV-2 pathogenesis


Mariana G Ferrarini et al. Commun Biol.

Abstract

The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells. Well-known immunoregulatory genes, including CSF2, IL32, IL-6 and SERPINA3, were differentially expressed, and immunoregulatory transposable element families were upregulated. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and eukaryotic initiation factor 4B (eIF4B). We also identified a viral sequence variant with a statistically significant skew associated with the age of infected patients, which may contribute to intracellular host-pathogen interactions. These findings can help identify host mechanisms that could be targeted by prophylactics and/or therapeutics to reduce the severity of COVID-19.
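The abstract does not state which test underlies the age-associated skew. As a hedged illustration only, one common way to ask whether a variant is over-represented in one age group is a two-sided Fisher's exact test on a 2×2 table of variant presence versus age group. The sketch below assumes that framing; the function name and the table layout are hypothetical, and in practice `scipy.stats.fisher_exact` provides this test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not the study's design):
      rows    = variant present / variant absent
      columns = older patients / younger patients
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    # Feasible values of the top-left cell given the fixed margins.
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables no more likely than the observed one; the tiny relative
    # tolerance keeps floating-point ties from being dropped.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))
```

For the classic table [[3, 1], [1, 3]] this returns 34/70, about 0.486, so such a table alone would not indicate a skew.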


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the bioinformatic workflow applied in this study.
As indicated in orange, RNA-seq data from SARS-CoV-2-infected samples were used as input to identify differentially expressed (DE) genes, isoforms, and transposable elements (TEs). DE genes were used to identify functional enrichment among deregulated genes and possible impacts on metabolism. Genes neighboring differentially expressed TEs (DETEs) were analyzed to assess whether TEs could act as regulators of gene expression. In green, the complete SARS-CoV-2 genome was used to identify enrichment of RNA-binding protein (RBP) motifs. We also used all sequenced genomes available as of 11 November 2020 to detect conserved RBP motifs and possible links to disease severity.
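As a hedged sketch of what "conserved" could mean for a single motif site, assume the genomes are held as a gap-aware multiple sequence alignment of equal-length strings. The hypothetical helper below reports the fraction of genomes that are identical to a reference over the site's alignment columns, skipping windows that contain an ambiguous base (N) as uninformative. The function name, the N-skipping rule, and the example alignment are assumptions; the threshold a study would apply to this fraction is not given here.

```python
from typing import Sequence


def site_conservation(alignment: Sequence[str], start: int, end: int,
                      ref_index: int = 0, skip_ambiguous: bool = True) -> float:
    """Fraction of aligned genomes identical to the reference over columns [start, end).

    Hypothetical helper: `alignment` is a list of equal-length aligned sequences
    ('-' for gaps), and `alignment[ref_index]` is the reference genome.
    """
    ref = alignment[ref_index][start:end].upper()
    informative = 0
    identical = 0
    for seq in alignment:
        window = seq[start:end].upper()
        if skip_ambiguous and "N" in window:
            continue  # low-coverage call: cannot tell whether the site is conserved
        informative += 1
        if window == ref:
            identical += 1
    return identical / informative if informative else float("nan")


# Toy alignment (made up, not GISAID data): one substitution, one ambiguous genome.
ALN = ["ACGTAGGGA", "ACGTAGGGA", "ACGTCGGGA", "ACGNAGGGA"]
```

Here `site_conservation(ALN, 2, 7)` compares the window `GTAGG`: two of the three informative genomes match, giving 2/3, and the genome with an N is excluded rather than counted as a mismatch.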
Fig. 2. Overview of the RNA-seq-based results specific to SARS-CoV-2, which were not detected in the other viral infections (IAV, HPIV3, and RSV).
a Representation of the RNA-seq studies used in our analyses. b Subset of non-redundant, reduced terms consistently enriched in more than one SARS-CoV-2-infected cell line but not detected in the datasets of the other viruses. c Top 20 differentially expressed isoforms (DEIs) in SARS-CoV-2-infected samples. The y-axis denotes the differential usage of isoforms between conditions (i.e., the difference in isoform fraction, dIF), whereas the x-axis represents the overall log2 fold change (log2FC) of the corresponding gene. DEIs whose genes were also detected as differentially expressed genes (DEGs) in this analysis are shown in blue. d Different ways in which overexpression of a transposable element (TE) family might be detected. Although TEs may be autonomously expressed, the old age of most detected TEs suggests that they are either part of a gene (through exonization or an alternative promoter) or products of pervasive transcription. We report the functional enrichment of genes neighboring differentially expressed TEs (DETEs) specifically upregulated in SARS-CoV-2-infected Calu-3 and A549 (MOI 2) cells. Source data for Fig. 2 are provided in Supplementary Data 18.
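The dIF plotted in panel c can be made concrete with a short calculation. Assuming a definition close to the one used by IsoformSwitchAnalyzeR, the isoform fraction (IF) is an isoform's expression divided by the summed expression of all isoforms of its gene, and dIF is the IF in infected samples minus the IF in mock samples. In this sketch IF is averaged over replicates, which is one reasonable choice rather than necessarily the tool's exact procedure. Function names and the expression values for the two IL-6 isoforms are hypothetical, not the study's data.

```python
from statistics import mean


def isoform_fractions(expr: dict[str, float]) -> dict[str, float]:
    """IF = isoform expression / total expression of all isoforms of the gene."""
    total = sum(expr.values())
    return {iso: (v / total if total > 0 else 0.0) for iso, v in expr.items()}


def delta_if(mock: list[dict[str, float]],
             infected: list[dict[str, float]]) -> dict[str, float]:
    """dIF = mean IF across infected replicates minus mean IF across mock replicates."""
    mock_if = [isoform_fractions(rep) for rep in mock]
    inf_if = [isoform_fractions(rep) for rep in infected]
    return {
        iso: mean(rep[iso] for rep in inf_if) - mean(rep[iso] for rep in mock_if)
        for iso in mock[0]
    }


# Made-up expression values (e.g. TPM) for two replicates per condition.
MOCK = [{"IL6-204": 9.0, "IL6-201": 1.0}, {"IL6-204": 8.0, "IL6-201": 2.0}]
INFECTED = [{"IL6-204": 1.0, "IL6-201": 19.0}, {"IL6-204": 2.0, "IL6-201": 18.0}]

if __name__ == "__main__":
    print(delta_if(MOCK, INFECTED))
```

With these made-up numbers dIF is about +0.775 for IL6-201 and about -0.775 for IL6-204. Because the IFs of one gene sum to 1 in every replicate, the dIF values of a gene's isoforms sum to zero, so an isoform switch appears as mirror-image changes.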
Fig. 3. Isoform usage of IL-6 transcripts in SARS-CoV-2-infected cells.
a IL6-204 is the major IL-6 transcript and comprises six exons: five (E2, E3, E4, E5, E6) contain coding sequence (CDS), and one (E1) contains only 5′-untranslated region (5′-UTR). Both isoforms (IL6-204 and IL6-201) have the same protein-coding capability; the main difference between them is the absence of E1 in IL6-201, which is the major isoform induced upon SARS-CoV-2 infection. b Gene expression of IL-6 in all SARS-CoV-2-infected cell line samples (A549 at multiplicity of infection (MOI) 0.2 and 2; Calu-3 and NHBE at MOI 2). Each boxplot represents three biological replicates; statistical testing was performed with DESeq2 (detailed in the “Methods” section). Exact p-values are available in Supplementary Data 2. c Isoform switch between the two isoforms in SARS-CoV-2-infected cell line samples: IL6-204 is almost exclusively expressed in uninfected (mock) cells, whereas IL6-201 is almost exclusively expressed in SARS-CoV-2-infected cells. Each boxplot represents three biological replicates; statistical testing was performed with IsoformSwitchAnalyzeR, and exact q-values are available in Supplementary Data 8. Source data for Fig. 3 are provided in Supplementary Data 18.
Fig. 4. Workflow and selected results for the analysis of potential binding sites for human RNA-binding proteins (RBPs) in the SARS-CoV-2 genome.
In orange, human RNA-binding protein (RBP) position weight matrices (PWMs) from the ATtRACT database were used as input to search for putative binding sites in the SARS-CoV-2 genome (green). Binding motifs of several RBPs were found to be enriched or depleted within the positive-sense genome (containing genes, 5′- and 3′-untranslated regions (UTRs), and intergenic regions) and the negative-sense intermediates. Conserved RBP-binding sites were determined from a multiple sequence alignment of ~180k SARS-CoV-2 genomes available from GISAID. Finally, we incorporated publicly available human gene expression data and protein–protein interaction networks for human and SARS-CoV-2.
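The motif search described in this caption can be sketched as a standard log-odds PWM scan. Assumptions, none of which come from the study: each PWM is a list of per-position base probabilities over A, C, G, U; the background is uniform (0.25); a small pseudocount avoids log(0); and the negative-sense intermediate is modeled as the reverse complement of the positive-sense genome. The function names, the score threshold, and the `GGA` example motif are hypothetical, and neither the ATtRACT file format nor the enrichment/depletion statistics are reproduced.

```python
import math

Pwm = list[dict[str, float]]  # one base-probability dict per motif position

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Negative-sense strand, read 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def log_odds(pwm: Pwm, pseudocount: float = 0.01, background: float = 0.25) -> Pwm:
    """Convert probabilities to log2(p / background), with a pseudocount per base."""
    matrix = []
    for column in pwm:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                       for b in "ACGU"})
    return matrix


def scan(seq: str, lo_matrix: Pwm, threshold: float) -> list[tuple[int, float]]:
    """Return (start, score) for every window scoring at or above `threshold`."""
    width = len(lo_matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGU" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(lo_matrix, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def scan_both_senses(genome: str, pwm: Pwm, threshold: float) -> dict[str, list[tuple[int, float]]]:
    """Scan the positive-sense genome and its reverse complement (negative sense)."""
    rna = genome.upper().replace("T", "U")  # accept DNA-alphabet genome files
    lo_matrix = log_odds(pwm)
    return {
        "positive": scan(rna, lo_matrix, threshold),
        # Coordinates here are positions on the reverse-complemented sequence.
        "negative": scan(reverse_complement(rna), lo_matrix, threshold),
    }


EXAMPLE_PWM: Pwm = [{"G": 1.0}, {"G": 1.0}, {"A": 1.0}]  # hypothetical "GGA" motif
```

For example, `scan_both_senses("AUGGAC", EXAMPLE_PWM, 5.0)` reports one positive-sense hit starting at position 2 and no negative-sense hits. Scanning the reverse complement with the same PWM is a simple way to cover binding on the negative-sense intermediate without a second matrix.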
Fig. 5. Overview of human factors specific to SARS-CoV-2 infection detected by our analyses.
This figure includes (i) human RNA-binding proteins (RBPs) whose binding sites are enriched and conserved in the SARS-CoV-2 genome but not in the genomes of related viruses, and (ii) gene isoforms and metabolites that are consistently altered in response to SARS-CoV-2 infection of lung epithelial cells but not in infection with the other tested viruses. ECM: extracellular matrix.
