Gigascience. 2022 Oct 17:11:giac094.
doi: 10.1093/gigascience/giac094.

High temporal resolution Nanopore sequencing dataset of SARS-CoV-2 and host cell RNAs

Dóra Tombácz et al. Gigascience.

Abstract

Background: Recent studies have described the genome, transcriptome, and epigenetic composition of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on host cell gene expression. These studies have shown that, besides the major canonical transcripts, the viral genome also codes for noncanonical RNA molecules. While structural characterization has revealed a detailed transcriptomic architecture of the virus, kinetic studies have provided poor and often misleading results on the dynamics of both viral and host transcripts, because of the low temporal resolution of sampling and the low virus/cell ratio (multiplicity of infection [MOI] = 0.1) used for infection. It has also never been tested whether the alterations in host gene expression are caused by aging of the cells or by the viral infection.

Findings: In this study, we used Oxford Nanopore's direct cDNA and direct RNA sequencing methods to generate a high-coverage, high temporal resolution transcriptomic dataset of SARS-CoV-2 and of the primate host cells, using a high infection titer (MOI = 5). Sixteen sampling time points ranging from 1 to 96 hours post infection, with varying time resolution, and 3 biological replicates were used in the experiment. In addition, corresponding noninfected samples were included for each infected sample. The raw reads were mapped to the viral and host reference genomes, resulting in 49,661,499 mapped reads (54.62 Gb). The genome of the viral isolate was also sequenced and phylogenetically classified.
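As a quick sanity check on the reported yield, the two totals above imply an average mapped read length. The sketch below assumes that the reported figure means 54.62 gigabases (decimal comma read as a decimal point, 1 Gb taken as 10^9 bases); the function name is illustrative only.

```python
# Totals as reported in the Findings paragraph.
MAPPED_READS = 49_661_499
MAPPED_BASES = 54.62e9  # assumption: 54.62 Gb = 54.62 * 10**9 bases


def mean_mapped_read_length(total_bases, total_reads):
    """Average number of bases per mapped read."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return total_bases / total_reads


avg_len = mean_mapped_read_length(MAPPED_BASES, MAPPED_READS)
print(f"Implied mean mapped read length: {avg_len:.0f} bases")
```

Under these assumptions the implied mean is on the order of a thousand bases per mapped read.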

Conclusions: This dataset can serve as a valuable resource for profiling the SARS-CoV-2 transcriptome dynamics, the virus-host interactions, and the RNA base modifications. Comparing the expression profiles of host genes in virally infected and noninfected cells at different time points makes it possible to distinguish the effect of the aging of cells in culture from that of the viral infection. These data can provide useful information for potential novel gene annotations and can also be used for evaluating the currently available bioinformatics pipelines.

Keywords: MinION system; Oxford Nanopore Technologies; SARS-CoV-2; coronavirus; direct RNA sequencing; direct cDNA sequencing; full-length transcriptome; long-read sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Figure 1:
Schematic representation of the workflow applied in this project. (A) Isolation and detection of a Hungarian isolate of the SARS-CoV-2 virus. The sample was collected from a human nasopharyngeal swab. The SARS-CoV-2 infection was validated by reverse transcription PCR using the RNA extracted from the sample. The virus was isolated from the sample and was maintained on Vero cells. (B) Experimental workflow of the study. Vero cells were infected with SARS-CoV-2 and incubated at 37°C for 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 36, 48, 72, and 96 hours post infection. Uninfected control cells were also propagated. Each time-point experiment was carried out in 3 biological replicates. RNAs were purified from the samples, followed by library preparation and sequencing using the direct cDNA and direct RNA methods. Altogether, 9 MinION flow cells (ONT) were used for this study. (C) Bioinformatics workflow. ONT's Guppy basecaller was used to call the base sequence of the obtained reads, which were then aligned to the viral and host reference genomes using the minimap2 mapper. Statistical data were generated with seqtools [25] and a custom R-workflow [33]. (D) The quality of the RNA samples was assessed with a TapeStation 2200 system with RNA ScreenTape. The TapeStation gel image shows that intact, high-quality RNAs were isolated from the samples and used for sequencing. The image shows the following samples: A1: marker; B1: 8-hour postinfection (pi) sample C; 12-hour pi sample A; 16-hour pi sample A; 18-hour pi sample B; 20-hour pi sample C; 36-hour pi sample A; 48-hour pi sample A; 96-hour pi sample B.
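The sampling design in panel B can be enumerated as a simple sample sheet. This is a minimal sketch, not the authors' own bookkeeping: the time points come from the caption, the replicate letters follow the lane labels in panel D, and the condition labels, the `sample_id` format, and the assumption that mock controls mirror every infected time point and replicate are hypothetical.

```python
from itertools import product

# Hours post infection, as listed in the Figure 1 caption.
TIME_POINTS_HPI = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 36, 48, 72, 96]
REPLICATES = ["A", "B", "C"]       # replicate letters as in the panel D lane list
CONDITIONS = ["infected", "mock"]  # hypothetical labels for the two arms


def build_sample_sheet():
    """Return one record per (condition, time point, replicate) combination."""
    return [
        {
            "sample_id": f"{condition}_{hpi}h_{replicate}",  # hypothetical naming scheme
            "condition": condition,
            "hpi": hpi,
            "replicate": replicate,
        }
        for condition, hpi, replicate in product(CONDITIONS, TIME_POINTS_HPI, REPLICATES)
    ]


sheet = build_sample_sheet()
infected = [row for row in sheet if row["condition"] == "infected"]
print(len(infected), "infected samples;", len(sheet), "samples in total")
```

Keeping the design in one table like this makes it straightforward to pair each infected sample with its control at the same time point when comparing expression profiles.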
Figure 2:
Ratio of the sgRNAs to the gRNAs across the viral infection cycle in the dcDNA samples. The fitted loess function with 95% confidence intervals is shown in blue and gray, respectively.
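The quantity plotted here can be sketched as a per-time-point ratio of read counts. The snippet below is illustrative only: how reads are classified as subgenomic (sgRNA) or genomic (gRNA) is not described in this caption, the counts are invented, and the loess fit itself is not reproduced.

```python
def sg_to_g_ratio(counts_by_hpi):
    """Map each time point to (sgRNA reads / gRNA reads).

    counts_by_hpi: dict of hpi -> (sgRNA read count, gRNA read count).
    Time points without gRNA reads map to None, since the ratio is undefined.
    """
    ratios = {}
    for hpi, (sg_reads, g_reads) in sorted(counts_by_hpi.items()):
        ratios[hpi] = sg_reads / g_reads if g_reads > 0 else None
    return ratios


# Invented counts for illustration; these are not values from the dataset.
example_counts = {1: (10, 40), 6: (300, 100), 24: (5000, 250)}
ratios = sg_to_g_ratio(example_counts)
for hpi, ratio in ratios.items():
    print(f"{hpi} hpi: sgRNA/gRNA = {ratio}")
```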
Figure 3:
Phylogenetic tree of the sequenced SARS-CoV-2 strains, arranged according to the designated clades of the virus. The isolate documented in the current study (OM812693.1) is colored red, and a red arrow shows its position. The position of the genome that was used as reference for aligning the reads (MT560672.1) is also indicated by a red arrow. The tree was generated with the Nextstrain pipeline. All variants are colored by their assigned clade, according to the nomenclature.
Figure 4:
Whole-genome coverage plot using high-quality (Q-score ≥8) reads from dcDNA samples that aligned to the SARS-CoV-2 genome used as a reference for this study. The coverages of the replicates within each hours post infection (hpi) group were summed, and the y-axes show the log2 of these values. Annotated protein-coding genes are shown in the bottom track. The direction of the arrows indicates the coding strand.
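The y-axis transformation described above (sum the replicate coverages, then take log2) can be sketched as follows. This is an illustration under stated assumptions: per-position coverage is given as plain lists of read depths, and positions with zero summed coverage are returned as `None`, because how such positions were handled in the figure is not stated.

```python
import math


def summed_log2_coverage(replicate_coverages):
    """Sum per-position coverage across replicates, then take log2.

    replicate_coverages: one list of read depths per replicate, all of equal length.
    Positions whose summed coverage is zero yield None (log2(0) is undefined).
    """
    lengths = {len(coverage) for coverage in replicate_coverages}
    if len(lengths) != 1:
        raise ValueError("all replicates must cover the same number of positions")
    summed = [sum(depths) for depths in zip(*replicate_coverages)]
    return [math.log2(total) if total > 0 else None for total in summed]


# Toy depths for three replicates over three genome positions.
toy_replicates = [[1, 0, 4], [1, 0, 2], [2, 0, 2]]
log2_cov = summed_log2_coverage(toy_replicates)
print(log2_cov)
```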
Figure 5:
Scatterplot of mean read lengths of the sequencing data derived from infected and uninfected samples, with 25th and 75th percentiles and a fitted loess function. Length of reads aligned (A) to the viral genome and (B) to the host genome. (C) Read-length distribution of mock-infected samples mapped to the host genome.
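The per-time-point summary statistics named in this caption (mean, 25th and 75th percentiles) can be computed with the standard library. The sketch below is not the authors' R workflow; the input layout is hypothetical, the read lengths are invented, and the "inclusive" percentile method is an assumption, since the caption does not say which definition was used.

```python
from statistics import mean, quantiles


def summarize_read_lengths(lengths_by_hpi):
    """Return mean, 25th and 75th percentile of read length per time point.

    lengths_by_hpi: dict of hpi -> list of read lengths (at least two reads each).
    """
    summary = {}
    for hpi, lengths in sorted(lengths_by_hpi.items()):
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, _median, q3 = quantiles(lengths, n=4, method="inclusive")
        summary[hpi] = {"mean": mean(lengths), "p25": q1, "p75": q3}
    return summary


# Invented read lengths for illustration; these are not values from the dataset.
toy_lengths = {1: [100, 200, 300, 400, 500], 24: [800, 1200, 1000, 1400, 600]}
summary = summarize_read_lengths(toy_lengths)
for hpi, stats in summary.items():
    print(hpi, stats)
```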
