RNA. 2022 Mar;28(3):277-289.
doi: 10.1261/rna.078969.121. Epub 2021 Dec 22.

sgDI-tector: defective interfering viral genome bioinformatics for detection of coronavirus subgenomic RNAs

Andrea Di Gioacchino et al. RNA. 2022 Mar.

Abstract

Coronavirus RNA-dependent RNA polymerases produce subgenomic RNAs (sgRNAs) that encode viral structural and accessory proteins. User-friendly bioinformatic tools to detect and quantify sgRNA production are urgently needed to study the growing volume of next-generation sequencing (NGS) data of SARS-CoV-2. We introduced sgDI-tector to identify and quantify sgRNA in SARS-CoV-2 NGS data. sgDI-tector allowed detection of sgRNA without initial knowledge of the transcription-regulatory sequences. We produced NGS data and successfully detected the nested set of sgRNAs with the ranking M > ORF3a > N > ORF6 > ORF7a > ORF8 > S > E > ORF7b. We also compared the level of sgRNA production with other types of viral RNA products such as defective interfering viral genomes.

Keywords: SARS-CoV-2; defective viral genomes; subgenomic RNA; user-friendly bioinformatics.

Figures

FIGURE 1.
Four main classes of DI genomes can be detected by DI-tector. The full-length genome (divided here into three regions, A, B, and C) is shown first. When part of region B is missing, DI-tector detects a deletion; if instead a region is added, DI-tector detects an insertion. Copy-backs and snap-backs are formed through junctions involving the two strands (positive-sense and negative-sense). C′ is the region complementary to C. (BP) Breakpoint site, (RI) reinitiation site.
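The classes in this caption can be illustrated with a minimal sketch. It assumes 1-based positive-sense coordinates and a hypothetical `Junction` record; the rule that a same-strand junction is an insertion when the polymerase resumes at or upstream of the breakpoint is an assumption made for illustration, not DI-tector's documented logic, and copy-backs and snap-backs are not distinguished here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record of one junction read (not a DI-tector data type)."""
    bp: int            # breakpoint site: last genome position copied before the jump
    ri: int            # reinitiation site: first genome position copied after the jump
    same_strand: bool  # True if both sides of the junction map to the same strand


def classify_junction(j: Junction) -> str:
    """Assign a junction to one of the classes named in the caption."""
    if not j.same_strand:
        # Junctions joining the positive and negative strands.
        return "copy-back/snap-back"
    if j.ri > j.bp + 1:
        # Synthesis resumes downstream: the region in between is missing.
        return "deletion"
    if j.ri <= j.bp:
        # Synthesis resumes upstream: a region is copied again (assumed rule).
        return "insertion"
    return "contiguous"  # ri == bp + 1, no rearrangement
```

Under these assumptions, a leader-to-body sgRNA junction would fall into the deletion class, because the body part starts far downstream of the point where the leader ends.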
FIGURE 2.
Most of the DVG reads can be associated with canonical sgRNAs. Here we show the results of RNA-seq and alignment of the reads to the human and SARS-CoV-2 genomes. NGS library preparation was performed with a ribodepletion step. The unmapped reads were further processed with DI-tector, and the resulting classification of the DVG reads into deletions, insertions, and copy-backs/snap-backs is given. The percentage of junction reads corresponding to canonical sgRNAs, that is, the standard annotated subgenomic ORFs of SARS-CoV-2 (S, 3A, E, M, 6, 7a, 7b, N, 10), is specified. All percentages are averaged over three biological replicates. Cell colors classify reads as host-related reads (blue), viral reads (yellow), DVG reads (green), canonical sgRNA reads (red), and other reads (gray).
FIGURE 3.
sgDI-tector detects most canonical sgRNAs in all replicates with a high number of counts. (Left panels) Deletion DVGs distribution across the last 10 kb positions of the SARS-CoV-2 genome (GISAID ID: EPI_ISL_414631). Each blue or orange bar corresponds to a deletion, the position of the bar being the starting point of the “body” part of the junction (called RI position in sgDI-tector). Crosses are expected RI positions from Alexandersen et al. (2020), and bars are colored in orange if the deletion is observed in that position in our data. (Right panels) Number of counts and ORF name for the 13 deletions with most counts observed in each replicate. Orange bars correspond to orange crosses in the left panel and represent canonical TRS. Blue bars correspond to putative noncanonical ORFs detected in our data. Names for noncanonical ORFs are hexadecimal numbers representing the position of the corresponding start codon (AUG) in the standard 5′-to-3′ sense in the reference sequence (GISAID ID: EPI_ISL_414631), see also Supplemental Table 2.
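The hexadecimal naming of noncanonical ORFs described in this caption can be sketched as follows. The helper names are hypothetical, and two points are assumptions: that the relevant start codon is the first AUG at or downstream of the RI position, and that the name is the lowercase hexadecimal form of its 1-based position without a prefix. The caption does not specify either detail.

```python
from typing import Optional


def first_start_codon(genome: str, ri_pos: int) -> Optional[int]:
    """Return the 1-based position of the first AUG/ATG at or after ri_pos.

    ri_pos is a 1-based position in the reference, read in the 5'-to-3' sense.
    Returns None if no start codon is found downstream.
    """
    seq = genome.upper().replace("U", "T")  # accept RNA or DNA alphabets
    idx = seq.find("ATG", ri_pos - 1)       # str.find works on 0-based offsets
    return idx + 1 if idx != -1 else None


def hex_orf_name(start_pos: int) -> str:
    """Name an ORF by the hexadecimal form of its start-codon position."""
    return format(start_pos, "x")
```

A junction whose downstream start codon sits at position 28274, for example, would be named `6e72` under this convention.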
FIGURE 4.
Canonical sgRNAs and some noncanonical sgRNAs are consistently observed across the three biological replicates. ORF names and counts are shown for the 20 deletions with the most counts observed in NGS data from three biological replicates. Data from different replicates have been normalized so that the number of viral reads in each replicate is the same (see Materials and Methods). Bar heights are given by the average of the three replicates (shown as white dots after normalization), and error bars represent the standard deviation. The colored bars below the ORF names indicate statistical significance of the count differences; ORFs above bars of different colors have statistically different junction counts (P-value ≤0.05, from a two-sample, two-tailed, Welch's unequal-variance t-test).
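The normalization and test named in this caption can be sketched with the standard library alone. The function names and the choice of a common `target` read depth are assumptions; the authors' exact procedure is the one given in their Materials and Methods. The sketch returns Welch's t statistic and the Welch-Satterthwaite degrees of freedom. Turning these into a two-tailed P-value needs the t distribution, for which `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option.

```python
from math import sqrt
from statistics import mean, variance


def normalize_counts(counts, viral_reads, target):
    """Rescale junction counts so a replicate has `target` viral reads.

    counts: dict mapping ORF name -> raw junction count in one replicate.
    viral_reads: number of viral reads observed in that replicate.
    target: common viral read depth applied to every replicate (assumed).
    """
    scale = target / viral_reads
    return {orf: c * scale for orf, c in counts.items()}


def welch_t(a, b):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With three replicates per ORF, `a` and `b` would each hold three normalized counts, one per replicate.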
FIGURE 5.
sgDI-tector results are not correlated with decumulation results, while agreeing with other tools applied on the same and on other data. The first row (A–C) presents only results obtained in our experiments and analyzed with several bioinformatic tools, while the second row (D–F) presents a comparison between our data and other data present in the literature. All given results are for one replicate. One pseudocount has been added when necessary to visualize the same number of ORFs in each plot. The black dashed line is the diagonal line, added to ease the comparison between the different methods. Notice that the plots on the bottom row compare results on different cell lines: HEK293 on the y-axis, and Vero on the x-axis.
FIGURE 6.
sgRNA junction counts obtained with sgDI-tector correlate with Finkel et al.’s junction counts obtained with STAR (panels A,B), while Periscope results show a lower correlation (panels C,D). The left (right) column contains the results for data at 5 (24) hpi. Only data for the first biological replicate are presented here. ORF 10 junctions are found by neither STAR nor DI-tector, while a single read was detected by Periscope at 5 hpi. One pseudocount has been added to junctions which are not detected by one tool while being detected by the other. The black dashed line is the diagonal line, added to ease the comparison between the different methods.
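The pseudocount rule in this caption, together with one way of quantifying agreement between two tools, can be sketched as below. The function names are hypothetical, and computing a Pearson coefficient on log10 counts is an assumption made for illustration; the caption does not state which correlation measure was used.

```python
from math import log10, sqrt


def paired_counts(tool_a, tool_b, pseudocount=1):
    """Pair per-ORF junction counts from two tools.

    A junction detected by only one tool gets `pseudocount` for the other,
    so it can be drawn on log axes. Junctions detected by neither are skipped.
    """
    pairs = {}
    for orf in sorted(set(tool_a) | set(tool_b)):
        a, b = tool_a.get(orf, 0), tool_b.get(orf, 0)
        if a == 0 and b == 0:
            continue
        pairs[orf] = (a or pseudocount, b or pseudocount)
    return pairs


def log_pearson(pairs):
    """Pearson correlation of log10 counts (assumed agreement measure)."""
    xs = [log10(a) for a, _ in pairs.values()]
    ys = [log10(b) for _, b in pairs.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Points on the dashed diagonal correspond to ORFs with identical counts from both tools, so a coefficient close to 1 alone does not imply that the points lie on that line.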
FIGURE 7.
Logo of the RI positions around the TRS-putative sequences obtained from DI-tector. The conservation, plotted as the total height of the letters representing nucleotides, is obtained as log2(4) + Σ_n f_i(n) log2(f_i(n)), that is, 2 bits minus the Shannon entropy at position i, where f_i(n) is the frequency of nucleotide n in position i. Therefore a height equal to 2 corresponds to perfect conservation. The horizontal axis is the position with respect to the reference sequence (GISAID ID: EPI_ISL_414631). The green box highlights the canonical TRS. The alignment step to obtain this logo is described in the Materials and Methods section. Color code used: red for adenine and uracil, blue for cytosine and guanine.
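The logo height per position can be sketched as the usual sequence-logo information content, 2 bits minus the Shannon entropy of the nucleotide frequencies. The function names are hypothetical, and the input is assumed to be a list of already aligned, equal-length sequences without gaps.

```python
from collections import Counter
from math import log2


def conservation(column):
    """Information content (bits) of one alignment column of nucleotides."""
    total = len(column)
    freqs = [count / total for count in Counter(column).values()]
    entropy = -sum(f * log2(f) for f in freqs)  # Shannon entropy in bits
    return log2(4) - entropy                    # 2 bits for a 4-letter alphabet


def logo_heights(aligned_sequences):
    """Total letter height at each position of a gap-free alignment."""
    return [conservation(col) for col in zip(*aligned_sequences)]
```

A perfectly conserved column gives a height of 2, and a column with all four nucleotides at equal frequency gives 0.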
FIGURE 8.
Scheme of the sgDI-tector pipeline introduced here to find the putative position of the leader sequence, sgRNAs, and a list of putative transcription-regulatory sequences (TRSs). Red boxes denote necessary inputs for the sgDI-tector tool, and green boxes denote outputs.
References

    1. Alexandersen S, Chamings A, Bhatta TR. 2020. SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nat Commun 11: 6059. doi:10.1038/s41467-020-19883-7
    2. Beauclair G, Mura M, Combredet C, Tangy F, Jouvenet N, Komarova AV. 2018. DI-tector: defective interfering viral genomes’ detector for next-generation sequencing data. RNA 24: 1285–1296. doi:10.1261/rna.066910.118
    3. Davidson AD, Williamson MK, Lewis S, Shoemark D, Carroll MW, Heesom KJ, Zambon M, Ellis J, Lewis PA, Hiscox JA, et al. 2020. Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell passage induced in-frame deletion of the furin-like cleavage site from the spike glycoprotein. Genome Med 12: 1–15. doi:10.1186/s13073-020-00763-0
    4. Dimmock NJ, Easton AJ, Goff SP. 2014. Defective interfering influenza virus RNAs: time to reevaluate their clinical potential as broad-spectrum antivirals? J Virol 88: 5217–5227. doi:10.1128/JVI.03193-13
    5. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2012. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21. doi:10.1093/bioinformatics/bts635
