Review
Brief Bioinform. 2021 Mar 22;22(2):642-663.
doi: 10.1093/bib/bbaa232.

Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research

Franziska Hufsky et al. Brief Bioinform.

Abstract

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.

Keywords: SARS-CoV-2; drug design; epidemiology; sequencing; tools; virus bioinformatics.

Figures

Figure 1
SARS-CoV-2-specific primers computed with PriSeT. Approximate amplicon locations of de novo computed primer pairs for SARS-CoV-2 with no co-occurrences in other genomes in GenBank (on 3 April 2020).
Figure 2
Simplified overview of the poreCov workflow. The individual workflow steps (blue) are executed automatically depending on the input (yellow). Instead of using raw nanopore fast5 files, fastq files or complete SARS-CoV-2 genomes can be used as an alternative input. If reference genomes and location/times are added, a time tree is additionally constructed.
Figure 3
Sequence reads from a human lung metatranscriptome (sample accession: SAMN13922059) were first quality-filtered using TrimGalore v0.6.0 and subsequently assembled using MEGAHIT v1.1.3 [53] with default parameters. The resulting metatranscriptome assembly was processed through the VIRify pipeline. Based on the hits against the ViPhOG database, a 29 kb contig was classified as Coronaviridae. Functional protein domain annotations (inner track) were assigned by an hmmsearch v3.1b2 against coronavirus models in Pfam. The image was created with circlize [29] and polished with Inkscape.
Figure 4
A region of recombination in coronavirus genomes at three levels of resolution in Base-By-Base. Top panel: aligned genomes; blue boxes show differences compared to the top sequence in the alignment. Middle panel: summary view showing differences and indels compared to the top sequence. Bottom panel: similarity plot comparing five genomes.
Figure 5
SARS-CoV-2 Rfam secondary structure predictions. The sequence is based on the NC_045512.2 RefSeq entry displayed with the wuhCor1 UCSC Genome Browser alongside the NCBI Genes track.
Figure 6
Overview of Covidex for viral subtyping analysis. Left: The user is expected to load a sequence file and to select the model that will be applied for classification. Models may be selected from the default list or uploaded by the user. Right: The program output (table and plots).
Figure 7
Web interface of the COVIDSIM simulator. The interface allows the user to modify model parameters and compare simulated dynamics with real infection data.
Figure 8
List of amino acid replacements to the SARS-CoV-2 reference sequence. Replacements have been detected in GISAID SARS-CoV-2 sequences from the pandemic using CoV-GLUE.
Figure 9
CoVex: CoronaVirus Explorer. CoVex is a network medicine web platform that allows its users to interactively mine a large interactome that integrates information about virus–host protein interactions, known human protein–protein interactions and drug–protein interactions. CoVex can be used for identifying potential drug targets and drug repurposing candidates.
Figure 10
P-HIPSTer combines sequence and structural information to predict viral-host PPIs. P-HIPSTer evaluates the likelihood ratio (LR) for the potential interaction between a viral protein (in red) and a human protein (in blue) by combining three lines of evidence: (i) a domain–domain LR that two structural domains interact, based on a known complex (green and purple domain–domain complex) composed of their structural neighbours; (ii) a peptide–domain LR that an unstructured peptide in one query binds to a structured domain in the second query, based on known binding motifs/peptide–domain complexes (green and purple peptide–domain complex) and using both sequence and structural similarity; (iii) a redundancy LR based on evidence that multiple structural neighbours (in orange, purple and green) of one query protein are known to interact with the remaining query protein. Each viral protein is functionally annotated based on sequence and structural similarity (using either homology models or known protein structures) and its corresponding set of predicted interacting human proteins.

References

    1. Aguilera LU, Rodríguez-González J. Modeling the effect of tat inhibitors on HIV latency. J Theor Biol 2019;473:20–7.
    2. Akgül A, Khoshnaw SHA, Mohammed WH. Mathematical model for the ebola virus disease. J Adv Phys 2018;7(2):190–8.
    3. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res 2000;28(1):304–5.
    4. Barry JK. Mathematical modelling of the HIV life cycle: identifying optimal treatment strategies. PhD thesis, University of Greifswald, 2018.
    5. Boettiger C. An introduction to docker for reproducible research. ACM SIGOPS Operating Systems Review 2015;49(1):71–9.
