Review

Brief Bioinform. 2021 Mar 22;22(2):642-663.

doi: 10.1093/bib/bbaa232.

Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research

Franziska Hufsky¹, Kevin Lamkiewicz¹, Alexandre Almeida², Abdel Aouacheria³, Cecilia Arighi⁴, Alex Bateman⁵, Jan Baumbach⁶, Niko Beerenwinkel⁷, Christian Brandt⁸, Marco Cacciabue⁹, Sara Chuguransky¹⁰, Oliver Drechsel¹¹, Robert D Finn¹², Adrian Fritz¹³, Stephan Fuchs¹¹, Georges Hattab¹⁴, Anne-Christin Hauschild¹⁵, Dominik Heider¹⁶, Marie Hoffmann¹⁷, Martin Hölzer¹⁸, Stefan Hoops¹⁹, Lars Kaderali²⁰, Ioanna Kalvari²¹, Max von Kleist¹¹, René Kmiecinski¹¹, Denise Kühnert²², Gorka Lasso²³, Pieter Libin²⁴, Markus List⁶, Hannah F Löchel¹⁵, Maria J Martin²⁵, Roman Martin¹⁵, Julian Matschinske²⁶, Alice C McHardy²⁷, Pedro Mendes²⁸, Jaina Mistry²⁵, Vincent Navratil²⁹, Eric P Nawrocki³⁰, Áine Niamh O'Toole³¹, Nancy Ontiveros-Palacios²⁵, Anton I Petrov²⁵, Guillermo Rangel-Pineros³², Nicole Redaschi³³, Susanne Reimering³⁴, Knut Reinert¹⁷, Alejandro Reyes³⁵, Lorna Richardson³⁶, David L Robertson³⁷, Sepideh Sadegh³⁸, Joshua B Singer³⁹, Kristof Theys⁴⁰, Chris Upton⁴¹, Marius Welzel¹⁵, Lowri Williams⁴², Manja Marz¹⁸

Affiliations

¹ Friedrich-Schiller-University Jena, Germany.
² EMBL-EBI and the Wellcome Sanger Institute, UK.
³ CNRS, France.
⁴ Biocuration and Literature Access at PIR, USA.
⁵ Protein Sequence Resources at EMBL-EBI, UK.
⁶ Technical University of Munich, Germany.
⁷ Computational Biology at ETH Zurich, Switzerland.
⁸ Institute of Infectious Disease and Infection Control at Jena University Hospital, Germany.
⁹ Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) working on FMDV virology at the Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET) and at the Departamento de Ciencias Básicas, Universidad Nacional de Luján (UNLu), Argentina.
¹⁰ Pfam and InterPro databases, at the EMBL-EBI, UK.
¹¹ bioinformatics department at the Robert Koch-Institute, Germany.
¹² Pfam and MGnify.
¹³ Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research, Germany.
¹⁴ Bioinformatics Division at Philipps-University Marburg, Germany.
¹⁵ Philipps-University Marburg, Germany.
¹⁶ Data Science in Biomedicine at the Philipps-University of Marburg, Germany.
¹⁷ Freie Universität Berlin, Germany.
¹⁸ Friedrich Schiller University Jena, Germany.
¹⁹ Biocomplexity Institute and Initiative at the University of Virginia, USA.
²⁰ Bioinformatics and head of the Institute of Bioinformatics at University Medicine Greifswald, Germany.
²¹ Senior Software Developer.
²² Max Planck Institute for the Science of Human History.
²³ Chandran Lab, Albert Einstein College of Medicine, USA.
²⁴ University of Hasselt, Belgium.
²⁵ EMBL-EBI, UK.
²⁶ Chair of Experimental Bioinformatics at TU Munich, Germany.
²⁷ Computational Biology of Infection Research Lab at the Helmholtz Centre for Infection Research in Braunschweig, Germany.
²⁸ Center for Quantitative Medicine of the University of Connecticut School of Medicine, USA.
²⁹ Bioinformatics and Systems Biology at the Rhône Alpes Bioinformatics core facility, Université de Lyon, France.
³⁰ National Center for Biotechnology Information (NCBI).
³¹ Rambaut group at Edinburgh University, UK.
³² GLOBE Institute in the University of Copenhagen, Denmark.
³³ Development of the Swiss-Prot group at the SIB for UniProt and SIB resources that cover viral biology (ViralZone).
³⁴ Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research.
³⁵ Universidad de los Andes, Colombia.
³⁶ Sequence Families team at EMBL-EBI, UK.
³⁷ MRC-University of Glasgow Centre for Virus Research, UK.
³⁸ Chair of Experimental Bioinformatics at Technical University of Munich, Germany.
³⁹ MRC-University of Glasgow Centre for Virus Research, Glasgow, Scotland, UK.
⁴⁰ Rega institute of the University of Leuven, Belgium.
⁴¹ Department of Biochemistry and Microbiology, University of Victoria, Canada.
⁴² Pfam and InterPro databases, at EMBL-EBI, UK.

PMID: 33147627
PMCID: PMC7665365
DOI: 10.1093/bib/bbaa232

Franziska Hufsky et al. Brief Bioinform. 2021.

Abstract

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, and the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.

Keywords: SARS-CoV-2; drug design; epidemiology; sequencing; tools; virus bioinformatics.

Figures

**Figure 1**
**SARS-CoV-2-specific primers computed with PriSeT.** Approximate amplicon locations of *de novo* computed primer pairs for SARS-CoV-2 with no co-occurrences in other genomes in GenBank (on 3 April 2020).

**Figure 2**
**Simplified overview of the poreCov workflow.** The individual workflow steps (blue) are executed automatically depending on the input (yellow). Instead of using raw nanopore fast5 files, fastq files or complete SARS-CoV-2 genomes can be used as an alternative input. If reference genomes and location/times are added, a time tree is additionally constructed.

**Figure 3**
**Sequence reads from a human lung metatranscriptome** (sample accession: SAMN13922059) were first quality-filtered using TrimGalore v0.6.0 and subsequently assembled using MEGAHIT v1.1.3 [53] with default parameters. The resulting metatranscriptome assembly was processed through the VIRify pipeline. Based on the hits against the ViPhOG database, a 29 kb contig was classified as *Coronaviridae*. Functional protein domain annotations (inner track) were assigned by an hmmsearch v3.1b2 against coronavirus models in Pfam. The image was created with circlize [29] and polished with Inkscape.

**Figure 4**
**A region of recombination in coronavirus genomes at three levels of resolution in Base-By-Base.** Top panel: aligned genomes; blue boxes show differences compared to the top sequence in the alignment. Middle panel: summary view showing differences and indels compared to the top sequence. Bottom panel: similarity plot comparing five genomes.

**Figure 5**
**SARS-CoV-2 Rfam secondary structure predictions.** The sequence is based on the NC_045512.2 RefSeq entry displayed with the wuhCor1 UCSC Genome Browser alongside the NCBI Genes track.

**Figure 6**
**Overview of Covidex for viral subtyping analysis.** Left: The user is expected to load a sequence file and to select the model that will be applied for classification. Models may be selected from the default list or uploaded by the user. Right: The program output (table and plots).

**Figure 7**
**Web interface of the COVIDSIM simulator.** The interface allows the user to modify model parameters and compare simulated dynamics with real infection data.

**Figure 8**
**List of amino acid replacements to the SARS-CoV-2 reference sequence.** Replacements have been detected in GISAID SARS-CoV-2 sequences from the pandemic using CoV-GLUE.

**Figure 9**
**CoVex: CoronaVirus Explorer.** CoVex is a network medicine web platform that allows its users to interactively mine a large interactome that integrates information about virus–host protein interactions, known human protein–protein interactions as well as drug–protein interactions. CoVex can be used for identifying potential drug targets and drug repurposing candidates.

**Figure 10**
**P-HIPSTer combines sequence and structural information to predict viral-host PPIs.** P-HIPSTer evaluates the likelihood ratio (LR) for a potential interaction between a viral protein (in red) and a human protein (in blue) by combining three lines of evidence: (i) a domain–domain LR that two structural domains interact, based on a known complex (green and purple domain–domain complex) composed of their structural neighbours; (ii) a peptide–domain LR that an unstructured peptide in one query binds to a structured domain in the second query, based on known binding motifs or peptide–domain complexes (green and purple peptide–domain complex), using both sequence and structural similarity; (iii) a redundancy LR based on evidence that multiple structural neighbours (in orange, purple and green) of one query protein are known to interact with the other query protein. Each viral protein is functionally annotated based on sequence and structural similarity (using either homology models or known protein structures) and its corresponding set of predicted interacting human proteins.

References

1. Aguilera LU, Rodríguez-González J. Modeling the effect of tat inhibitors on HIV latency. J Theor Biol 2019;473:20–7.
2. Akgül A, Khoshnaw SHA, Mohammed WH. Mathematical model for the ebola virus disease. J Adv Phys 2018;7(2):190–8.
3. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res 2000;28(1):304–5.
4. Barry JK. Mathematical modelling of the HIV life cycle: identifying optimal treatment strategies. PhD thesis, University of Greifswald, 2018.
5. Boettiger C. An introduction to docker for reproducible research. ACM SIGOPS Operating Systems Review 2015;49(1):71–9.
