Review
Viruses. 2024 Sep 6;16(9):1425.
doi: 10.3390/v16091425.

Bioinformatics Goes Viral: I. Databases, Phylogenetics and Phylodynamics Tools for Boosting Virus Research

Federico Vello et al. Viruses.

Abstract

Computer-aided analysis of proteins and nucleic acids seems like a matter of course nowadays; however, the history of bioinformatics and computational biology is quite recent. The advent of high-throughput sequencing has led to the production of "big data", which has also affected the field of virology. Collaboration between the bioinformatics and virology communities began a few decades ago and was strongly boosted by the recent SARS-CoV-2 pandemic. In this article, the first in a series on how bioinformatics can enhance virus research, we show that highly useful information can be retrieved from selected general and dedicated databases. An enormous amount of information, both nucleotide/protein sequences and their annotation, is deposited in the general databases of the international organisations participating in the International Nucleotide Sequence Database Collaboration (INSDC). In addition, a growing number of virus-specific databases have been established and are progressively being enriched with the contents and features reported in this article. Since viruses are obligate intracellular parasites, a special focus is given to host-pathogen protein-protein interaction databases. Finally, we illustrate several phylogenetic and phylodynamic tools, combining information on algorithms and features with practical guidance on how to use them and case studies that validate their usefulness. Databases and tools for functional inference will be covered in the next article of this series: Bioinformatics goes viral: II. Sequence-based and structure-based functional analyses for boosting virus research.

Keywords: bio-databases; bioinformatics; computational biology; data mining; phylodynamics; phylogenetics; sequence alignment; virus evolution; virus–host interaction.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 3
Graphical interface of the IQ-TREE web server [145]. Aligned sequence files (PHYLIP, FASTA, NEXUS, CLUSTAL, or MSF format) can be uploaded using the Browse command. With the default settings, the programme automatically detects the input sequence type and determines the best-fit substitution model, though users can also select a model manually. Manual selection also makes all of the provided models of rate heterogeneity across sites available. The authors suggest using the FreeRate model [148]. If the input sequences do not contain constant sites, the ascertainment bias correction model can be included [149]. The branch support options let users add bootstrap analyses; the default setting uses the ultrafast bootstrap approximation (UFBoot) [142]. All default values are set to the maximum allowed and can be left as preset. The default single branch test is the SH-aLRT branch test [150], but users can also include the approximate Bayes test [151]. The default search parameters generally perform well but may not be suitable for all datasets. If tree search issues persist, the authors recommend repeating the analysis with at least 10 runs and adjusting the perturbation strength and stopping rule settings, especially for datasets with many short sequences. More information on each setting, including references, is available via the “?” button next to the respective parameter. Data from Trifinopoulos et al. [144].
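The web-server settings described in this caption have counterparts in the command-line version of IQ-TREE. The following is a minimal Python sketch that only assembles such a command as a list of arguments; it does not run anything. The function name `build_iqtree_command`, its parameters, and the file name `aln.fasta` are hypothetical. The option names (`-s`, `-m`, `-bb`, `-alrt`, `-abayes`, `-runs`, `-pers`, `-nstop`) are assumed to follow IQ-TREE 1.6.x and should be checked against the documentation of the installed version, and the numeric values in the usage example are purely illustrative.

```python
from typing import List, Optional


def build_iqtree_command(
    alignment: str,
    model: str = "MFP",            # assumed: automatic model selection followed by tree search
    ufboot_replicates: int = 1000,  # ultrafast bootstrap (UFBoot) replicates; 0 disables
    alrt_replicates: int = 1000,    # SH-aLRT replicates; 0 disables
    abayes: bool = False,           # also request the approximate Bayes test
    runs: int = 1,                  # number of independent tree searches
    perturbation: Optional[float] = None,   # perturbation strength, if overridden
    stop_iterations: Optional[int] = None,  # stopping rule, if overridden
    executable: str = "iqtree",     # hypothetical name of the local binary
) -> List[str]:
    """Assemble an IQ-TREE argument list (option names assumed from IQ-TREE 1.6.x)."""
    cmd = [executable, "-s", alignment, "-m", model]
    if ufboot_replicates > 0:
        cmd += ["-bb", str(ufboot_replicates)]
    if alrt_replicates > 0:
        cmd += ["-alrt", str(alrt_replicates)]
    if abayes:
        cmd.append("-abayes")
    if runs > 1:
        cmd += ["-runs", str(runs)]
    if perturbation is not None:
        cmd += ["-pers", str(perturbation)]
    if stop_iterations is not None:
        cmd += ["-nstop", str(stop_iterations)]
    return cmd


if __name__ == "__main__":
    # Illustrative settings for a dataset with many short sequences:
    # 10 independent runs, adjusted perturbation strength and stopping rule.
    print(" ".join(build_iqtree_command(
        "aln.fasta", runs=10, perturbation=0.2, stop_iterations=500)))
```

A manually chosen model can be passed through the `model` argument, for example a substitution model combined with a FreeRate (`+R`) or ascertainment bias (`+ASC`) suffix. The resulting list can be handed to `subprocess.run` once the options have been verified locally.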
Figure 1
Experimentally validated PPIs in HVIDB, drawn from the public databases shown, and the overlap between these datasets. This image was taken as a snapshot from the HVIDB website [109], with permission from Yang et al. [114].
Figure 2
Graphical interfaces of (A) BEAST v1.10.4 and (B) BEAUti, both of which are included in the available package. The first step is to import the sequence alignment (FASTA or NEXUS format) using BEAUti. The programme then allows users to generate a sorted BEAST XML file. The XML file parameters can be set directly from the BEAUti interface, which lets the user configure various settings, including the substitution matrix model, clock model, tree model and prior, ancestral state reconstruction, and MCMC parameters. Once the XML file is created, it can be loaded into BEAST, which can be run in default mode or with modified parameters. The images are taken from the examples package of the programme, which can help users practise with the software. Data from Suchard et al. [128].
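Because BEAUti imports an alignment in FASTA or NEXUS format, it can be useful to check beforehand that all sequences have the same length. The following is a minimal Python sketch, not part of BEAST or BEAUti, that parses FASTA text, verifies that the sequences are aligned, and writes a simple NEXUS data block. The function names `read_fasta` and `fasta_to_nexus` are hypothetical, and the sketch assumes taxon names without spaces or NEXUS special characters, since it does not quote them.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, keeping the input order."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without a name")
            name = parts[0]  # use the first word of the header as the taxon name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name] += line
    return records


def fasta_to_nexus(text: str, datatype: str = "dna") -> str:
    """Convert an aligned FASTA string into a minimal NEXUS data block."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    nchar = lengths.pop()
    width = max(len(name) for name in records)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        f"  format datatype={datatype} missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in records.items():
        # Taxon names are written unquoted (see the assumption stated above).
        lines.append(f"  {name.ljust(width)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">seqA\nACGTAC\n>seqB sample description\nAC-TAC\n"
    print(fasta_to_nexus(example))
```

Raising an error on unequal lengths is a deliberate choice here: an unaligned file is better caught before any model or MCMC settings are configured.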
Figure 4
Launch bar of the graphical (GUI) version of MEGA 11 [156].

References

    1. Hagen J.B. The origins of bioinformatics. Nat. Rev. Genet. 2000;1:231–236. doi: 10.1038/35042090.
    2. Mullis K.B., Faloona F.A. Specific Synthesis of DNA in Vitro via a Polymerase-Catalyzed Chain Reaction. Methods Enzymol. 1987;155:335–350. doi: 10.1016/0076-6879(87)55023-6.
    3. Gauthier J., Vincent A.T., Charette S.J., Derome N. A brief history of bioinformatics. Brief. Bioinform. 2019;20:1981–1996. doi: 10.1093/bib/bby063.
    4. Satam H., Joshi K., Mangrolia U., Waghoo S., Zaidi G., Rawool S., Thakare R.P., Banday S., Mishra A.K., Das G., et al. Next-Generation Sequencing Technology: Current Trends and Advancements. Biology. 2023;12:997. doi: 10.3390/biology12070997.
    5. Marz M., Beerenwinkel N., Drosten C., Fricke M., Frishman D., Hofacker I.L., Hoffmann D., Middendorf M., Rattei T., Stadler P.F., et al. Challenges in RNA virus bioinformatics. Bioinformatics. 2014;30:1793–1799. doi: 10.1093/bioinformatics/btu105.