Opportunities and Challenges of Data-Driven Virus Discovery

Chris Lauber¹, Stefan Seitz²,³

Affiliations

¹ Institute for Experimental Virology, TWINCORE Centre for Experimental and Clinical Infection Research, a Joint Venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), 30625 Hannover, Germany.
² Division of Virus-Associated Carcinogenesis (F170), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.
³ Department of Infectious Diseases, Molecular Virology, University of Heidelberg, 69120 Heidelberg, Germany.

PMID: 36008967
PMCID: PMC9406072
DOI: 10.3390/biom12081073

Chris Lauber et al. Biomolecules. 2022 Aug 4;12(8):1073. doi: 10.3390/biom12081073.

Abstract

Virus discovery has been fueled by new technologies ever since the first viruses were discovered at the end of the 19th century. Starting with mechanical devices that provided evidence for virus presence in sick hosts, virus discovery gradually transitioned into a sequence-based scientific discipline, which, nowadays, can characterize virus identity and explore viral diversity at an unprecedented resolution and depth. Sequencing technologies are now being used routinely and at ever-increasing scales, producing an avalanche of novel viral sequences found in a multitude of organisms and environments. In this perspective article, we argue that virus discovery has started to undergo another transformation prompted by the emergence of new approaches that are sequence data-centered and primarily computational, setting them apart from previous technology-driven innovations. The data-driven virus discovery approach is largely uncoupled from the collection and processing of biological samples, and exploits the availability of massive amounts of publicly and freely accessible data from sequencing archives. We discuss open challenges to be solved in order to unlock the full potential of data-driven virus discovery, and we highlight the benefits it can bring to classical (mostly molecular) virology and molecular biology in general.

Keywords: computational virology; data mining; sequencing archives; virosphere in health and disease; virus discovery.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Size increase in the Sequence Read Archive (SRA). Shown is the cumulative amount of total (yellow) and open access (blue) data, in petabytes, deposited in the SRA for each month between April 2008 and May 2022. The points represent the actual amounts, and the solid lines show nonlinear least squares fits of logistic functions, which captured the nonlinear increase considerably better than exponential functions (not shown). Parameters of the fitted curves are detailed in the insets. The dashed vertical lines indicate the time points at which the amount of open access data doubled relative to the previous doubling time point.
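
The doubling marks described in the caption can be illustrated with a minimal sketch. It does not reproduce the authors' fit: the logistic parameters `K`, `r`, and `t0` below are made-up values, the monthly series is synthetic, and the helper names `logistic` and `doubling_points` are hypothetical. The sketch only shows how successive doubling time points can be read off a cumulative series, and that under logistic growth the gaps between them lengthen, whereas they would stay constant under exponential growth.

```python
import math


def logistic(t, K, r, t0):
    """Logistic curve with plateau K, growth rate r, and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def doubling_points(values):
    """Return indices at which the series has at least doubled relative to
    its value at the previous doubling point (index 0 is the starting point)."""
    points = [0]
    for i, v in enumerate(values):
        if v >= 2.0 * values[points[-1]]:
            points.append(i)
    return points


# Synthetic monthly series. K, r, and t0 are illustrative assumptions,
# NOT the parameters fitted in Figure 1.
months = list(range(170))
series = [logistic(t, K=40.0, r=0.05, t0=120.0) for t in months]

marks = doubling_points(series)
# Number of months between consecutive doubling points.
gaps = [b - a for a, b in zip(marks, marks[1:])]

print("doubling points (month index):", marks)
print("months between doublings:", gaps)
```

With real data, `series` would be replaced by the observed cumulative monthly archive sizes; the fitted logistic curve itself would come from a nonlinear least squares routine, which is left out here.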
