Biomolecules. 2022 Aug 4;12(8):1073.
doi: 10.3390/biom12081073.

Opportunities and Challenges of Data-Driven Virus Discovery

Chris Lauber et al. Biomolecules.

Abstract

Virus discovery has been fueled by new technologies ever since the first viruses were discovered at the end of the 19th century. Starting with mechanical devices that provided evidence for virus presence in sick hosts, virus discovery gradually transitioned into a sequence-based scientific discipline, which, nowadays, can characterize virus identity and explore viral diversity at an unprecedented resolution and depth. Sequencing technologies are now being used routinely and at ever-increasing scales, producing an avalanche of novel viral sequences found in a multitude of organisms and environments. In this perspective article, we argue that virus discovery has started to undergo another transformation prompted by the emergence of new approaches that are sequence data-centered and primarily computational, setting them apart from previous technology-driven innovations. The data-driven virus discovery approach is largely uncoupled from the collection and processing of biological samples, and exploits the availability of massive amounts of publicly and freely accessible data from sequencing archives. We discuss open challenges to be solved in order to unlock the full potential of data-driven virus discovery, and we highlight the benefits it can bring to classical (mostly molecular) virology and molecular biology in general.

Keywords: computational virology; data mining; sequencing archives; virosphere in health and disease; virus discovery.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Size increase in the Sequence Read Archive (SRA). Shown is the cumulative amount of total (yellow) and open-access (blue) data, in petabytes, deposited in the SRA for each month between April 2008 and May 2022. The points represent the actual amounts, and the solid lines show nonlinear least-squares fits of logistic functions, which captured the trend of the nonlinear increase considerably better than exponential functions (not shown). Parameters of the fitted curves are detailed in the insets. The dashed vertical lines indicate the time points at which the amount of open-access data doubled relative to the previous doubling time point.
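
As a rough illustration of the two ideas in this caption, the following sketch defines a three-parameter logistic curve and marks the points at which a cumulative series doubles relative to the previous doubling point. It is not the authors' code: the function names, the logistic parameterization (`cap`, `rate`, `midpoint`), and the series itself are assumptions made for illustration, and the values are synthetic rather than actual SRA amounts.

```python
import math


def logistic(t, cap, rate, midpoint):
    """Three-parameter logistic curve that levels off at `cap` (assumed form)."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))


def doubling_points(values):
    """Return indices where the series first reaches twice the value seen at
    the previous doubling point; the first element is the initial reference.
    Assumes a positive, non-decreasing cumulative series."""
    if not values:
        return []
    points = []
    reference = values[0]
    for i, v in enumerate(values[1:], start=1):
        if v >= 2 * reference:
            points.append(i)
            reference = v  # the next doubling is measured from here
    return points


# Synthetic monthly series for illustration only (NOT real SRA data).
series = [logistic(m, cap=100.0, rate=0.08, midpoint=60.0) for m in range(121)]
marks = doubling_points(series)
# Months elapsed between consecutive doubling points.
gaps = [b - a for a, b in zip(marks, marks[1:])]
print(marks, gaps)
```

Under a logistic trend the gaps between successive doubling points lengthen as the curve approaches its plateau, whereas a purely exponential trend would keep them roughly constant; this is one simple way to see why the two models can be told apart.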
