Comparative Study
PLoS Biol. 2015 Jul 7;13(7):e1002195.
doi: 10.1371/journal.pbio.1002195. eCollection 2015 Jul.

Big Data: Astronomical or Genomical?

Zachary D Stephens et al. PLoS Biol.

Abstract

Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a "four-headed beast"--it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Growth of DNA sequencing.
The plot shows the growth of DNA sequencing both in the total number of human genomes sequenced (left axis) and in the worldwide annual sequencing capacity (right axis: Tera-basepairs (Tbp), Peta-basepairs (Pbp), Exa-basepairs (Ebp), Zetta-basepairs (Zbp)). The values through 2015 are based on the historical publication record, with selected milestones in sequencing (first Sanger through first PacBio human genome published) as well as three exemplar projects using large-scale sequencing: the 1000 Genomes Project, aggregating hundreds of human genomes by 2012 [3]; The Cancer Genome Atlas (TCGA), aggregating several thousand tumor/normal genome pairs [4]; and the Exome Aggregation Consortium (ExAC), aggregating over 60,000 human exomes [5]. Many of the genomes sequenced to date have been whole exome rather than whole genome, but we expect the ratio to shift increasingly toward whole genome in the future. The values beyond 2015 represent our projection under three possible growth curves as described in the main text.

References

    1. Mole B. The gene sequencing future is here. 2014; http://www.sciencenews.org/article/gene-sequencing-future-here.
    2. Robinson G.E., et al., Creating a Buzz About Insect Genomes. Science, 2011. 18: 1386.
    3. Abecasis G.R., et al., An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): 56–65. doi: 10.1038/nature11632
    4. Chin L., Andersen J.N., and Futreal P.A., Cancer genomics: from discovery science to personalized medicine. Nature Medicine, 2011. 17(3): 297–303. doi: 10.1038/nm.2323
    5. Exome Aggregation Consortium. Exome Aggregation Consortium ExAC Browser. 2015; http://exac.broadinstitute.org/.

Publication types