The Sequence Read Archive: explosive growth of sequencing data

Yuichi Kodama¹, Martin Shumway, Rasko Leinonen; International Nucleotide Sequence Database Collaboration

Affiliations

Affiliation

¹ Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan. ykodama@genes.nig.ac.jp

PMID: 22009675
PMCID: PMC3245110
DOI: 10.1093/nar/gkr854

The Sequence Read Archive: explosive growth of sequencing data

Yuichi Kodama et al. Nucleic Acids Res. 2012 Jan.

. 2012 Jan;40(Database issue):D54-6.

doi: 10.1093/nar/gkr854. Epub 2011 Oct 18.

Authors

Yuichi Kodama¹, Martin Shumway, Rasko Leinonen; International Nucleotide Sequence Database Collaboration

Affiliation

¹ Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan. ykodama@genes.nig.ac.jp

PMID: 22009675
PMCID: PMC3245110
DOI: 10.1093/nar/gkr854

Abstract

New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.

PubMed Disclaimer

References

1. Shumway M, Cochrane G, Sugawara H. Archiving next generation sequencing data. Nucleic Acids Res. 2010;38:D870–D871. - PMC - PubMed
1. Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:D19–D21. - PMC - PubMed
1. Karsch-Mizrachi I, Nakamura Y, Cochrane G. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2012;40:D33–D37. - PMC - PubMed
1. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. - PMC - PubMed
1. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011;39:D1005–D1010. - PMC - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
Research Materials
- NCI CPTC Antibody Characterization Program

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

The Sequence Read Archive: explosive growth of sequencing data

Affiliation

The Sequence Read Archive: explosive growth of sequencing data

Authors

Affiliation

Abstract

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Research Materials