The Sequence Read Archive: explosive growth of sequencing data
- PMID: 22009675
- PMCID: PMC3245110
- DOI: 10.1093/nar/gkr854
Abstract
New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.