Genomics Proteomics Bioinformatics. 2017 Feb;15(1):14-18.
doi: 10.1016/j.gpb.2017.01.001. Epub 2017 Feb 2.

GSA: Genome Sequence Archive

Yanqing Wang et al. Genomics Proteomics Bioinformatics. 2017 Feb.

Abstract

With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing huge volumes of sequence data, here we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to scientific communities worldwide. In the era of big data, GSA is not only an important complement to existing INSDC members, easing the growing burden of handling the sequence data deluge, but also takes on significant responsibility for archiving global big data and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.
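As an illustration only, the four data objects named in the abstract can be sketched as plain Python dataclasses. The relationships assumed here (an Experiment refers to one BioSample and holds its Runs, and a BioProject groups samples and experiments) follow the general INSDC pattern and are an assumption, not GSA's actual schema. All field names and accession strings are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    accession: str                      # placeholder identifier, not a real GSA accession
    files: List[str] = field(default_factory=list)  # raw read files for this run


@dataclass
class Experiment:
    accession: str
    platform: str                       # sequencing platform used (free text here)
    sample_accession: str               # assumed link to the BioSample that was sequenced
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    accession: str
    description: str


@dataclass
class BioProject:
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)

    def run_count(self) -> int:
        """Total number of runs across all experiments in the project."""
        return sum(len(e.runs) for e in self.experiments)


# Hypothetical example: one project, one sample, one experiment, two runs.
sample = BioSample(accession="SAMPLE-0001", description="example tissue sample")
experiment = Experiment(
    accession="EXP-0001",
    platform="example-platform",
    sample_accession=sample.accession,
    runs=[
        Run(accession="RUN-0001", files=["reads_1.fastq.gz"]),
        Run(accession="RUN-0002", files=["reads_2.fastq.gz"]),
    ],
)
project = BioProject(
    accession="PROJECT-0001",
    title="example sequencing study",
    samples=[sample],
    experiments=[experiment],
)
```

Keeping the sample link as an accession string rather than an object reference mirrors how archive records usually point to one another, though GSA's internal representation may differ.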

Keywords: Big data; GSA; Genome Sequence Archive; INSDC; Raw sequence data.

Figures

Figure 1
Data model in GSA. Prefixes of accession numbers for data objects, including BioProject, BioSample, Experiment, and Run, are indicated in red. Data objects Experiment and Run constitute China Read Archive.
Figure 2
Data statistics of GSA. A. Numbers of BioProjects and BioSamples in GSA. B. Numbers of Experiments and Runs, as well as file size in GSA. All statistics are based on data submissions ranging from December 2015 to December 2016.
Figure 3
Graphic illustration of data submissions to GSA. Two representative studies are provided here as examples to depict the data objects involved in data submission.
