GSA: Genome Sequence Archive

Yanqing Wang¹, Fuhai Song², Junwei Zhu¹, Sisi Zhang¹, Yadong Yang², Tingting Chen¹, Bixia Tang³, Lili Dong¹, Nan Ding⁴, Qian Zhang⁴, Zhouxian Bai², Xunong Dong², Huanxin Chen¹, Mingyuan Sun¹, Shuang Zhai¹, Yubin Sun¹, Lei Yu¹, Li Lan¹, Jingfa Xiao⁵, Xiangdong Fang⁶, Hongxing Lei⁷, Zhang Zhang⁸, Wenming Zhao⁹

Affiliations

¹ BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
² CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
³ BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
⁴ CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
⁵ BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200438, China.
⁶ CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200438, China. Electronic address: fangxd@big.ac.cn.
⁷ CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Center of Alzheimer's Disease, Beijing Institute for Brain Disorders, Beijing 100053, China. Electronic address: leihx@big.ac.cn.
⁸ BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200438, China. Electronic address: zhangzhang@big.ac.cn.
⁹ BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai 200438, China. Electronic address: zhaowm@big.ac.cn.

PMID: 28387199
PMCID: PMC5339404
DOI: 10.1016/j.gpb.2017.01.001

Yanqing Wang et al. Genomics Proteomics Bioinformatics. 2017 Feb;15(1):14-18. doi: 10.1016/j.gpb.2017.01.001. Epub 2017 Feb 2.

Abstract

With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are being generated at an unprecedented rate. To provide an efficient and easy-to-use platform for managing these huge volumes of sequence data, here we present the Genome Sequence Archive (GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with the data standards and structures of the International Nucleotide Sequence Database Collaboration (INSDC), GSA adopts four data objects (BioProject, BioSample, Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world, and makes all these data publicly available to the worldwide scientific community. In the era of big data, GSA not only complements the existing INSDC members by alleviating the growing burden of handling the deluge of sequence data, but also takes on significant responsibility for archiving big data globally and provides free, unrestricted access to all publicly available data in support of research activities throughout the world.

Keywords: Big data; GSA; Genome Sequence Archive; INSDC; Raw sequence data.
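The four-object organization described in the abstract can be sketched as a small set of Python dataclasses. This is a simplified, hypothetical model for illustration only: the class names follow the four GSA data objects, but every field name is an assumption and not GSA's actual metadata schema. Following the general INSDC convention, an Experiment here points to both its BioProject and its BioSample, and each Run holds the raw read files produced for one Experiment. The accessions in the usage lines are invented, and their prefixes are assumed rather than taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified model of the four GSA/INSDC data objects.
# Field names are illustrative assumptions, not GSA's real metadata schema.
@dataclass
class BioProject:
    accession: str
    title: str


@dataclass
class BioSample:
    accession: str
    organism: str


@dataclass
class Experiment:
    accession: str
    project: BioProject  # the study this experiment belongs to
    sample: BioSample    # the biological material that was sequenced
    platform: str        # sequencing platform that produced the reads


@dataclass
class Run:
    accession: str
    experiment: Experiment
    files: List[str] = field(default_factory=list)  # raw read files


def lineage(run: Run) -> List[str]:
    """Return accessions from the Run up through its Experiment, BioSample and BioProject."""
    exp = run.experiment
    return [run.accession, exp.accession, exp.sample.accession, exp.project.accession]


# Usage with invented accessions (prefixes assumed, serial numbers made up).
project = BioProject("PRJCA000001", "Example project")
sample = BioSample("SAMC000001", "Homo sapiens")
experiment = Experiment("CRX000001", project, sample, "Illumina HiSeq 2500")
run = Run("CRR000001", experiment, ["reads_1.fastq.gz", "reads_2.fastq.gz"])
print(lineage(run))  # ['CRR000001', 'CRX000001', 'SAMC000001', 'PRJCA000001']
```

In this sketch a BioSample is not owned by any single BioProject, so the same sample could be referenced by Experiments in different projects; a strict parent-child tree would not allow that reuse.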

Figures

**Figure 1**
Data model in GSA. Prefixes of accession numbers for the data objects, including BioProject, BioSample, Experiment, and Run, are indicated in red. The data objects Experiment and Run constitute the China Read Archive.
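The accession prefixes shown in red in Figure 1 are not reproduced in this text. The sketch below assumes the prefixes as we understand them from GSA's public site (PRJCA for BioProject, SAMC for BioSample, CRX for Experiment, CRR for Run, and CRA for a China Read Archive dataset grouping Experiments and Runs); treat these as assumptions to be checked against the figure. It maps an accession string to its data-object type with a single regular expression, and because the length of the numeric part is not stated, it accepts serial numbers of any length.

```python
import re

# Assumed prefix -> data-object mapping; verify against Figure 1 before use.
PREFIXES = {
    "PRJCA": "BioProject",
    "SAMC": "BioSample",
    "CRA": "China Read Archive dataset",  # groups Experiments and Runs
    "CRX": "Experiment",
    "CRR": "Run",
}

# One alternation over the assumed prefixes, followed by a numeric serial part.
ACCESSION_RE = re.compile(r"(PRJCA|SAMC|CRA|CRX|CRR)(\d+)")


def object_type(accession: str) -> str:
    """Return the data-object type encoded in an accession's prefix."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    return PREFIXES[match.group(1)]


for acc in ["PRJCA000001", "SAMC000001", "CRX000001", "CRR000001"]:
    print(acc, "->", object_type(acc))
```

Using `fullmatch` rather than `match` rejects strings with trailing characters after the serial number, such as a file suffix accidentally pasted along with the accession.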

**Figure 2**
Data statistics of GSA. A. Numbers of BioProjects and BioSamples in GSA. B. Numbers of Experiments and Runs, as well as file size, in GSA. All statistics are based on data submissions from December 2015 to December 2016.

**Figure 3**
Graphic illustration of data submissions to GSA. Two representative studies are provided as examples to depict the data objects involved in data submission.
