Genomics Proteomics Bioinformatics. 2021 Aug;19(4):578-583.
doi: 10.1016/j.gpb.2021.08.001. Epub 2021 Aug 13.

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types

Tingting Chen et al. Genomics Proteomics Bioinformatics. 2021 Aug.

Abstract

The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. Considering explosive data growth with diverse data types, here we present the GSA family by expanding into a set of resources for raw data archive with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, all these resources form a family of resources dedicated to archiving explosive data with diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.

Keywords: GSA; GSA-Human; Genome Sequence Archive; OMIX.

Figures

Figure 1
Data model of the GSA family. BioProject and BioSample are two independent meta-information databases, acting as an organizational framework to provide centralized access to descriptive metadata about research projects and samples, respectively. GSA-Human is for archiving human genetic data and OMIX is for various types of data (that are unsuitable for GSA/GSA-Human).
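
The data model described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the archives' actual schema or API: the class names `Submission` and `Archive`, the boolean flags, the placeholder accessions, and the routing order in `choose_archive` are all assumptions made for illustration. Only the three archive names, their URLs, and the roles of BioProject and BioSample come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Archive(Enum):
    """The three GSA family members and the URLs given in the abstract."""
    GSA = "https://ngdc.cncb.ac.cn/gsa/"
    GSA_HUMAN = "https://ngdc.cncb.ac.cn/gsa-human/"
    OMIX = "https://ngdc.cncb.ac.cn/omix/"


@dataclass(frozen=True)
class BioProject:
    """Descriptive metadata about a research project (fields are hypothetical)."""
    accession: str
    title: str


@dataclass(frozen=True)
class BioSample:
    """Descriptive metadata about a sample (fields are hypothetical)."""
    accession: str
    organism: str


@dataclass(frozen=True)
class Submission:
    """A hypothetical submission that links to one project and one sample."""
    project: BioProject
    sample: BioSample
    is_raw_sequence: bool
    is_human_genetic: bool


def choose_archive(sub: Submission) -> Archive:
    """Pick an archive using a simplified, assumed routing rule."""
    # Assumption: human genetic data always goes to the controlled-access archive.
    if sub.is_human_genetic:
        return Archive.GSA_HUMAN
    # Raw sequence data that is not human genetic data goes to GSA.
    if sub.is_raw_sequence:
        return Archive.GSA
    # Everything unsuitable for GSA/GSA-Human goes to OMIX.
    return Archive.OMIX


if __name__ == "__main__":
    project = BioProject(accession="PROJECT-PLACEHOLDER", title="Example project")
    sample = BioSample(accession="SAMPLE-PLACEHOLDER", organism="Oryza sativa")
    sub = Submission(project, sample, is_raw_sequence=True, is_human_genetic=False)
    print(choose_archive(sub).name, choose_archive(sub).value)
```

Keeping `BioProject` and `BioSample` as separate records mirrors the caption's point that they are independent metadata databases shared across the archives, rather than fields owned by any single archive.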
Figure 2
Data statistics of the GSA family. A. Number of runs accumulated from 2016 to 2021, with five major species indicated. B. Increase in the volume of submitted data over time. Time needed to accumulate each PB of data is indicated. All statistics were derived from GSA and GSA-Human as of 30 June 2021. PB, petabyte; d, days.

References

    1. Wang Y., Song F., Zhu J., Zhang S., Yang Y., Chen T., et al. GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics. 2017;15:14–18.
    2. Song S., Zhang Z. Database Resources in BIG Data Center: submission, archiving, and integration of big data in plant science. Mol Plant. 2019;12:279–281.
    3. National Genomics Data Center Members and Partners. Database resources of the National Genomics Data Center in 2020. Nucleic Acids Res. 2020;48:D24–D33.
    4. CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res. 2021;49:D18–D28.
    5. Lewin H.A., Robinson G.E., Kress W.J., Baker W.J., Coddington J., Crandall K.A., et al. Earth BioGenome Project: sequencing life for the future of life. Proc Natl Acad Sci U S A. 2018;115:4325–4333.
