Nucleic Acids Res. 2020 Jan 8;48(D1):D45-D50.
doi: 10.1093/nar/gkz982.

DDBJ Database updates and computational infrastructure enhancement

Osamu Ogasawara et al. Nucleic Acids Res.

Abstract

The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The NIG operates the NIG supercomputer as the computational basis for constructing the DDBJ databases and as a large-scale computational resource for Japanese biologists and medical researchers. To accommodate the rapidly growing volume of deoxyribonucleic acid (DNA) nucleotide sequence data, NIG replaced its supercomputer system, which is designed for big data analysis of genome data, in early 2019. The new system is equipped with 30 PB of DNA data archiving storage, large-scale parallel distributed file systems (13.8 PB in total), and 1.1 PFLOPS of computation nodes and graphics processing units (GPUs). Moreover, as a starting point for developing a multi-cloud infrastructure for bioinformatics, we have also installed an automatic file transfer system that allows users to avoid data lock-in and to balance cost and performance by choosing the most suitable environment, among the supercomputer and public clouds, for each workload.

Figures

Figure 1.
General architecture of the NIG supercomputer installed in 2019. Based on the previous system, the NIG Supercomputer 2019 mainly consists of a distributed memory HPC cluster, high-performance parallel distributed file systems for calculation, and large capacity archiving storage systems for the DNA database. Those systems are interconnected via a high-throughput low-latency network (InfiniBand) and various management networks (Ethernet).
Figure 2.
Automatic file transfer system between the NIG supercomputer and a public cloud (Amazon Web Services, AWS). A dedicated data transfer server (Fusic data transfer) is installed in the NIG supercomputer. It allows users to send data, bring compute instances up and down, run jobs, and make configuration changes on the AWS cloud using a series of command-line tools installed on the NIG supercomputer. Traffic over the SINET5 network is eligible for a discount on the public cloud's egress network traffic charges.

