Genomics Inform. 2020 Mar;18(1):e8.
doi: 10.5808/GI.2020.18.1.e8. Epub 2020 Mar 31.

Bioinformatics services for analyzing massive genomic datasets

Gunhwan Ko et al. Genomics Inform. 2020 Mar.

Abstract

The explosive growth of next-generation sequencing data has produced ultra-large-scale datasets and the computational problems that come with them. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution to this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ the predefined pipelines or create a new pipeline to analyze their own omics data. We also developed several web-based services to facilitate downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Keywords: analysis pipeline; cloud computing; genomic data; web server; workflow system.

Conflict of interest statement

No potential conflict of interest relevant to this article was reported.

Figures

Fig. 1. The interface of the Bio-Express workspace. The Bio-Express workflow editor has eight panels: the user’s projects (A), the file explorer (B), the canvas (C), the analysis programs of the current pipeline (D), the program parameter settings (E), the pipeline list (F), the program list (G), and the job execution history (H).
Fig. 2. Screenshot of the RNA-sequencing (RNA-Seq) schematic diagram and its pipeline. The RNA-Seq pipeline was implemented on the canvas.
Fig. 3. Workflow for the histone modification analysis pipeline. ChIP-Seq, chromatin immunoprecipitation sequencing.
Fig. 4. Simplified workflow diagram of the metagenomics pipelines.
Fig. 5. Screenshot of Bio-Express results. Users can view files in various formats, including text, HTML, and PNG, on the web.
