Gigascience. 2022 Dec 28:12:giad028.
doi: 10.1093/gigascience/giad028. Epub 2023 Apr 26.

An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy

Anup Kumar et al. Gigascience.

Abstract

Background: Artificial intelligence (AI) programs that train on large datasets require powerful compute infrastructure consisting of several CPU cores and GPUs. JupyterLab provides an excellent framework for developing AI programs, but it needs to be hosted on such an infrastructure to enable faster training of AI programs using parallel computing.

Findings: An open-source, Docker-based, GPU-enabled JupyterLab infrastructure has been developed for rapidly prototyping and developing end-to-end AI projects. It runs on the public compute infrastructure of Galaxy Europe, which consists of thousands of CPU cores, many GPUs, and several petabytes of storage. From a JupyterLab notebook, long-running AI model training programs can also be executed remotely to create trained models, represented in Open Neural Network Exchange (ONNX) format, and other output datasets in Galaxy. Other features include Git integration for version control, the option of creating and executing pipelines of notebooks, and multiple dashboards and packages for monitoring compute resources and for visualization, respectively.

Conclusions: These features make JupyterLab in Galaxy Europe highly suitable for creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 computed tomography scan images is reproduced using various features of JupyterLab on Galaxy Europe. In addition, ColabFold, a faster implementation of AlphaFold2, is accessed in JupyterLab to predict the 3-dimensional structure of protein sequences. JupyterLab is accessible in 2 ways: as an interactive Galaxy tool, or by running the underlying Docker container directly. In both cases, long-running training can be executed on Galaxy's compute infrastructure. Scripts to create the Docker container are available under the MIT license at https://github.com/usegalaxy-eu/gpu-jupyterlab-docker.
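
The second access route, running the underlying Docker container directly, can be illustrated with a minimal sketch that only assembles a `docker run` command and does not execute it. The image name `example/gpu-jupyterlab:latest` is a hypothetical placeholder, and the container port 8888 (Jupyter's usual default) is an assumption; the actual image name, ports, and required options should be taken from the repository linked above. The `--gpus all` flag is the standard Docker CLI option for exposing host GPUs and typically requires the NVIDIA Container Toolkit on the host.

```python
import shlex


def build_docker_run(image, host_port=8888, use_gpu=True):
    """Assemble a `docker run` command for a JupyterLab container.

    The command is returned as a list of arguments and is NOT executed.
    """
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        # Expose all host GPUs to the container (needs GPU support on the host).
        cmd += ["--gpus", "all"]
    # Assumption: JupyterLab listens on port 8888 inside the container.
    cmd += ["-p", f"{host_port}:8888", image]
    return cmd


# Hypothetical image name, used only for illustration.
command = build_docker_run("example/gpu-jupyterlab:latest")
print(shlex.join(command))
```

Returning a list of arguments rather than a shell string keeps the command safe to pass to `subprocess.run` later without shell quoting issues.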

Keywords: CUDA; Elyra AI; GPU; Galaxy Europe; JupyterLab; ONNX; artificial intelligence; remote model training.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Architecture of Galaxy’s JupyterLab. (A) Packages and features wrapped inside a Docker container. (B) A base Docker container [41] from which the customized container [5] is derived. (C) Galaxy’s interactive tool downloads the customized container. The customized Docker container can also be hosted on a different compute infrastructure. (D) Galaxy’s JupyterLab.
Figure 2
Original CT scan images (A), corresponding ground-truth masks of original CT scan images (B), and the predicted masks (C). Masks are COVID-19 infected regions in the corresponding CT scan images. The ground-truth and predicted masks show high similarity [44].
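
The similarity between ground-truth and predicted masks described for Figure 2 can be quantified in several ways. The sketch below uses the Dice coefficient, a common overlap measure for binary segmentation masks; this is an assumption for illustration, since the abstract does not state which metric the authors used. The tiny masks are made-up toy data, not values from the study.

```python
def dice_coefficient(truth, pred):
    """Dice overlap between two equally sized binary masks (nested lists of 0/1)."""
    t = [v for row in truth for v in row]
    p = [v for row in pred for v in row]
    if len(t) != len(p):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as infected in both masks.
    intersection = sum(1 for a, b in zip(t, p) if a and b)
    total = sum(1 for a in t if a) + sum(1 for b in p if b)
    # By convention, two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data): 1 marks an infected pixel.
truth_mask = [[0, 1, 1], [0, 1, 0]]
pred_mask = [[0, 1, 0], [0, 1, 0]]
score = dice_coefficient(truth_mask, pred_mask)
print(score)  # 2 * 2 / (3 + 2) = 0.8
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no infected pixels.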
Figure 3
A 3D structure of the 4-oxalocrotonate tautomerase enzyme (protein) [51] predicted by ColabFold.

References

    1. Pearson W, Crusoe M, et al. The FASTA package—protein and DNA sequence similarity searching and alignment programs. GitHub. 2016. https://github.com/wrpearson/fasta36. [Accessed June 30, 2022].
    2. Kumar I, Singh SP, Shivam. Machine learning in bioinformatics. Bioinformatics, Dev BS and Pathak RK, Academic Press; Dehradun. 2022:443–56. https://www.sciencedirect.com/science/article/pii/B9780323897754000201.
    3. Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, et al. Jupyter Notebooks—A Publishing Format for Reproducible Computational Workflows. IOS Press; Amsterdam. 2016:87.
    4. The Galaxy Community. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res. 2022;50(W1):W345–51.
    5. Kumar A. Container for machine learning and deep learning in Jupyter notebook. Docker. 2021. https://hub.docker.com/r/anupkumar/docker-ml-jupyterlab. [Accessed June 29, 2022].