Gigascience. 2022 Dec 28:12:giad048.
doi: 10.1093/gigascience/giad048. Epub 2023 Jul 3.

Training Infrastructure as a Service

Helena Rasche et al. Gigascience.

Abstract

Background: Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can run resource-intensive jobs efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this imposes a significant barrier of prerequisite knowledge or labor on instructors, who must spend time coordinating the deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses.

Findings: We have created Training Infrastructure-as-a-Service (TIaaS), originally developed by Galaxy Europe and the Gallantries project together with the Galaxy community, to provide user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure. This ensures that jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress.

Conclusions: TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.

Keywords: Galaxy; remote training; teaching; training.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
The top portion of the training dashboard page shows the status of the jobs in the past hours. A grayscale heatmap of the tools that were run indicates if everything is running smoothly or if there is anything the instructors should look into. As learners follow along and run different tools, these show up immediately in the dashboard, allowing instructors to identify if everyone has started or finished a specific step. The bottom portion shows the rest of the training dashboard, which lists jobs and workflows that were run, chronologically, color-coded first by user and second by the job status. Randomized colors and identifiers are used to protect user privacy.
Figure 2:
Schematic of the idealized TIaaS queuing system. Jobs are processed by the same Galaxy server, but when those jobs come from users in the training group, they receive special handling. These jobs are allowed to run on the private training resources (purple). If the training resource is full, these jobs can spill over to the main queue if necessary.
Figure 3:
Map of countries targeted by TIaaS events. This combines 2 datasets: the statistics provided by the Application Programming Interfaces (APIs) of the 4 discussed TIaaS servers and a set of corrections from course registration data for the Smörgåsbord event series. This correction is needed as the authors did not sufficiently fill out the TIaaS form when they requested resources for the Smörgåsbord event, choosing to specify only a single country, which would otherwise result in potential undercounting of countries actually targeted by TIaaS managed events.
Figure 4:
Since its introduction, TIaaS has grown into a well-used service over the past 4 years. There have been 438 training events, primarily hosted by the Australian and European servers, which are both very involved in training. The distribution of event length in days is heavily skewed toward very short events, with a long tail of semester-long courses using the platform. Event sizes show a similar distribution: most classes are small, while 7 extremely large courses (>500 participants) were filtered from this graph as outliers. These courses are more like Massive Open Online Courses (MOOCs) than traditional in-person courses.
Figure 5:
Pseudocode representing how TIaaS jobs are typically processed and allocated to a private queue.
Figure 6:
YAML-formatted TPV configuration that schedules jobs coming from users with a training role to any machines labeled as training nodes.
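
The queuing behavior described in the captions of Figures 2 and 5 can be illustrated with a minimal Python sketch. This is not the TIaaS or TPV implementation: the `Queue` class, the `route_job` function, and the role name "training" are hypothetical names chosen for illustration, and capacity is modeled simply as a count of running jobs. The sketch only shows the idea that jobs from users in the training group go to the private training resources when there is room and otherwise spill over to the main queue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Queue:
    """Hypothetical stand-in for a pool of compute resources."""
    name: str
    capacity: Optional[int]  # None means unbounded (jobs simply wait their turn)
    running: List[str] = field(default_factory=list)

    def has_room(self) -> bool:
        # An unbounded queue always accepts another job.
        return self.capacity is None or len(self.running) < self.capacity

    def submit(self, job_id: str) -> None:
        self.running.append(job_id)


def route_job(job_id: str, user_roles: Set[str],
              training_queue: Queue, main_queue: Queue,
              training_role: str = "training") -> Queue:
    """Send a job to the training queue if the user holds the (assumed)
    training role and the queue has room; otherwise use the main queue."""
    if training_role in user_roles and training_queue.has_room():
        target = training_queue
    else:
        # Non-training users, or spillover when training resources are full.
        target = main_queue
    target.submit(job_id)
    return target


if __name__ == "__main__":
    training = Queue("training", capacity=1)
    main = Queue("main", capacity=None)
    for job, roles in [("j1", {"training"}), ("j2", {"training"}), ("j3", set())]:
        print(job, "->", route_job(job, roles, training, main).name)
```

In this sketch the decision depends only on the user's role and the current load of the training queue; a real scheduler would also weigh the resource requirements of each tool.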
