Comput Struct Biotechnol J. 2024 Sep 27;23:3575-3583.
doi: 10.1016/j.csbj.2024.09.018. eCollection 2024 Dec.

Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource

Julia M Kelliher et al. Comput Struct Biotechnol J.

Abstract

Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability.

Keywords: Bioinformatics workflows; Microbiome; Multi-omics; Open-source; Software; Standardization.

Conflict of interest statement

The authors do not have any conflicts of interest to disclose.

Figures

Fig. 1
The NMDC EDGE architecture has three layers: the web application (A, red), orchestration (B, purple), and execution (C, blue) layers. The web application layer and orchestration layer run in a virtual machine (VM) and the execution layer runs in a high-performance computing (HPC) environment. (A) Users engage with the web application to select input files and workflows. This information is used to populate a workflow template WDL file and an input json file. (B) The orchestration layer generates a Cromwell job for the execution layer. Cromwell jobs are executed in a shared Linux computing environment together with non-Cromwell jobs, where the resource manager (e.g., Slurm) handles the actual computing jobs and resources in the cluster. (C) Cromwell handles the packaging of WDL-defined workflows into Slurm jobs and also monitors the job execution and updates the MySQL database with the status. When jobs are completed, Slurm resumes control, cleans up temporary files, and writes standard outputs and error messages. Workflow output files are moved to destinations as defined by the WDL files on a filesystem shared between the web application and the execution layer. These then become accessible to the project owner and any users who have been granted access by the owner. The web application layer tracks the status of the workflow (e.g., queuing time, job status) through communications with the orchestration layer and updates the status for the user. In the event of a workflow failure, an error message is displayed to the user, giving information about the source of the failure.
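The hand-off described in this caption, in which user selections become an input JSON file and the web application reports job status back to the user, can be sketched as follows. This is a minimal illustration under stated assumptions, not the NMDC EDGE implementation: the workflow name `ReadsQC`, the input names, the file path, and the status labels are all hypothetical. The only convention borrowed from Cromwell is that keys in an inputs file are qualified with the workflow name (`workflow.input`).

```python
import json

# Hypothetical mapping from workflow-engine states to user-facing labels.
# The state names and labels here are assumptions made for illustration.
STATUS_LABELS = {
    "Submitted": "In queue",
    "Running": "Running",
    "Succeeded": "Complete",
    "Failed": "Failed",
}


def build_inputs(workflow_name, selections):
    """Qualify each user selection with the workflow name (workflow.input)."""
    return {f"{workflow_name}.{name}": value for name, value in selections.items()}


def render_inputs_json(workflow_name, selections):
    """Serialize the qualified inputs as the JSON text handed to the engine."""
    return json.dumps(build_inputs(workflow_name, selections), indent=2, sort_keys=True)


def user_facing_status(engine_status, error_message=None):
    """Translate an engine state into the message shown to the user."""
    label = STATUS_LABELS.get(engine_status, "Unknown")
    if engine_status == "Failed" and error_message:
        # On failure, surface the source of the error alongside the label.
        return f"{label}: {error_message}"
    return label


if __name__ == "__main__":
    # Hypothetical selections a user might make in the web application.
    selections = {"input_files": ["/data/sample_R1.fastq.gz"], "proj": "demo"}
    print(render_inputs_json("ReadsQC", selections))
    print(user_facing_status("Failed", "input file not found"))
```

Keeping the input rendering separate from the status translation mirrors the split between the web application layer, which collects selections and displays status, and the orchestration layer, which submits and monitors the job.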
Fig. 2
The NMDC standardized bioinformatics workflows and their associated inputs and main outputs for (A) sequencing data and (B) mass spectrometry data. Additional output files and visualizations are shown in Table 1. The rectangles indicate workflows, and gray parallelograms indicate inputs and outputs. Abbreviations: direct infusion Fourier-transform ion cyclotron resonance mass spectrometry (DI FT-ICR MS); metagenome-assembled genomes (MAGs); liquid chromatography-tandem mass spectrometry (LC-MS/MS).
Fig. 3
Key features of the NMDC EDGE resource for running standardized NMDC bioinformatics workflows.
