Gigascience. 2022 Dec 28:12:giad078.
doi: 10.1093/gigascience/giad078. Epub 2023 Oct 18.

metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data

Haris Zafeiropoulos et al. Gigascience.

Abstract

Background: Genomic Observatories (GOs) are sites of long-term scientific study that undertake regular assessments of genomic biodiversity. The European Marine Omics Biodiversity Observation Network (EMO BON) is a network of GOs that conduct regular biological community samplings to generate environmental and metagenomic data of microbial communities from designated marine stations around Europe. An effective workflow is essential for analyzing the EMO BON metagenomic data in a timely and reproducible manner.

Findings: Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports the fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes, and the functional annotation of these data using the raw reads. Thanks to the Research Object Crate (RO-Crate) packaging, relevant metadata about the sample under study and the details of the bioinformatics analysis it has been subjected to are carried over to the data product, while the modular implementation of the workflow allows it to be run partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case.

Conclusions: metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data such as EMO BON. It highlights how containerization technologies along with modern workflow languages and metadata package approaches can support the needs of researchers when dealing with ever-increasing volumes of biological data. Despite being initially oriented to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.

Keywords: Common Workflow Language (CWL); MGnify; RO-Crate; containers; provenance; shotgun metagenomics.

Conflict of interest statement

M.B., L.R., and R.D.F. are members of the MGnify group that is part of the ELIXIR infrastructure [95]. The authors declare that they have no other competing interests.

Figures

Figure 1:
Schematic overview of metaGOflow, showing the main steps of the analysis along with their corresponding data products; the partial execution of the workflow is also shown by the potential exit points (left). Independent of the steps to be performed, once completed, an RO-Crate is built (right).
Figure 2:
Visualization of metaGOflow’s main output. (A) Raw data are first filtered, and only high-quality sequences are analyzed in the next steps. An .html file with the report of the merged reads is produced. Here, an excerpt of this report is shown: read statistics before and after filtering (left), ATGC chart with the quality of each base, cycle by cycle, for the merged reads (right). (B) The taxonomy inventory step returns molecular operational taxonomic units (mOTUs) and the taxonomic composition based on the LSU and the SSU genes. Here, the taxonomic composition is represented by a Krona interactive visualization. (C) The functional annotation step returns text files with the GO, KEGG, InterProScan, and Pfam terms retrieved. The retrieved GO terms are presented using Navigo [38], the Co-occurrence Association Score (CAS-1), and the Relevance Semantic Similarity (RSS-1). The gene prediction step returns a .ffn and a .faa file, while the assembly step returns a .fasta file containing the retrieved contigs. The main output of the provenance feature is the ro-crate-metadata.json file.
Figure 3:
Part of the ro-crate-metadata.json file describing the metaGOflow output files.
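
To illustrate what such a file makes possible, the following is a minimal sketch of listing the files described in an RO-Crate metadata document. It assumes only the general RO-Crate 1.1 layout (a flat "@graph" array of entities carrying "@id" and "@type"); the function name list_crate_files and the example paths and names are hypothetical and are not taken from actual metaGOflow output.

```python
import json


def list_crate_files(metadata):
    """Return (id, name) pairs for every entity typed as File in the crate graph."""
    files = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        # "@type" may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        if "File" in types:
            files.append((entity["@id"], entity.get("name", "")))
    return files


# Hypothetical, hand-written crate; real metaGOflow crates will differ.
EXAMPLE_METADATA = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset",
     "hasPart": [{"@id": "results/sample.fastp.html"},
                 {"@id": "results/sample.merged.fasta"}]},
    {"@id": "results/sample.fastp.html", "@type": "File",
     "name": "Quality report of the merged reads"},
    {"@id": "results/sample.merged.fasta", "@type": ["File"],
     "name": "Merged reads"}
  ]
}
""")

for file_id, name in list_crate_files(EXAMPLE_METADATA):
    print(f"{file_id}\t{name}")
```

For a crate on disk, the same function could be applied to the result of json.load on the ro-crate-metadata.json file at the crate root.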

References

    1. Louca S, Parfrey LW, Doebeli M. Decoupling function and taxonomy in the global ocean microbiome. Science. 2016;353(6305):1272–7. doi: 10.1126/science.aaf4507.
    2. Doney SC, Ruckelshaus M, Emmett Duffy J, et al. Climate change impacts on marine ecosystems. Ann Rev Mar Sci. 2012;4:11–37. doi: 10.1146/annurev-marine-041911-111611.
    3. Chen J, McIlroy SE, Archana A, et al. A pollution gradient contributes to the taxonomic, functional, and resistome diversity of microbial communities in marine sediments. Microbiome. 2019;7(1):1–12. doi: 10.1186/s40168-018-0604-3.
    4. Caruso G, La Ferla R, Azzaro M, et al. Microbial assemblages for environmental quality assessment: knowledge, gaps and usefulness in the European Marine Strategy Framework Directive. Crit Rev Microbiol. 2016;42(6):883–904. doi: 10.3109/1040841X.2015.1087380.
    5. Caruso G, Azzaro M, Caroppo C, et al. Microbial community and its potential as descriptor of environmental status. ICES J Mar Sci. 2016;73(9):2174–7. doi: 10.1093/icesjms/fsw101.
