Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Oct 21;21(1):471.
doi: 10.1186/s12859-020-03815-9.

MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline

Affiliations

MetaLAFFA: a flexible, end-to-end, distributed computing-compatible metagenomic functional annotation pipeline

Alexander Eng et al. BMC Bioinformatics. .

Abstract

Background: Microbial communities have become an important subject of research across multiple disciplines in recent years. These communities are often examined via shotgun metagenomic sequencing, a technology which can offer unique insights into the genomic content of a microbial community. Functional annotation of shotgun metagenomic data has become an increasingly popular method for identifying the aggregate functional capacities encoded by the community's constituent microbes. Currently available metagenomic functional annotation pipelines, however, suffer from several shortcomings, including limited pipeline customization options, lack of standard raw sequence data pre-processing, and insufficient capabilities for integration with distributed computing systems.

Results: Here we introduce MetaLAFFA, a functional annotation pipeline designed to take unfiltered shotgun metagenomic data as input and generate functional profiles. MetaLAFFA is implemented as a Snakemake pipeline, which enables convenient integration with distributed computing clusters, allowing users to take full advantage of available computing resources. Default pipeline settings allow new users to run MetaLAFFA according to common practices while a Python module-based configuration system provides advanced users with a flexible interface for pipeline customization. MetaLAFFA also generates summary statistics for each step in the pipeline so that users can better understand pre-processing and annotation quality.

Conclusions: MetaLAFFA is a new end-to-end metagenomic functional annotation pipeline with distributed computing compatibility and flexible customization options. MetaLAFFA source code is available at https://github.com/borenstein-lab/MetaLAFFA and can be installed via Conda as described in the accompanying documentation.

Keywords: Distributed computing; Functional annotation; Metagenomics; Pipeline.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
Flowchart of the default MetaLAFFA workflow. The default MetaLAFFA workflow consists of three phases, quality control (top), read mapping (middle), and functional annotation (bottom). This flowchart outlines the individual processing steps taken in each phase (colored rectangular boxes), the intermediate outputs of these steps (grey rounded boxes), supporting data files required for specific steps (yellow rounded boxes), user-provided input to MetaLAFFA (red rounded box), and the final outputs of the pipeline (purple rounded boxes). Third-party tools used in the default pipeline workflow are indicated in parentheses for their associated processing steps. File types of all inputs, outputs, and supporting data files are indicated by file suffix

References

    1. 1000 Genomes Project Consortium A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393. - DOI - PMC - PubMed
    1. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. - DOI - PMC - PubMed
    1. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2014;12(1):59–60. doi: 10.1038/nmeth.3176. - DOI - PubMed
    1. Carr R, Borenstein E. Comparative analysis of functional metagenomic annotation and the mappability of short reads. PLoS ONE. 2014;9(8):e105776. doi: 10.1371/journal.pone.0105776. - DOI - PMC - PubMed
    1. Fennel, T. et al. 2009. Picard. https://Broadinstitute.Github.Io/Picard.

LinkOut - more resources