Mol Biol Evol. 2020 Feb 1;37(2):593-598.
doi: 10.1093/molbev/msz237.

Computational Framework for High-Quality Production and Large-Scale Evolutionary Analysis of Metagenome Assembled Genomes

Boštjan Murovec et al. Mol Biol Evol.

Abstract

Microbial species play important roles in different environments, and the production of high-quality genomes from metagenome data sets represents a major obstacle to understanding their ecological and evolutionary dynamics. Metagenome-Assembled Genomes Orchestra (MAGO) is a computational framework that integrates and simplifies metagenome assembly, binning, bin improvement, bin quality assessment (completeness and contamination), bin annotation, and evolutionary placement of bins via detailed maximum-likelihood phylogeny based on multiple marker genes using different amino acid substitution models, alongside average nucleotide identity analysis of genomes for delineation of species boundaries and operational taxonomic units. MAGO offers streamlined execution of the entire metagenomics pipeline, error checking, computational resource distribution, and compatibility of data formats, governed by user-tailored pipeline processing. MAGO is an open-source software package released in three forms: a Singularity image and a Docker container, for HPC use as well as for running MAGO on commodity hardware, and a virtual machine that gives full access to MAGO's underlying structure and source code. MAGO is open to suggestions for extensions and is suitable for both research and teaching of genomics and molecular evolution of genomes assembled from small single-cell projects or from large-scale, complex environmental metagenomes.

Keywords: FastANI; evolutionary analyses; genome assembly and binning; metagenomics; microbial draft genomes; species boundaries.

Figures

Fig. 1.
A schematic representation of the steps integrated within MAGO, from the input of raw sequencing data to MAGs, bin quality checking, and the production of a collection of high-quality MAGs. These are further used in the analysis of evolutionary relationships to produce maximum-likelihood (ML) phylogenomic placement, MAG annotation, and core/pan genome calculations, alongside determination of species boundaries and operational taxonomic units at the genomic level. The outputs are easily integrated into recently developed tools (e.g., MEGA-X, Kumar et al. 2018; GTDB-Tk, Parks et al. 2018; MAGpy, Stewart et al. 2019).
Fig. 2.
Overview of the basic quality metrics of MAGs reconstructed from the moose rumen microbiome collection (samples S1–6) (supplementary table S3, Supplementary Material online; Svartström et al. 2017): (A) completeness (>50%); (B) contamination (<10%).
Fig. 3.
Genetic discontinuity observed in the wild moose rumen MAGs shown for the first 5,000 pairwise genome comparisons (supplementary table S3, Supplementary Material online). Values of FastANI estimates in the ANI range of 75–100% are shown. The 95% and 83% ANI thresholds of FastANI estimates serve to delineate comparisons belonging to the same species (>95% intraspecies ANI) or different species (<83% interspecies ANI).
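
The thresholds quoted in the captions of Fig. 2 and Fig. 3 can be written out as a minimal Python sketch. This is an illustration only and not part of MAGO: the names `Mag`, `passes_quality`, and `classify_ani` are hypothetical, the example values are invented, and the "intermediate" label for ANI estimates between 83% and 95% is an assumption made here for completeness.

```python
from typing import NamedTuple


class Mag(NamedTuple):
    """Hypothetical record for one bin; both metrics are percentages."""
    name: str
    completeness: float
    contamination: float


def passes_quality(mag: Mag,
                   min_completeness: float = 50.0,
                   max_contamination: float = 10.0) -> bool:
    """Apply the Fig. 2 cut-offs: completeness >50% and contamination <10%."""
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


def classify_ani(ani: float,
                 same_species: float = 95.0,
                 different_species: float = 83.0) -> str:
    """Label one pairwise ANI estimate (in percent) using the Fig. 3 thresholds."""
    if ani > same_species:
        return "same species"
    if ani < different_species:
        return "different species"
    # Assumption: values between the two thresholds are left unresolved.
    return "intermediate"


if __name__ == "__main__":
    # Invented example values, not data from the paper.
    bins = [Mag("bin_01", 92.3, 1.4), Mag("bin_02", 41.0, 0.8)]
    kept = [m.name for m in bins if passes_quality(m)]
    print(kept)
    print(classify_ani(97.2), classify_ani(80.0), classify_ani(90.0))
```

Keeping the two rules as separate functions with the thresholds as default arguments makes it easy to try stricter cut-offs without touching the logic.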

References

    1. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomics contigs by coverage and composition. Nat Methods. 11(11):1144–1146.
    2. Andrews A. 2010. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/; last accessed September 04, 2019.
    3. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, et al. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 35(8):725–731.
    4. Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34(17):i884–i890.
    5. Darling ACE, Mau BT, Perna NT. 2010. Progressive mauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One. 5(6):e11147.