MG-RAST version 4: lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis

Folker Meyer et al. Brief Bioinform. 2019 Jul 19;20(4):1151-1159.
doi: 10.1093/bib/bbx105.

Abstract

As technologies change, MG-RAST is adapting. Newly available software is being incorporated to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is well placed to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example: we use existing, well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. Using artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements to the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient, high-performance implementation of the community's data analysis tasks.
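The benchmarking approach described above (scoring pipeline changes against well-studied gold-standard data sets while weighing sensitivity, specificity and run-time cost) can be sketched as follows. This is a minimal illustration, not MG-RAST code: it assumes a gold standard given as a hypothetical mapping from read ID to one functional label (or None), scores a single target label as a binary call per read, and treats `run_pipeline` as a hypothetical stand-in for one pipeline configuration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Hypothetical gold-standard / prediction format: read ID -> label (None = unannotated).
Annotations = Dict[str, Optional[str]]


@dataclass
class Score:
    tp: int
    fp: int
    fn: int
    tn: int
    runtime_s: float = 0.0

    @property
    def sensitivity(self) -> float:
        # Share of gold-standard positives that the pipeline also called positive.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def specificity(self) -> float:
        # Share of gold-standard negatives that the pipeline left negative.
        denom = self.tn + self.fp
        return self.tn / denom if denom else 0.0


def score_label(gold: Annotations, predicted: Annotations, label: str) -> Score:
    """Binary confusion counts for one target label over every read in the gold standard."""
    tp = fp = fn = tn = 0
    for read_id, true_label in gold.items():
        truly_positive = true_label == label
        called_positive = predicted.get(read_id) == label  # missing reads count as not called
        if truly_positive and called_positive:
            tp += 1
        elif called_positive:
            fp += 1
        elif truly_positive:
            fn += 1
        else:
            tn += 1
    return Score(tp, fp, fn, tn)


def benchmark(run_pipeline: Callable[[Iterable[str]], Annotations],
              gold: Annotations, label: str) -> Score:
    """Time one (hypothetical) pipeline configuration on the gold-standard reads, then score it."""
    start = time.perf_counter()
    predicted = run_pipeline(gold.keys())
    elapsed = time.perf_counter() - start  # run-time cost of the configuration only
    result = score_label(gold, predicted, label)
    result.runtime_s = elapsed
    return result
```

Comparing the `Score` of two configurations (for example, a faster aligner against a slower one) on the same gold standard makes the trade-off explicit: a change is attractive if run time drops while sensitivity stays roughly constant.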

Keywords: cloud; distributed workflows; metagenome analysis.


Figures

Figure 1.
MG-RAST data and analysis results can be reused for other purposes. Here, we show a MUSCLE [29] alignment of the Prodigal translations of filtered sequences obtained from the following unauthenticated API call: http://api.metagenomics.anl.gov//annotation/sequence/mgm4662210.3?evalue=10&type=function&source=Subsystems&filter=Inosine-5.
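The unauthenticated API call in the Figure 1 caption is an ordinary HTTP GET with query parameters, so it can be assembled programmatically. A minimal Python sketch follows; the helper names are hypothetical, the base URL is copied verbatim from the caption (its `//` is reproduced as printed and may stand for an elided path segment, so the base is a parameter), and nothing is assumed about the format of the response body.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Copied verbatim from the Figure 1 caption; joined with the path below it reproduces
# the caption's "//". The current API root may differ, so callers can override it.
CAPTION_BASE = "http://api.metagenomics.anl.gov/"


def annotation_sequence_url(metagenome_id: str, base: str = CAPTION_BASE, **params: object) -> str:
    """Assemble an annotation/sequence query URL like the one in Figure 1 (hypothetical helper)."""
    query = urlencode(params)  # parameters appear in the order they were passed
    return f"{base}/annotation/sequence/{metagenome_id}?{query}"


def fetch_text(url: str, timeout: float = 60.0) -> str:
    """Return the raw response body as text; no response schema is assumed."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the URL from the Figure 1 caption; no network request is made here.
    print(annotation_sequence_url(
        "mgm4662210.3", evalue=10, type="function", source="Subsystems", filter="Inosine-5"
    ))
```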
Figure 2.
Backend of MG-RAST version 4 using several database systems to enable efficient querying via the API.
Figure 3.
MG-RAST profile encoding abundance and matching parameter information as well as information on the observed entities.
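One way to picture such a profile is as a record per observed entity that carries both its abundance and a summary of the matching parameters behind it. The sketch below is an assumption for illustration, not the MG-RAST profile schema: every field name, the per-entity averaging of identity and log10 e-value, and the `filtered` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProfileRow:
    """One observed entity (taxon or functional class); all field names are hypothetical."""
    entity_id: str
    entity_name: str
    abundance: int                # reads assigned to this entity
    mean_log10_evalue: float      # matching-parameter summaries for those reads
    mean_identity: float          # percent identity
    mean_alignment_length: float  # alignment length


@dataclass
class Profile:
    metagenome_id: str
    source: str                   # annotation source database, e.g. "RefSeq"
    rows: List[ProfileRow] = field(default_factory=list)

    def filtered(self, min_identity: float = 0.0, max_log10_evalue: float = 0.0) -> "Profile":
        """New profile keeping entities whose summary parameters pass both thresholds."""
        kept = [r for r in self.rows
                if r.mean_identity >= min_identity and r.mean_log10_evalue <= max_log10_evalue]
        return Profile(self.metagenome_id, self.source, kept)

    def to_json(self) -> str:
        """Serialize the whole profile, including matching parameters, as JSON."""
        return json.dumps(asdict(self))
```

Keeping matching parameters next to abundances means stricter thresholds can be applied to a stored profile without re-running the annotation, although filtering on per-entity averages, as here, is coarser than filtering individual hits.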
Figure 4.
Relative abundance of protein functional classes (‘Subsystems’) in Proteobacteria (‘RefSeq Phylum’), shown as a waterfall diagram for the data sets in study mgp128, as rendered by the MG-RAST version 4.0 graphical user interface.
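The quantity plotted in Figure 4 is a within-phylum fraction: for each data set, the reads assigned to Proteobacteria are split by functional class and each class count is divided by the phylum total. A minimal sketch, assuming hypothetical flat records of (data set, phylum, functional class, read count) rather than MG-RAST's actual data model:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (metagenome_id, phylum, functional_class, read_count)
Record = Tuple[str, str, str, int]


def relative_abundance(records: Iterable[Record], phylum: str) -> Dict[str, List[Tuple[str, float]]]:
    """Per data set, the fraction of the phylum's reads in each functional class,
    ordered from most to least abundant."""
    counts: Dict[str, Counter] = {}
    for mg_id, rec_phylum, fclass, n in records:
        if rec_phylum == phylum:  # keep only reads assigned to the chosen phylum
            counts.setdefault(mg_id, Counter())[fclass] += n
    result: Dict[str, List[Tuple[str, float]]] = {}
    for mg_id, c in counts.items():
        total = sum(c.values())
        # Guard against a data set whose phylum counts are all zero.
        result[mg_id] = [(fclass, n / total) for fclass, n in c.most_common()] if total else []
    return result
```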
Figure 5.
(A) Heatmap and clustering of the occurrence of Corynebacteria in study mgp128 as displayed by the MG-RAST web frontend. (B) Data export options available for the data and visualization, including sequences and abundance in tabular and JSON format.
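The tabular and JSON exports mentioned in panel (B) amount to serializing a data-set-by-taxon abundance matrix. The sketch below is hypothetical: the nested-dictionary input and both output layouts are assumptions, not the formats MG-RAST actually emits (only the Python standard library `csv` and `json` modules are used).

```python
import csv
import io
import json
from typing import Dict

# Hypothetical in-memory abundance matrix: data set -> {taxon: read count}
Matrix = Dict[str, Dict[str, int]]


def to_tsv(matrix: Matrix) -> str:
    """Tab-separated table with one row per taxon and one column per data set."""
    datasets = sorted(matrix)
    taxa = sorted({taxon for row in matrix.values() for taxon in row})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxon", *datasets])
    for taxon in taxa:
        # Taxa absent from a data set are written as zero counts.
        writer.writerow([taxon, *(matrix[d].get(taxon, 0) for d in datasets)])
    return buf.getvalue()


def to_json(matrix: Matrix) -> str:
    """JSON object keyed by data set, with keys sorted for stable output."""
    return json.dumps(matrix, sort_keys=True)
```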
Figure 6.
Public study (with permanent unique identifier mgp128) and private study set with a temporary identifier. A study groups multiple data sets, provides a single identifier and allows sharing by simply providing the email address of the person with whom the data are to be shared.

References

    1. NHGRI. DNA sequencing costs. https://www.genome.gov/sequencingcosts/.
    2. Afgan E, Baker D, van den Beek M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 2016;44:W3–10.
    3. Doring A, Weese D, Rausch T, et al. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 2008;9:11.
    4. Xia F, Dou Y, Xu J. Families of FPGA-based accelerators for BLAST algorithm with multi-seeds detection and parallel extension. In: Elloumi M, Küng J, Linial M, et al. (eds), Bioinformatics Research and Development: Second International Conference, BIRD 2008, Vienna, Austria, July 7-9, 2008, Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, 43–57.
    5. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods 2015;12:59–60.
