Review
Brief Bioinform. 2020 Mar 23;21(2):584-594.
doi: 10.1093/bib/bbz020.

New approaches for metagenome assembly with short reads

Martin Ayling et al. Brief Bioinform.

Abstract

In recent years, longer-range read data combined with advances in assembly algorithms have driven substantial improvements in the contiguity and quality of genome assemblies. However, these advances have not transferred directly to metagenomic data sets, because assumptions made by single-genome assembly algorithms do not hold when assembling multiple genomes at varying levels of abundance. Dedicated assemblers for metagenomic data were a relatively late innovation, and for many years researchers had to make do with tools designed for single genomes. This has changed in the last few years with the emergence of a new type of tool built on different principles. In this review, we describe the challenges inherent in metagenomic assembly and compare the approaches taken by these novel assembly tools.

Keywords: Metagenomics; algorithms; assembly; sequencing.

Figures

Figure 1
Two different approaches to genome assembly. (a) In Overlap, Layout, Consensus (OLC) assembly, (i) overlaps are found between reads and an overlap graph is constructed (edges indicate overlapping reads). (ii) Reads are laid out into contigs based on the overlaps (dashed lines indicate overlapping portions). (iii) The most likely sequence is chosen to construct the consensus sequence. (b) In de Bruijn graph (dBg) assembly, (i) reads are decomposed into k-mers by sliding a window of size k across the reads. (ii) The k-mers become vertices in the dBg, with edges connecting overlapping k-mers. Polymorphisms (red) form branches in the graph. A count is kept of how many times each k-mer is seen, shown here as numbers above the k-mers. (iii) Contigs are built by walking the graph from edge nodes. A variety of heuristics handle branches in the graph; for example, low-coverage paths, as shown here, may be ignored.
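
The dBg steps in panel (b) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the method of any assembler discussed in the review: it works on the forward strand only (no reverse complements), uses k-mers as vertices as in the figure, and handles branches with the simplest possible heuristic, dropping k-mers seen fewer than `min_count` times. The function names, the `min_count` parameter and the example reads are all hypothetical.

```python
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Step (i): slide a window of size k across each read and count k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_graph(counts, min_count=1):
    """Step (ii): k-mers are vertices; an edge u -> v exists when the last
    k-1 bases of u equal the first k-1 bases of v. K-mers seen fewer than
    min_count times are dropped (assumed low-coverage heuristic)."""
    kept = {kmer for kmer, c in counts.items() if c >= min_count}
    succ, pred = defaultdict(list), defaultdict(list)
    for u in kept:
        for base in "ACGT":
            v = u[1:] + base
            if v in kept:
                succ[u].append(v)
                pred[v].append(u)
    return kept, succ, pred

def walk_contigs(kept, succ, pred):
    """Step (iii): extend each unbranched path into a contig.
    Isolated cycles have no start vertex and are skipped in this sketch."""
    def is_start(v):
        # A vertex starts a contig unless it is the sole successor
        # of its sole predecessor.
        return len(pred[v]) != 1 or len(succ[pred[v][0]]) != 1

    contigs = []
    for v in sorted(kept):
        if not is_start(v):
            continue
        contig, cur = v, v
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            cur = succ[cur][0]
            contig += cur[-1]  # each step adds one new base
        contigs.append(contig)
    return contigs

def assemble(reads, k, min_count=1):
    kept, succ, pred = build_graph(count_kmers(reads, k), min_count)
    return walk_contigs(kept, succ, pred)

# Hypothetical reads from the sequence ATGGCGTGCA; the last read carries
# a single-base difference (G -> C), which creates a branch in the graph.
reads = ["ATGGCGTG", "GGCGTGCA", "ATGGCGTGCA", "ATGGCCTGCA"]

print(assemble(reads, k=4, min_count=1))  # branch kept: contigs break at it
print(assemble(reads, k=4, min_count=2))  # low-coverage path ignored
```

With `min_count=1` the variant path splits the graph into four short contigs; with `min_count=2` the singly observed k-mers are discarded and one contig remains. A fixed coverage cutoff of this kind is exactly the sort of single-genome assumption that becomes problematic when genomes are present at varying abundance, since k-mers from a rare genome can look the same as errors.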

