Brief Bioinform. 2015 Sep;16(5):745-58.
doi: 10.1093/bib/bbv001. Epub 2015 Feb 11.

Environmental genes and genomes: understanding the differences and challenges in the approaches and software for their analyses

Marie Lisandra Zepeda Mendoza et al. Brief Bioinform. 2015 Sep.

Abstract

DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated its own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques they have begun to share many aspects in data set generation and processing. In this review, we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets.

Keywords: DNA metabarcoding; environment; genome; metagenomics; software development.

Figures

Figure 1
Environmental sample analysis framework. (A) A sample can come from any environment that contains DNA; e.g. one of the most studied environments to date is the human gut microbiome. (B) DNA is extracted from the sample and sequenced according to the intended analyses. Shotgun sequencing produces genomic reads from the species present in the sample, while targeted sequencing produces amplicons with the aim of identifying a specific group of organisms. (C) Depending on the initial aim, whether functional and taxonomic characterization or only taxonomic characterization, the appropriate data set needs to be generated to be analyzed with the appropriate software.
Figure 2
Considerations and challenges for metagenomics and DNA metabarcoding. Both fields face a variety of challenges that are ideal candidates for future software development. While some of these problems are specific to one of the fields (right and left boxes), others are common to both (middle boxes).
Figure 3
Metabarcoding approaches. (A) Although PCR-free data sets are typically large, usually only a small percentage of the sequence reads map to a reference database. In such a database, each entry has an assigned taxonomy, so phylogenetic placement approaches can be used for taxonomic assignment. (B) PCR-based data sets consist of amplicon sequences that can be analyzed with or without a reference database. If no database is used, the sequences are compared among themselves and clustered by a similarity threshold; a representative sequence can then be drawn from each cluster and compared with a reference database. If a database is used, the sequences are compared against it and are assigned the taxonomy of the sequence they match under a given similarity threshold. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
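The two PCR-based routes in panel (B) can be illustrated with a minimal Python sketch. It is not the method of the review or of any particular tool: it assumes pre-aligned, equal-length sequences, scores similarity as simple positional identity, and clusters greedily. The function names (`identity`, `greedy_cluster`, `assign_taxonomy`), the 0.97 default threshold, and the toy sequences and taxon labels are all hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Reference-free route: each sequence joins the first cluster whose
    representative it matches at or above `threshold`, else it founds a new cluster.
    The dict key is the representative sequence of each cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq in seqs:
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters[seq] = [seq]
    return clusters


def assign_taxonomy(query: str, reference: Dict[str, str], threshold: float = 0.97) -> str:
    """Reference-based route: return the taxonomy of the best-matching reference
    sequence at or above `threshold`, or 'unassigned' if none qualifies."""
    best_taxon, best_score = "unassigned", threshold
    for ref_seq, taxon in reference.items():
        score = identity(query, ref_seq)
        if score >= best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy data (hypothetical): three similar amplicons and one divergent amplicon.
amplicons = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "ACGTACGTAC"]
reference_db = {"ACGTACGTAC": "Taxon A", "GGGGGGGGGG": "Taxon B"}

clusters = greedy_cluster(amplicons, threshold=0.9)
labels = {rep: assign_taxonomy(rep, reference_db, threshold=0.9) for rep in clusters}
```

Clustering first and then looking up only the representatives keeps the number of database comparisons proportional to the number of clusters rather than the number of reads, which is one reason the two routes are often combined.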
Figure 4
Metagenomic approaches. (A) Metagenomic reference-based approaches start by mapping the reads to a genome database and then apply various algorithms to assign taxonomy, such as phylogenetic placement, or the use of unique mapping reads to the genome of a species in the database. (B) Alternatively, the reads can be de novo assembled and the scaffolds, or the open reading frames predicted on the scaffolds, can be searched against the database, thus reducing the search time. (C) Metagenomic reference-free methods usually start by de novo assembling the reads, then the number of reads mapping back to the assembled sequences (the scaffolds or the open reading frames predicted from the scaffolds) can be used to create a count matrix that can be further clustered, with each cluster representing a metagenomic species. A colour version of this figure is available online at BIB online: http://bib.oxfordjournals.org.
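The reference-free idea in panel (C), counting reads per assembled sequence and clustering the resulting count matrix, can be sketched as follows. This is an illustrative simplification under stated assumptions: read mappings are given as (sample, contig) pairs, co-abundance is measured with Pearson correlation across samples, and grouping is greedy. The names `count_matrix`, `pearson`, `cluster_by_coabundance` and the `min_r` cutoff are hypothetical and do not correspond to any specific published tool.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def count_matrix(
    mappings: Sequence[Tuple[str, str]],
    samples: Sequence[str],
    contigs: Sequence[str],
) -> Dict[str, List[int]]:
    """Build a contig -> per-sample read-count profile from (sample, contig) pairs."""
    counts = Counter(mappings)
    return {c: [counts[(s, c)] for s in samples] for c in contigs}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_by_coabundance(
    matrix: Dict[str, List[int]], min_r: float = 0.9
) -> List[List[str]]:
    """Greedily group contigs whose abundance profiles correlate with the first
    member of an existing cluster at or above `min_r`."""
    clusters: List[List[str]] = []
    for contig, profile in matrix.items():
        for cluster in clusters:
            if pearson(profile, matrix[cluster[0]]) >= min_r:
                cluster.append(contig)
                break
        else:
            clusters.append([contig])
    return clusters


# Toy mappings (hypothetical): c1 and c2 rise together across samples, c3 falls.
samples = ["s1", "s2", "s3"]
contigs = ["c1", "c2", "c3"]
toy_counts = {"c1": [1, 2, 3], "c2": [2, 4, 6], "c3": [3, 2, 1]}
mappings = [
    (s, c)
    for c, profile in toy_counts.items()
    for s, n in zip(samples, profile)
    for _ in range(n)
]

matrix = count_matrix(mappings, samples, contigs)
species_bins = cluster_by_coabundance(matrix, min_r=0.9)
```

Each resulting group stands in for what the caption calls a metagenomic species: sequences whose read counts vary together across samples are assumed to come from the same organism.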
Figure 5
Method classification placement map. As the placement of the methods shows, software is lacking in some areas and abundant in others, especially at the borderlines, where methods might at first seem difficult to classify. (A) Metagenomic reference based. (B) Metagenomic reference free. (C) DNA metabarcoding reference based. (D) DNA metabarcoding reference free.

