Proc Natl Acad Sci U S A. 2019 Nov 5;116(45):22651-22656.
doi: 10.1073/pnas.1911714116. Epub 2019 Oct 21.

GenBank is a reliable resource for 21st century biodiversity research

Matthieu Leray et al. Proc Natl Acad Sci U S A.

Abstract

Traditional methods of characterizing biodiversity are increasingly being supplemented and replaced by approaches based on DNA sequencing alone. These approaches commonly involve extraction and high-throughput sequencing of bulk samples from biologically complex communities or samples of environmental DNA (eDNA). In such cases, vouchers for individual organisms are rarely obtained, often unidentifiable, or unavailable. Thus, identifying these sequences typically relies on comparisons with sequences from genetic databases, particularly GenBank. While concerns have been raised about biases and inaccuracies in laboratory and analytical methods, comparatively little attention has been paid to the taxonomic reliability of GenBank itself. Here we analyze the metazoan mitochondrial sequences of GenBank using a combination of distance-based clustering and phylogenetic analysis. Because of their comparatively rapid evolutionary rates and consequent high taxonomic resolution, mitochondrial sequences represent an invaluable resource for the detection of the many small and often undescribed organisms that represent the bulk of animal diversity. We show that metazoan identifications in GenBank are surprisingly accurate, even at low taxonomic levels (likely <1% error rate at the genus level). This stands in contrast to previously voiced concerns based on limited analyses of particular groups and the fact that individual researchers currently submit annotated sequences to GenBank without significant external taxonomic validation. Our encouraging results suggest that the rapid uptake of DNA-based approaches is supported by a bioinformatic infrastructure capable of assessing both the losses to biodiversity caused by global change and the effectiveness of conservation efforts aimed at slowing or reversing these losses.

Keywords: environmental DNA; metabarcoding; taxonomic assignments.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Percentage of sequences in multisequence clusters for 13 protein and 2 ribosomal RNA-coding metazoan mitochondrial encoded genes. Clustering was performed on sequences retrieved from the GenBank BLAST nucleotide database using VSEARCH.
Fig. 2.
Estimated percentage of mislabeled metazoan sequences for 2 protein coding genes and 2 ribosomal RNA coding genes: (A) CO1; (B) Cytb; (C) 16S; and (D) 12S. Estimated minimum and maximum values are indicated for each taxonomic level. Calculations were made with and without the phyla Cnidaria and Porifera because they are known to have lower rates of evolution for these genes; however, these 2 groups account for only 0.6% and 0.09% of all sequences, respectively, and thus have a relatively minor influence on overall error estimates.
Fig. 3.
Number of sequences and estimated percentage of mislabeled sequences at the genus and family levels across major metazoan phyla: (A) CO1; (B) Cytb; (C) 16S; and (D) 12S. The category “Other phyla” includes sequences of Acanthocephala, Brachiopoda, Bryozoa, Chaetognatha, Ctenophora, Cycliophora, Entoprocta, Gastrotricha, Hemichordata, Kinorhyncha, Nematomorpha, Nemertea, Onychophora, Placozoa, Priapulida, Rhombozoa, Rotifera, Tardigrada, and Xenacoelomorpha.
Fig. 4.
Increase in the number of scientific publications since 2000 based on Web of Science using the terms “eDNA” or “environmental DNA” or “community DNA” or “metabarcod*” compared with the comparatively stable number of publications using the term “DNA”.
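
The quantity plotted in Fig. 1, the percentage of sequences that fall in multisequence clusters, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes the clustering step (VSEARCH in the paper) has already been run and its result reduced to a simple mapping from sequence ID to cluster ID. The function name, the input format, and the example IDs are all hypothetical.

```python
from collections import Counter


def percent_in_multisequence_clusters(assignments):
    """Return the percentage of sequences whose cluster holds more than one sequence.

    `assignments` is a hypothetical mapping of sequence ID -> cluster ID,
    assumed to have been derived beforehand from a clustering tool's output.
    """
    if not assignments:
        return 0.0
    # Count how many sequences were assigned to each cluster.
    cluster_sizes = Counter(assignments.values())
    # A sequence counts if its cluster contains at least two sequences.
    in_multi = sum(1 for cid in assignments.values() if cluster_sizes[cid] > 1)
    return 100.0 * in_multi / len(assignments)


# Made-up example: clusters c1 and c3 hold two sequences each, c2 is a singleton.
example = {"seq1": "c1", "seq2": "c1", "seq3": "c2", "seq4": "c3", "seq5": "c3"}
print(percent_in_multisequence_clusters(example))  # 80.0
```

Singleton clusters contribute nothing to the numerator, so a gene with many unique, unclustered sequences would score low on this measure.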
