Comparative Study
BMC Bioinformatics. 2009 Nov 10;10 Suppl 14(Suppl 14):S10.
doi: 10.1186/1471-2105-10-S14-S10.

DNA barcode analysis: a comparison of phylogenetic and statistical classification methods

Frederic Austerlitz et al. BMC Bioinformatics. 2009.

Abstract

Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species.

Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci (nuclear genes) improved the predictive performance of most methods.

Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing the method best adapted to the configuration of their sample or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. The amount of molecular diversity can be altered either by sequencing more mtDNA or by sequencing additional nuclear genes. When nuclear genes are added, they may also have to modify their data analysis method.
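
The "one nearest neighbour" method named in the Results can be illustrated with a minimal sketch. The abstract does not state which distance measure was used, so the example below assumes aligned sequences of equal length compared by a plain mismatch count. The function names, the toy sequences and the tie rule (the first reference encountered wins) are hypothetical choices made for illustration only.

```python
def mismatch_count(a, b):
    """Number of differing positions between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_nearest_neighbour(query, references):
    """Assign the query to the species of its closest reference sequence.

    references is a list of (species, sequence) pairs.
    Ties are broken by reference order, which is an arbitrary assumption.
    """
    best_species, best_dist = None, None
    for species, seq in references:
        d = mismatch_count(query, seq)
        if best_dist is None or d < best_dist:
            best_species, best_dist = species, d
    return best_species, best_dist


# Toy reference set (invented sequences, not data from the study).
references = [
    ("sp1", "ACGTACGT"),
    ("sp1", "ACGTACGA"),
    ("sp2", "ATGTTCGA"),
    ("sp2", "ATGTTCGT"),
]

result = one_nearest_neighbour("ATGTTCGG", references)
print(result)
```

With this toy reference set, the query differs from both sp2 sequences at one position and from both sp1 sequences at three, so it is assigned to sp2.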

Figures

Figure 1
Hypothetical representations of gene genealogies between two species and of some hypothetical mutation patterns between them. Individual A is the global MRCA of all individuals; individuals B and C are respectively the MRCA of the two derived species 1 and 2. Cases a, b and c correspond to reciprocal monophyly and case d to reciprocal paraphyly. In some cases of reciprocal monophyly, one mutation is diagnostic (a), while no mutation is diagnostic in other cases (b and c). A combination of mutations can also be sufficient to perform barcoding (c). Barcoding is also possible in the case of reciprocal paraphyly, by also using combinations of mutations that are specific to a given species (d).
Figure 2
Schematic representation of simulations for two species (nS = 2). It is assumed that the species split T generations ago. The thin lines represent the coalescent lineages and stars indicate the mutations that occurred along these lineages. For each species, we simulated n reference individuals and one additional individual, which was used to test the methods.
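
The scheme in this caption can be sketched as a small backward-in-time simulation. This is only an illustration under stated assumptions, not the authors' simulation code: it assumes haploid populations of equal constant size N (the two species and their ancestor alike), time measured in generations, an infinite-sites mutation model with rate mu > 0 per lineage per generation, and the standard continuous-time coalescent approximation. The function name and all parameter values are hypothetical.

```python
import random


def simulate_two_species(n, T, N, mu, rng):
    """Return {(species, index): set of mutation ids} for n + 1 samples per species."""
    samples = [(s, i) for s in (1, 2) for i in range(n + 1)]
    muts = {smp: set() for smp in samples}
    counter = [0]  # next unused mutation id

    def add_mutations(lineage, length):
        # Drop mutations along a branch; each is inherited by every descendant sample.
        t = rng.expovariate(mu)
        while t < length:
            for smp in lineage:
                muts[smp].add(counter[0])
            counter[0] += 1
            t += rng.expovariate(mu)

    def run(lineages, t_max):
        # Coalesce lineages until t_max (or until one lineage is left if t_max is None).
        t = 0.0
        while len(lineages) > 1:
            k = len(lineages)
            wait = rng.expovariate(k * (k - 1) / (2.0 * N))
            if t_max is not None and t + wait > t_max:
                break
            for lin in lineages:
                add_mutations(lin, wait)
            t += wait
            a, b = rng.sample(range(k), 2)
            merged = lineages[a] | lineages[b]
            lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
            lineages.append(merged)
        if t_max is not None:
            # Surviving lineages persist, and mutate, until the split time.
            for lin in lineages:
                add_mutations(lin, t_max - t)
        return lineages

    sp1 = [frozenset([s]) for s in samples if s[0] == 1]
    sp2 = [frozenset([s]) for s in samples if s[0] == 2]
    ancestral = run(sp1, T) + run(sp2, T)
    run(ancestral, None)  # lineages of both species coalesce in the ancestor
    return muts


rng = random.Random(1)
muts = simulate_two_species(n=3, T=50.0, N=100.0, mu=0.01, rng=rng)
# Under infinite sites, the distance between two samples is the size of the
# symmetric difference of their mutation sets.
d = len(muts[(1, 0)] ^ muts[(2, 0)])
```

In such a setup, one sample per species would be held out as the query and the remaining n kept as references, mirroring the design described in the caption.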
Figure 3
Illustration of our assignment technique for the phylogeny-based methods. X denotes the query sequence to assign, and individuals 1_x or 2_x belong respectively to species 1 or 2. In case A, the sister group (1_1, 1_3, 2_1) of X contains a majority of individuals of species 1, thus X is assigned to species 1. In case B, the sister group (1_1, 1_3, 2_1, 2_4) of X contains an equal number of individuals of species 1 and 2, so we have to consider the sister group at the upper level (one node above). This group, (1_1, 1_3, 2_1, 2_4, 1_2), contains a majority of species 1 individuals. X is thus assigned to species 1.
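
The assignment rule in this caption can be sketched on a tree written as nested tuples. The two topologies below are assumptions, chosen only to reproduce the sister groups named in the caption, since the figure itself is not available here. The helper names are hypothetical, and the "unique most frequent species" rule is one reading of "majority" that also covers more than two species.

```python
from collections import Counter


def leaves(node):
    """All leaf labels under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def ancestors_of(node, target):
    """Clades containing target, ordered from the root down, or None if absent."""
    if isinstance(node, str):
        return [] if node == target else None
    for child in node:
        sub = ancestors_of(child, target)
        if sub is not None:
            return [node] + sub
    return None


def assign(tree, query="X"):
    """Walk up from the query until the surrounding group has a clear majority."""
    path = ancestors_of(tree, query)
    if path is None:
        raise ValueError("query not found in tree")
    for clade in reversed(path):  # nearest ancestor first
        group = [leaf for leaf in leaves(clade) if leaf != query]
        counts = Counter(leaf.split("_")[0] for leaf in group).most_common()
        if len(counts) == 1 or (len(counts) > 1 and counts[0][1] > counts[1][1]):
            return counts[0][0]
    return None  # tie all the way up to the root


# Assumed topologies matching the sister groups described in the caption.
case_a = (("X", (("1_1", "1_3"), "2_1")), ("1_2", "2_4"))
case_b = ((("X", (("1_1", "1_3"), ("2_1", "2_4"))), "1_2"), "2_2")

print(assign(case_a), assign(case_b))
```

In case A the nearest group (1_1, 1_3, 2_1) already favours species 1. In case B the nearest group is tied 2 to 2, and moving one node up adds 1_2, which gives species 1 the majority.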
