BMC Evol Biol. 2010 Jan 20;10:16.
doi: 10.1186/1471-2148-10-16.

Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life

Heroen Verbruggen et al. BMC Evol Biol.

Abstract

Background: The assembly of the tree of life has seen significant progress in recent years, but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots, and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We also aim to locate the remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them.

Results: Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support revealed five poorly supported regions. Causes of low support were identified using statistics on the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10³ to ca. 10⁴ nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic picture, with some regions requiring more than 2.8 × 10⁵ nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches.
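The nonparametric simulations extrapolate from the empirical alignment itself. The abstract does not spell out the exact procedure, so the minimal Python sketch below assumes the common approach of building longer pseudo-alignments by drawing alignment columns with replacement, so that every simulated site is copied from a real one. The function name `resample_alignment` and the toy alignment are hypothetical.

```python
import random


def resample_alignment(alignment, target_length, seed=None):
    """Build a pseudo-alignment of `target_length` columns by drawing
    columns of the empirical alignment uniformly with replacement.

    alignment: {taxon: aligned sequence}, all sequences of equal length.
    """
    taxa = list(alignment)
    n_cols = len(alignment[taxa[0]])
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    rng = random.Random(seed)
    # One set of column indices, applied to every taxon so columns stay intact
    picks = [rng.randrange(n_cols) for _ in range(target_length)]
    return {taxon: "".join(alignment[taxon][i] for i in picks) for taxon in taxa}


# Hypothetical toy alignment ('-' marks a gap or missing data)
toy = {
    "taxonA": "ACGT-A",
    "taxonB": "ACGTTA",
    "taxonC": "AC--TG",
}
big = resample_alignment(toy, target_length=20, seed=1)
```

Because whole columns are copied, a resampled alignment inherits the empirical pattern of gaps and missing data, whereas parametrically simulated data are complete unless missing data are introduced afterwards (the gray curves in Figure 3).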

Conclusions: Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.

Figures

Figure 1
Data availability matrix. Graphical representation of our concatenated alignment, showing the availability of sequence data. The color of the column and row headers indicates the amount of data available for that column or row: green indicates high data availability, red indicates low data availability and yellow/orange represents intermediate data availability. The matrix density is 34% in a locus × OTU context and 35% in a character × OTU context. Numbers in cells indicate the length of sequence in the alignment, which may include gaps and/or exclude ambiguously aligned regions. Figure generated with the gDAM software http://www.phycoweb.net.
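The two density figures in the caption measure different things. The minimal sketch below assumes density means the fraction of filled cells: in the locus × OTU view a cell counts as filled if an OTU has any sequence for a locus, while in the character × OTU view every aligned position counts separately. The input layout, the function name `matrix_density` and the toy numbers are hypothetical, and whether gap positions count as data is an assumption left to whoever fills in the per-cell lengths.

```python
def matrix_density(present, locus_lengths):
    """Return (locus x OTU density, character x OTU density).

    present: {otu: {locus: number of characters with data}}; an absent
             locus or a value of 0 means no data for that OTU/locus pair.
    locus_lengths: {locus: aligned length of that locus}
    """
    otus = list(present)
    loci = list(locus_lengths)
    # Locus x OTU: a cell is filled if the OTU has any data for the locus
    filled_cells = sum(
        1 for otu in otus for locus in loci if present[otu].get(locus, 0) > 0
    )
    locus_density = filled_cells / (len(otus) * len(loci))
    # Character x OTU: every aligned position of every OTU is one cell
    filled_chars = sum(present[otu].get(locus, 0) for otu in otus for locus in loci)
    char_density = filled_chars / (len(otus) * sum(locus_lengths.values()))
    return locus_density, char_density


# Hypothetical toy matrix: three families, three loci
lengths = {"locus_A": 1000, "locus_B": 1500, "locus_C": 500}
data = {
    "family_1": {"locus_A": 1000, "locus_B": 1500},
    "family_2": {"locus_A": 800},
    "family_3": {},
}
loc_d, char_d = matrix_density(data, lengths)
print(f"{loc_d:.0%} locus x OTU, {char_d:.0%} character x OTU")
# → 33% locus x OTU, 37% character x OTU
```

The two measures diverge whenever the loci differ in length or sequences cover only part of a locus, which is why a matrix can report slightly different densities in the two contexts.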
Figure 2
Red algal tree of life with current taxonomic classification. The tree was reconstructed using Bayesian phylogenetic inference of DNA data mined from GenBank (Figure 1). Branch colors indicate statistical support of the clades: whereas black branches are strongly supported, the orange parts of the tree are poorly resolved. Intermediate colors represent intermediate support (see gradient legend). Five poorly supported regions are indicated with gray boxes (A-E). Numbers at nodes indicate branch support given as bootstrap values from maximum likelihood analysis before the vertical bar and Bayesian posterior probabilities after the vertical bar. Values are only shown if they exceed 50 and 0.95, respectively.
Figure 3
Estimated data requirement for resolving the five poorly supported regions. Each graph shows how average bootstrap support increases as a function of alignment length for three types of simulations: nonparametric resampling of the empirical alignment (orange), parametric simulation of data (blue) and parametric simulation followed by introduction of missing data (gray). The approximate amount of data required to resolve a region can be derived for each simulation type by specifying a desired level of bootstrap support (e.g., the dashed line drawn at 80) and deducing the corresponding alignment length on the x-axis. Note that the x-axis uses a logarithmic scale. The lines connect the means of the five values of each condition.
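Reading a data requirement off a curve like those in Figure 3 amounts to finding where the mean support crosses the chosen threshold. The minimal sketch below assumes linear interpolation between sampled alignment lengths on the logarithmic x-axis; the function name `required_length` and the replicate values are hypothetical, not the study's data.

```python
import math


def required_length(replicates, target=80.0):
    """Estimate the alignment length at which mean bootstrap support reaches `target`.

    replicates: {alignment_length: [bootstrap values of the replicate runs]}
    Interpolates linearly in log10(length), matching a log-scaled x-axis.
    Returns None if the mean curve never reaches the target.
    """
    # Mean support per simulated alignment length, ordered by length
    curve = sorted((length, sum(v) / len(v)) for length, v in replicates.items())
    if curve[0][1] >= target:
        return float(curve[0][0])
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 < target <= y1:
            # Fraction of the way from y0 to y1 at which the target is hit
            frac = (target - y0) / (y1 - y0)
            log_x = math.log10(x0) + frac * (math.log10(x1) - math.log10(x0))
            return 10 ** log_x
    return None


# Hypothetical support values, five replicates per alignment length
example = {
    1_000: [55, 60, 62, 58, 65],    # mean 60
    10_000: [68, 70, 72, 71, 69],   # mean 70
    100_000: [88, 90, 91, 89, 92],  # mean 90
}
print(round(required_length(example)))  # → 31623
```

Returning None covers the case, noted in the Results, of a region whose support never reaches the desired level within the simulated range.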

