Syst Biol. 2017 Mar 1;66(2):152-166.
doi: 10.1093/sysbio/syw066.

Toward a Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa

Alexandre Antonelli et al. Syst Biol. 2017.

Abstract

Rapidly growing biological data, including molecular sequences and fossils, hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of varying quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, while also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a "Dated Tree of Life" where all node ages are directly comparable. [Bayesian phylogenetics; data mining; divide-and-conquer methods; GenBank; multilocus multispecies coalescent; next-generation sequencing; palms; primates; tree calibration.]


Figures

Figure 1.
Methods for inferring large (dated) phylogenies. Schematic comparison of the supertree, supermatrix, and the SUPERSMART approaches.
Figure 2.
Basic overview of the three-step approach implemented in SUPERSMART. a) A backbone tree is inferred for four hypothetical genera (A, B, C, and D), each represented by two exemplar species. The backbone is calibrated using a fossil on the node indicated with a star (which may have its own confidence interval). In this example, two genera (C and D) appear in this analysis to be polyphyletic. b) The backbone tree is decomposed into three sets of taxa (red, blue, and green) containing all the intrageneric taxa for which sufficient data are available. Genera C and D are merged into one taxon set because their exemplars were resolved as polyphyletic. Each taxon set is analyzed separately, yielding the trees shown. Hypothetical genus B shows that exemplars sometimes form an ingroup when more taxa are added; the pipeline attempts to minimize occurrences of this by picking exemplars with high sequence distance to one another. The clade trees have relative node ages and are scaled so that the most recent common ancestors of the respective exemplars have the same age as the equivalent nodes in the backbone (ages are indicated by the dotted lines). Highest posterior density (HPD) intervals, indicated with gradients, are similarly scaled. c) The final tree is obtained by grafting. Note how the branch leading up to genus B is shortened to make room for B3, whose age has been scaled in proportion to the ratio of the ages of the most recent common ancestor of B1 and B2 in the backbone and in the clade tree. Note also how the HPD intervals have become proportionally larger, for example, on the root of genus B. The combined clade-level analysis resolved reciprocally monophyletic genera C and D without the use of constraints.
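The scaling step in panels (b) and (c) can be illustrated with a minimal sketch. It assumes that every node age and interval bound in a clade tree is multiplied by a single factor, the ratio of the exemplars' most recent common ancestor (MRCA) age in the backbone to its age in the clade tree. The `Node` class, the function name, and the numbers are hypothetical and are not taken from the SUPERSMART code base.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A clade-tree node with a relative age and an HPD interval (hypothetical type)."""
    name: str
    age: float
    hpd_low: float
    hpd_high: float


def rescale_clade(nodes, clade_mrca_age, backbone_mrca_age):
    """Scale relative clade-tree ages so the exemplars' MRCA matches the backbone.

    Assumption: one multiplicative factor is applied to all ages and HPD bounds.
    """
    if clade_mrca_age <= 0:
        raise ValueError("clade MRCA age must be positive")
    factor = backbone_mrca_age / clade_mrca_age
    return [
        Node(n.name, n.age * factor, n.hpd_low * factor, n.hpd_high * factor)
        for n in nodes
    ]


# Hypothetical genus B: the MRCA of exemplars B1 and B2 has relative age 2.0 in
# the clade tree and absolute age 10.0 in the backbone, so the factor is 5.
clade_nodes = [
    Node("MRCA(B1,B2)", 2.0, 1.5, 2.5),
    Node("root of genus B (includes B3)", 3.0, 2.5, 3.5),
]
scaled = rescale_clade(clade_nodes, clade_mrca_age=2.0, backbone_mrca_age=10.0)
```

In this made-up example the root of genus B ends up older than the exemplar MRCA, which is why the backbone branch leading to the genus has to be shortened when grafting, and the HPD interval widths grow by the same factor as the ages.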
Figure 3.
Illustration of the classic knapsack problem applied to the optimal choice of species and alignments (markers) for compiling DNA alignments. Seven exemplar species (S1–S7) are put in ascending order by their occurrence in the candidate alignments (A1–A7), which are in turn ordered by taxon coverage. In this example, the minimum number of alignments per species is set to two. The supermatrix is then compiled as described in the text. The resulting matrix consists of five alignments and only six species, since the number of alignments in which species S4 occurs does not meet the required minimum.
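The caption defers the exact compilation procedure to the main text, which is not reproduced here. The sketch below is therefore only one plausible greedy reading of the orderings the caption describes, not the SUPERSMART algorithm itself: species are visited from rarest to most common, and each is topped up with the highest-coverage alignments until it reaches the minimum. The function name and the toy data are hypothetical.

```python
def compile_supermatrix(alignments, min_per_species=2):
    """Greedy marker selection (illustrative assumption, not the published method).

    alignments: dict mapping alignment name -> set of species it contains.
    Returns (chosen alignment names, kept species).
    """
    # Order alignments by taxon coverage, highest first (ties broken by name).
    by_coverage = sorted(alignments, key=lambda a: (-len(alignments[a]), a))

    # Count in how many candidate alignments each species occurs.
    occurrence = {}
    for species_set in alignments.values():
        for s in species_set:
            occurrence[s] = occurrence.get(s, 0) + 1

    # Visit species in ascending order of occurrence (rarest first).
    species_order = sorted(occurrence, key=lambda s: (occurrence[s], s))

    chosen, kept = [], []
    for s in species_order:
        if occurrence[s] < min_per_species:
            continue  # species can never meet the minimum, so it is dropped
        have = sum(1 for a in chosen if s in alignments[a])
        for a in by_coverage:
            if have >= min_per_species:
                break
            if a not in chosen and s in alignments[a]:
                chosen.append(a)
                have += 1
        kept.append(s)
    return chosen, kept


# Toy data (hypothetical): S4 occurs in only one alignment and is excluded.
toy = {
    "A1": {"S1", "S2", "S3"},
    "A2": {"S1", "S2"},
    "A3": {"S3", "S4"},
    "A4": {"S2"},
}
chosen, kept = compile_supermatrix(toy, min_per_species=2)
```

The final matrix would be restricted to the kept species, so a dropped species such as S4 is removed even from chosen alignments that contain it.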
Figure 4.
Time-calibrated phylogenies of (a) the mammal order Primates (primates) and (b) the plant family Arecaceae (palms) inferred using SUPERSMART. The families in (a) and the subfamilies in (b) are outlined. Internal concentric circles represent 10 myr bins. See Supplementary Figures S3 and S4 available on Dryad for fully annotated trees.
Figure 5.
Results from the biogeographic analyses of palms. a) Bioregionalisation analysis based on ca. 724,000 species occurrence records, highlighting the two regions analyzed below. b) Relative number of dispersal events (or range expansions), expressed in proportion to the number of lineages in the phylogeny in which such events could have taken place, between Northern South America and Central America (as one area) and eastern South America, and between east and west of Wallace’s Line. c) An analysis similar to that in (b), but showing the absolute number of events. See text for details on the analysis.
Figure 6.
Validation of the three-step phylogenetic inference process. a) Comparison of the molecular data for the primate tree inference and the replicated data set obtained from sequence simulations. Boxes show the interquartile range of each property for all alignments, with the median as a black line. The ends of the whiskers represent the lowest and highest values within 1.5 times the interquartile range of the lower and upper quartile, respectively. Gray and white boxes show real and simulated data, respectively. b) Simulated tree (left) matched with the tree that was re-estimated from the synthetic data set using SUPERSMART. Species present in both trees are connected by lines that are color-coded by the subclades into which the backbone tree was decomposed. Branches in the re-estimated tree that form the backbone are colored in red. A comparison of fully annotated trees is shown in Supplementary Figure S8 available on Dryad.
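The box-and-whisker convention described in panel (a) can be made concrete with a short sketch. It uses the Python standard library's `statistics.quantiles` with the "inclusive" method; quartile definitions differ between tools, so the plotting software used for the figure may give slightly different values. The function name and the sample values are hypothetical.

```python
import statistics


def box_stats(values):
    """Return quartiles and whisker ends using the 1.5 x IQR rule.

    Whiskers end at the lowest and highest observations that lie within
    1.5 times the interquartile range of the lower and upper quartile.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical alignment property (for example, alignment lengths in arbitrary units).
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

With these made-up values the upper whisker stops at 8 rather than 100, because 100 lies beyond the upper quartile plus 1.5 times the interquartile range and is treated as an outlier.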
