Bioinformatics. 2014 Dec 1;30(23):3317-24.
doi: 10.1093/bioinformatics/btu530. Epub 2014 Aug 7.

Quartet inference from SNP data under the coalescent model


Julia Chifman et al. Bioinformatics.

Abstract

Motivation: Increasing attention has been devoted to estimation of species-level phylogenetic relationships under the coalescent model. However, existing methods either use summary statistics (gene trees) to carry out estimation, ignoring an important source of variability in the estimates, or involve computationally intensive Bayesian Markov chain Monte Carlo algorithms that do not scale well to whole-genome datasets.

Results: We develop a method to infer relationships among quartets of taxa under the coalescent model using techniques from algebraic statistics. Uncertainty in the estimated relationships is quantified using the nonparametric bootstrap. The performance of our method is assessed with simulated data. We then describe how our method could be used for species tree inference in larger taxon samples, and demonstrate its utility using datasets for Sistrurus rattlesnakes and for soybeans.
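The abstract names the SVD score and the bootstrap but does not define them. The sketch below rests on an assumption: that the score follows the definition we understand the full paper to use, which is not stated in this abstract. For each of the three splits of a quartet, the table of four-taxon site-pattern frequencies is flattened into a 16 × 16 matrix, with rows indexed by the bases of one pair and columns by the bases of the other. The split is scored by the Frobenius distance from that matrix to the nearest rank-10 matrix, and the lowest-scoring split is taken as the inferred quartet. Support is the fraction of site-resampled bootstrap replicates that choose each split. All function names are hypothetical and are not the SVDquartets interface. A small Jacobi eigenvalue routine stands in for a library SVD so that the block needs only the standard library.

```python
import math
import random
from itertools import product

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

# The three ways to split four taxa (0-based indices) into two pairs.
SPLITS = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def flattening(sites, split):
    """16x16 matrix of site-pattern frequencies: rows are indexed by the
    bases of the left pair of taxa, columns by the bases of the right pair."""
    (a, b), (c, d) = split
    m = [[0.0] * 16 for _ in range(16)]
    w = 1.0 / len(sites)
    for s in sites:  # each site is a 4-character string, one base per taxon
        m[4 * IDX[s[a]] + IDX[s[b]]][4 * IDX[s[c]] + IDX[s[d]]] += w
    return m


def sym_eigvals(a, sweeps=50):
    """Ascending eigenvalues of a symmetric matrix (cyclic Jacobi method)."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def svd_score(sites, split):
    """Frobenius distance from the flattening to the nearest rank-10 matrix,
    i.e. sqrt of the sum of its 6 smallest squared singular values."""
    m = flattening(sites, split)
    # The squared singular values of M are the eigenvalues of M M^T.
    mmt = [[sum(x * y for x, y in zip(r1, r2)) for r2 in m] for r1 in m]
    smallest = sym_eigvals(mmt)[:6]
    return math.sqrt(sum(max(0.0, ev) for ev in smallest))  # clamp round-off


def best_split(sites):
    """Inferred quartet: the split with the lowest SVD score."""
    return min(SPLITS, key=lambda name: svd_score(sites, SPLITS[name]))


def bootstrap_support(sites, reps=100, seed=1):
    """Fraction of nonparametric bootstrap replicates (sites resampled
    with replacement) in which each split is inferred."""
    rng = random.Random(seed)
    counts = dict.fromkeys(SPLITS, 0)
    for _ in range(reps):
        counts[best_split(rng.choices(sites, k=len(sites)))] += 1
    return {name: n / reps for name, n in counts.items()}


if __name__ == "__main__":
    # Toy data: taxa 1 and 2 always share a base, as do taxa 3 and 4.
    toy = [x + x + y + y for x, y in product(BASES, repeat=2)] * 20
    print(best_split(toy), bootstrap_support(toy, reps=20))
```

In the toy data, the 12|34 flattening has only four nonzero rows. Its rank is therefore at most 4 and its score is essentially zero. The 13|24 and 14|23 flattenings are diagonal with all 16 entries nonzero, so they are full rank and score higher, and 12|34 is chosen. The pure-Python eigenvalue routine keeps the sketch dependency-free but is slow; for real data an optimized SVD would be the natural replacement.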

Availability and implementation: The method to infer the phylogenetic relationship among quartets is implemented in the software SVDquartets, available at www.stat.osu.edu/~lkubatko/software/SVDquartets.


Figures

Fig. 1.
Example four-taxon phylogeny. Split 12|34 is valid, as the subtree consisting of taxa 1 and 2 does not overlap the subtree consisting of taxa 3 and 4. The two non-valid splits for this tree are 13|24 and 14|23.
Fig. 2.
Simulation results for the JC69 model. The top row gives the results for 5000 unlinked SNP sites and the bottom row gives the results for 10 genes with 500 sites each. The columns correspond to differing branch lengths in the model species tree. The first boxplot in each subfigure shows the distribution of SVD scores for the true split, while the next two boxplots show the distribution for the two false splits.
Fig. 3.
Simulation results for the GTR + I + Γ model. The top row gives the results for 5000 unlinked SNP sites and the bottom row gives the results for 10 genes with 500 sites each. The columns correspond to differing branch lengths in the model species tree. The first boxplot in each subfigure shows the distribution of SVD scores for the true split, while the next two boxplots show the distribution for the two false splits.
Fig. 4.
Bootstrap results for the JC69 model simulations. Each boxplot shows the distribution of the bootstrap support values for each of the three possible splits for the simulated data shown in Figure 2.
Fig. 5.
Bootstrap results for the GTR + I + Γ simulations. Each boxplot shows the distribution of the bootstrap support values for each of the three possible splits for the simulated data shown in Figure 3.
Fig. 6.
Simulation results for data consisting of 1000, 5000 or 10 000 unlinked SNP sites for trees with branch lengths of 0.5 coalescent units (solid lines), 1.0 coalescent units (dashed lines) or 2.0 coalescent units (dotted lines). Median SVD scores (taken over 1000 replicates) for the valid split 12|34 are marked with circles, while the scores for the two non-valid splits are marked with triangles and diamonds.
Fig. 7.
Results of the analysis of the rattlesnake data. In (a), the tree relating all 52 lineages is shown. Colors indicate subspecies membership: Scc = S. c. catenatus (green); Sce = S. c. edwardsii (red); Sct = S. c. tergeminus (blue); Smm = S. m. miliarius (dark green); Sms = S. m. streckeri (orange); Smb = S. m. barbouri (dark blue); Apc = A. piscivorus (black) and Akc = A. contortrix (black). In (b), the tree relating subspecies is shown, with abbreviations as above, except that the two outgroup species have been combined and denoted ‘Ag’. In both subfigures, numbers above the nodes refer to bootstrap support values, and the trees depicted are majority-rule consensus trees over 100 bootstrap samples.
Fig. 8.
Results of the analysis of the soybean data. (a) Tree estimated by SVDquartets with bootstrap support values. (b) Maximum clade credibility tree estimated using SNAPP.
