Bioinformatics. 2014 Sep 1;30(17):i527-33.
doi: 10.1093/bioinformatics/btu477.

RidgeRace: ridge regression for continuous ancestral character estimation on phylogenetic trees

Christina Kratsch et al. Bioinformatics. 2014.

Abstract

Motivation: Ancestral character state reconstruction describes a set of techniques for estimating phenotypic or genetic features of the ancestors of present-day species or individuals. Such reconstructions can reach into the distant past and can provide insights into the history of a population or a set of species when fossil data are not available, or they can be used to test evolutionary hypotheses, e.g. on the co-evolution of traits. Typical methods for reconstructing continuous characters take the phylogeny of the underlying data into account and estimate the ancestral process along the branches of the tree. They usually assume a Brownian motion model of character evolution, or an extension of it, which requires specific assumptions about the rate of phenotypic evolution.
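To make the Brownian motion (BM) assumption concrete, here is a minimal sketch that simulates a continuous trait on a rooted tree: along a branch of length t, the trait changes by an independent normal increment with variance σ²t. This is a generic BM simulation, not code from RidgeRace; the tree encoding (a map from each child node to its parent and branch length) and the name `simulate_bm` are illustrative assumptions.

```python
import math
import random


def simulate_bm(parent, root, root_value=0.0, sigma=1.0, seed=None):
    """Simulate a continuous trait under Brownian motion on a rooted tree.

    parent maps each non-root node to (parent_node, branch_length). Along a
    branch of length t the trait changes by an independent normal draw with
    mean 0 and variance sigma**2 * t.
    """
    rng = random.Random(seed)
    children = {}
    for child, (par, _) in parent.items():
        children.setdefault(par, []).append(child)

    values = {root: root_value}
    stack = [root]
    while stack:  # parents are always assigned before their children
        node = stack.pop()
        for child in children.get(node, []):
            t = parent[child][1]
            values[child] = values[node] + rng.gauss(0.0, sigma * math.sqrt(t))
            stack.append(child)
    return values


if __name__ == "__main__":
    # ((A:1.0, B:1.0)C:0.5, D:1.5)R written as a child -> (parent, length) map
    tree = {"C": ("R", 0.5), "A": ("C", 1.0), "B": ("C", 1.0), "D": ("R", 1.5)}
    print(simulate_bm(tree, "R", root_value=0.0, sigma=2.0, seed=42))
```

Under this model, sibling leaves share the change accumulated on their common ancestral branches, which is why the phylogeny has to be taken into account when reconstructing ancestral values.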

Results: We suggest using ridge regression to infer rates for each branch of the tree and the ancestral values at each inner node. We evaluated the method in extensive simulations and show that the accuracy of its reconstructed ancestral values is competitive with reconstructions using other state-of-the-art software. Using a hierarchical clustering of gene mutation profiles from an ovarian cancer dataset, we demonstrate the use of the method as a feature selection tool.
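The core idea, regressing observed leaf values on one rate per branch with an L2 (ridge) penalty and then reading off ancestral values, can be sketched as below. This is not the RidgeRace C++ implementation. It assumes, following the model in Fig. 1, that each leaf value is the sum of β_b·l_b over the branches b on its path to the root plus a virtual branch above the root whose coefficient is the global mean; the unit length of that virtual branch, leaving it unpenalized, and the names `ridge_ancestral`, `solve` and `"<root>"` are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_ancestral(parent, root, y, lam=1.0):
    """Fit one rate per branch by ridge regression and derive node values.

    parent: child -> (parent_node, branch_length); y: leaf -> observed value.
    Returns (rates, values): a rate per branch (branches are named by their
    child node, plus the virtual root branch "<root>") and the implied
    character value at every node.
    """
    cols = ["<root>"] + list(parent)   # virtual branch first, then real branches
    length = {b: parent[b][1] for b in parent}
    length["<root>"] = 1.0             # assumption: unit-length virtual branch
    index = {b: k for k, b in enumerate(cols)}

    def path(node):
        """Branches between `node` and the root, plus the virtual root branch."""
        out = []
        while node != root:
            out.append(node)
            node = parent[node][0]
        return out + ["<root>"]

    leaves = sorted(y)
    # Design matrix: X[i][k] = length of branch k if it lies above leaf i, else 0.
    X = []
    for leaf in leaves:
        row = [0.0] * len(cols)
        for b in path(leaf):
            row[index[b]] = length[b]
        X.append(row)

    p, n = len(cols), len(leaves)
    # Normal equations (X^T X + lam * D) beta = X^T y, where D is the identity
    # except D[0][0] = 0, so the global mean (virtual branch) is not shrunk.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    for k in range(1, p):
        A[k][k] += lam
    rhs = [sum(X[i][r] * y[leaves[i]] for i in range(n)) for r in range(p)]

    rates = dict(zip(cols, solve(A, rhs)))
    nodes = [root] + list(parent)
    values = {v: sum(rates[b] * length[b] for b in path(v)) for v in nodes}
    return rates, values


if __name__ == "__main__":
    tree = {"C": ("R", 0.5), "A": ("C", 1.0), "B": ("C", 1.0), "D": ("R", 1.5)}
    rates, values = ridge_ancestral(tree, "R", {"A": 2.0, "B": 0.0, "D": 1.0}, lam=0.1)
    print(values["R"], values["C"])  # inferred values at the inner nodes R and C
```

The penalty λ is what makes the problem well posed: a tree has more branches than leaves, so without shrinkage the branch rates are not identifiable. Larger λ pushes all branch rates toward zero and the ancestral values toward the global mean.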

Availability and implementation: The algorithm described here is implemented in C++ as a stand-alone program, and the source code is freely available at http://algbio.cs.uni-duesseldorf.de/software/RidgeRace.tar.gz.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Model of phenotype evolution on a phylogenetic tree. The observed continuous character values y_i at the nodes are the result of a sum of contributions on ancestral branches. A virtual branch ‘above’ the root node x_1 contributes the global phylogenetic mean, i.e. the ancestral state of x_1.
Fig. 2.
Mean squared error between the inferred ancestral characters and the true simulated values for maximum likelihood reconstruction (yellow), generalized least squares (red) and RidgeRace (light blue). (a) Performance as a function of the standard deviation σ of the BM process; (b) performance as the number of leaf nodes in the tree increases.
Fig. 3.
Reconstruction of the phenotypic rates β along the branches of a random tree with 25 leaves, simulated with three regimes and a hypothetical phenotypic trait that resulted from a BM process with original mean zero and standard deviations σ_I = 5.3, σ_II = 1.3 and σ_III = 2.3 in regimes I, II and III. The inferred rates visualize the speed of phenotypic evolution from strongly decreasing (red) to strongly increasing (blue). Absolute phenotypic rates are clearly largest in the regime with the highest σ parameter.
Fig. 4.
Application of RidgeRace to a hierarchical clustering of somatic mutations inferred for an ovarian cancer dataset. Colors on the side of the tree indicate the subtypes inferred with network-based stratification (Hofree et al., 2013). Branches are colored according to the phenotypic rate parameter β; the thickness of each branch is proportional to the number of nodes below it. Branches leading directly to leaf nodes are colored gray for better visibility. Labels m1 to m5 indicate branches with strong changes in patient survival time. Changes in the presence or absence of mutations in the selected genes are indicated on all branches with four or more children.
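The feature-selection use in Fig. 4 rests on reading large fitted rates β as strong changes along a branch. A minimal, hypothetical way to shortlist such branches is to rank them by |β| after discarding small clades. The function name `top_branches`, the default `min_leaves=4` (loosely following the caption's "four or more children", interpreted here as leaves below a branch) and the tree encoding are assumptions, not the procedure used in the paper.

```python
def top_branches(rates, parent, k=5, min_leaves=4):
    """Rank branches by |beta|, keeping only those above at least `min_leaves` leaves.

    rates: branch (named by its child node) -> fitted rate beta.
    parent: child -> (parent_node, branch_length).
    """
    children = {}
    for child, (par, _) in parent.items():
        children.setdefault(par, []).append(child)

    def n_leaves(node):
        kids = children.get(node, [])
        return 1 if not kids else sum(n_leaves(c) for c in kids)

    eligible = [b for b in parent if n_leaves(b) >= min_leaves]
    # Largest absolute rate first: strongest increase or decrease of the trait.
    return sorted(eligible, key=lambda b: abs(rates[b]), reverse=True)[:k]


if __name__ == "__main__":
    tree = {"C": ("R", 1.0), "A": ("C", 1.0), "B": ("C", 1.0),
            "E": ("C", 1.0), "F": ("C", 1.0), "D": ("R", 1.0)}
    rates = {"C": -3.0, "A": 5.0, "B": 0.1, "E": 0.0, "F": 0.0, "D": 2.0}
    print(top_branches(rates, tree, k=2, min_leaves=1))  # → ['A', 'C']
```

Ranking by absolute value treats strong decreases and strong increases alike, matching the two ends of the red-to-blue color scale used for β in the figures.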

References

Blomberg SP, et al. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57:717–745.
Boettiger C, et al. Is your phylogeny informative? Measuring the power of comparative methods. Evolution. 2012;66:2240–2251.
BOOST. uBLAS library. 2014. http://www.boost.org.
Butler MA, King AA. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am. Nat. 2004;164:683–695.
Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068.
