Methods Ecol Evol. 2011 Oct;2(5):437-445.
doi: 10.1111/j.2041-210X.2011.00102.x.

Measuring the Temporal Structure in Serially-Sampled Phylogenies

R R Gray et al. Methods Ecol Evol. 2011 Oct.

Abstract

Nucleotide sequences sampled at different times (serially-sampled sequences) allow researchers to study the rate of evolutionary change and the demographic history of populations. Some phylogenies inferred from serially-sampled sequences are described as having strong 'temporal clustering', such that sequences from the same sampling time tend to cluster together and to be the direct ancestors of sequences from the following sampling time. The degree to which phylogenies exhibit these properties is thought to reflect interesting biological processes, such as positive selection or deviation from the molecular clock hypothesis. Here we introduce the Temporal Clustering (TC) statistic, which is the first quantitative measure of the degree of topological 'temporal clustering' in a serially-sampled phylogeny. The TC statistic represents the expected deviation of an observed phylogeny from the null hypothesis of no temporal clustering, as a proportion of the range of possible values, and can therefore be compared among phylogenies of different sizes. We apply the TC statistic to a range of serially-sampled sequence datasets, which represent both rapidly-evolving viruses and ancient mitochondrial DNA. In addition, the TC statistic was calculated for phylogenies simulated under a neutral coalescent process. Our results indicate significant temporal clustering in many empirical datasets. However, we also find that such clustering is exhibited by trees simulated under a neutral coalescent process; hence the observation of significant 'temporal clustering' cannot unambiguously indicate the presence of strong positive selection in a population. Quantifying topological structure in this manner will provide new insights into the evolution of measurably evolving populations.

Figures

Fig. 1
(a) and (b) Schematic phylogenies representing the tree topologies with the lowest (a) and highest (b) possible tree score (S) values. Terminal branches are colored according to the time point at which the hypothetical sample was taken (white = 1st, blue = 2nd, pink = 3rd, orange = 4th). Internal branches are colored according to the most parsimonious reconstruction of the ancestral state, according to the irreversible matrix used to calculate the TC statistic (see text). (c) A number line representing the parameter space of the tree score (S) value. The null distribution is shown as a bell curve with maximum value (Smax), minimum value (Smin) and mean (Smean) indicated. The absolute minimum (Min) and maximum (Max) tree scores are also indicated. The interval A denotes the range of the null distribution. The interval B denotes the range of possible values (sample space). The interval C denotes the possible values for the numerator of the TC statistic.
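The quantities in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the tree is a nested tuple whose tips are integer time-point indices, the "irreversible matrix" is taken to charge one unit per forward time step and to forbid backward steps, the null distribution is obtained by permuting tip labels on a fixed topology, and TC is taken to be (Smean - Sobs) / (Smean - Min). The main text of the paper should be consulted for the exact definitions.

```python
import math
import random

INF = math.inf


def transition_cost(parent, child):
    # ASSUMPTION: forward changes cost their length in time points;
    # backward changes are forbidden (irreversible).
    return child - parent if child >= parent else INF


def sankoff(tree, n_states):
    """Minimum cost of the subtree for each possible state at its root."""
    if isinstance(tree, int):  # a tip, labeled by its time-point index
        return [0 if s == tree else INF for s in range(n_states)]
    child_costs = [sankoff(child, n_states) for child in tree]
    return [
        sum(
            min(transition_cost(s, t) + costs[t] for t in range(n_states))
            for costs in child_costs
        )
        for s in range(n_states)
    ]


def tree_score(tree, n_states):
    """Parsimony tree score S under the assumed irreversible cost matrix."""
    return min(sankoff(tree, n_states))


def tips(tree):
    """Tip labels in left-to-right order."""
    return [tree] if isinstance(tree, int) else tips(tree[0]) + tips(tree[1])


def relabel(tree, labels):
    """Copy of the topology with tips replaced by successive labels."""
    if isinstance(tree, int):
        return next(labels)
    return (relabel(tree[0], labels), relabel(tree[1], labels))


def null_scores(tree, n_states, n_perm, rng):
    """Tree scores after randomly permuting the tip time labels."""
    labels = tips(tree)
    scores = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        scores.append(tree_score(relabel(tree, iter(shuffled)), n_states))
    return scores


def tc_statistic(s_obs, null, s_min):
    # ASSUMPTION: deviation from the null mean, scaled by interval C.
    s_mean = sum(null) / len(null)
    if s_mean == s_min:
        raise ValueError("null mean equals the minimum score; TC undefined")
    return (s_mean - s_obs) / (s_mean - s_min)


if __name__ == "__main__":
    clustered = ((0, 0), (1, 1))    # same-time tips grouped together, S = 1
    interleaved = ((0, 1), (0, 1))  # time points mixed across clades, S = 2
    rng = random.Random(1)
    null = null_scores(clustered, n_states=2, n_perm=200, rng=rng)
    # s_min=1 is the smallest score reachable for this toy example.
    print(tc_statistic(tree_score(clustered, 2), null, s_min=1))
```

Under these assumptions a tree scores lower when sequences from the same time point group together, so TC approaches 1 for strongly clustered trees and is near 0 when the observed score matches the null mean. The absolute minimum score is passed in by the caller here because computing it for arbitrary sampling schemes is outside the scope of this sketch.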
Fig. 2
The effect of varying tree shape, number of taxa and number of time points on the null distribution of the Temporal Clustering (TC) Statistic. (a-d) Four hypothetical topologies, with varying degrees of asymmetry, each containing 32 taxa sampled at 4 time points. Values in parentheses represent the range of tree asymmetry scores (the I statistic) for trees with N=4 to N=128 taxa. Branches are colored according to the indexed time of sampling, k=1 (earliest) to 4 (latest). (e) Plot showing the proportion of tree score parameter space occupied by the null distribution (see text) as a function of the number of taxa (with the number of time points held constant). The proportion was calculated for each of the four topologies (denoted by symbols). (f) Plot showing the proportion of parameter space occupied by the null distribution as a function of the number of timepoints (for trees containing 64 or 100 taxa).
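As a rough illustration of the tree asymmetry score mentioned in this caption, the sketch below computes a normalized imbalance index for a rooted binary tree. It assumes that "the I statistic" refers to Colless's imbalance index (the sum over internal nodes of the absolute difference in tip counts between the two child clades, divided by its maximum value (n-1)(n-2)/2); the nested-tuple tree encoding is hypothetical and chosen only to keep the example short.

```python
def colless_index(tree):
    """Normalized Colless imbalance of a rooted binary tree.

    The tree is a nested 2-tuple; anything that is not a tuple is a tip.
    Returns 0.0 for a perfectly balanced tree and 1.0 for a fully
    asymmetric (caterpillar) tree. Requires at least 3 tips.
    """

    def walk(node):
        # Returns (number of tips, summed imbalance) for the subtree.
        if not isinstance(node, tuple):
            return 1, 0
        n_left, imb_left = walk(node[0])
        n_right, imb_right = walk(node[1])
        return n_left + n_right, imb_left + imb_right + abs(n_left - n_right)

    n_tips, total = walk(tree)
    if n_tips < 3:
        raise ValueError("the index is undefined for fewer than 3 tips")
    return 2 * total / ((n_tips - 1) * (n_tips - 2))


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    caterpillar = ((("a", "b"), "c"), "d")
    print(colless_index(balanced))     # 0.0
    print(colless_index(caterpillar))  # 1.0
```

The normalization makes scores comparable across trees with different numbers of taxa, which is consistent with the caption reporting a range of asymmetry scores for N=4 to N=128.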
Fig. 3
Temporal Clustering (TC) values for the empirical datasets: (a) HIV-1 intra-patient datasets, (c) HCV subtype 1a and 1b datasets, (e) bison ancient mtDNA dataset. TC values were calculated as described in the main text, for each of 200 topologies randomly sampled from the posterior distribution of trees, given the data. The TC values are plotted on the y-axis, with the height of the bar indicating the average TC and the error bars representing ±2 SD among the 200 topologies. (b), (d) and (f) Representative rooted maximum clade credibility (MCC) trees for the HIV-1, HCV and mtDNA datasets, respectively (p5 for HIV-1; E1E2 subtype 1a for HCV). Branch lengths are in units of substitutions/site (see scale bar). Branches are colored using a rainbow spectrum according to the actual (or reconstructed) time point, where red = earliest time point and purple = latest time point.
Fig. 4
Temporal Clustering (TC) values for datasets simulated under a neutral coalescent process. Simulations were performed under four demographic models: constant population size (C), exponential growth (E), logistic growth (L) and sinusoidal change (S). In each case, trees were simulated under the sampling times of the corresponding empirical dataset: (a) 9 intra-host HIV-1 datasets, (c) HCV subtype 1a and 1b datasets, (e) bison ancient mtDNA dataset. For each dataset/model combination, 100 coalescent trees were simulated, and the TC value of each tree was calculated as described in the main text. The TC values are plotted on the y-axis, with the height of the bar indicating the average TC and the error bars representing ±2 SD among the 100 simulations. (b), (d) and (f) Representative rooted maximum clade credibility (MCC) trees for the simulated HIV-1, HCV and mtDNA datasets, respectively (p5 for HIV-1; E1E2 subtype 1a for HCV). Branch lengths are in units of years (see scale bar). Branches are colored using a rainbow spectrum according to the actual (or reconstructed) time point, where red = earliest time point and purple = latest time point.
