Review
2011 May 12;366(1569):1410-24.
doi: 10.1098/rstb.2010.0311.

Controlling for non-independence in comparative analysis of patterns across populations within species

Graham N Stone et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

How do we quantify patterns (such as responses to local selection) sampled across multiple populations within a single species? Key to this question is the extent to which populations within species represent statistically independent data points in our analysis. Comparative analyses across species and higher taxa have long recognized the need to control for the non-independence of species data that arises through patterns of shared common ancestry among them (phylogenetic non-independence), as have quantitative genetic studies of individuals linked by a pedigree. Analyses across populations lacking pedigree information fall between these two cases: they must deal not only with shared common ancestry, but also with the impact of exchange of migrants between populations (gene flow). As a result, phenotypes measured in one population are influenced by processes acting on others, and may not be a good guide to either the strength or direction of local selection. Although many studies examine patterns across populations within species, few consider such non-independence. Here, we discuss the sources of non-independence in comparative analysis, and show why the phylogeny-based approaches widely used in cross-species analyses are unlikely to be useful in analyses across populations within species. We outline the approaches (intraspecific contrasts, generalized least squares, generalized linear mixed models and autoregression) that have been used in this context, and explain their specific assumptions. We highlight the power of 'mixed models' in many contexts where problems of non-independence arise, and show that these allow incorporation of both shared common ancestry and gene flow. We suggest what can be done when ideal solutions are inaccessible, highlight the need for incorporation of a wider range of population models in intraspecific comparative methods, and call for simulation studies of the error rates associated with alternative approaches.
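To make the role of generalized least squares concrete, the following is a minimal sketch, not the authors' implementation. It assumes four populations with made-up trait means and a hypothetical covariance matrix `V` in which populations 1–2 and 3–4 are closely related pairs (correlation `rho = 0.9`, an arbitrary value). The function name `gls_fit` and all numbers are illustrative assumptions.

```python
import numpy as np

def gls_fit(x, y, V):
    """Return [intercept, slope] from generalized least squares.

    V is the assumed covariance matrix describing non-independence
    among populations (hypothetical here, not estimated from data).
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    # Solve (X' V^-1 X) beta = X' V^-1 y
    return np.linalg.solve(XtVi @ X, XtVi @ y)

# Hypothetical population means for a predictor x and a response y
x = np.array([4.0, 5.0, 1.0, 2.0])
y = np.array([5.0, 4.0, 2.0, 1.0])

# Case 1: populations treated as independent (reduces to ordinary least squares)
V_independent = np.eye(4)

# Case 2: two closely related pairs; rho is an assumed within-pair correlation
rho = 0.9
V_pairs = np.array([
    [1.0, rho, 0.0, 0.0],
    [rho, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, rho],
    [0.0, 0.0, rho, 1.0],
])

beta_independent = gls_fit(x, y, V_independent)
beta_pairs = gls_fit(x, y, V_pairs)
print("slope assuming independence:", beta_independent[1])
print("slope assuming related pairs:", beta_pairs[1])
```

With these made-up values the estimated slope changes sign once the assumed relatedness is included, which illustrates why the choice of covariance structure matters. In a real analysis `V` would have to reflect both shared ancestry and gene flow, which this sketch does not attempt.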

Figures

Figure 1.
Sources of non-independence in population data. (a) Diagrammatic representation of the history of population splits and gene flow linking populations within species. This history results in three genetic contributions to measured population phenotypes, shown diagrammatically for a set of four populations: (i) contributions owing to shared common ancestry (represented by the colours of internal branches in the population tree), (ii) evolution specific to each population owing to selection and drift (represented by colour changes along terminal branches), and (iii) impacts of gene flow (exchange of migrants or gametes) between populations (indicated by arrows, for simplicity shown only for population 1). (b) Gene flow brings into a recipient population a subset of the genetic variation in source populations. Three source populations (1–3) contribute migrants to a recipient population (4). Imagine recipient population 4 has a higher value for a trait (distribution x in the frequency distribution diagram at right) under selection/drift equilibrium than the source populations (which, for simplicity, all share distribution y). Migration into population 4 followed by interbreeding displaces the trait value distribution for this population downwards to a new equilibrium (distribution z). The impact of gene flow is greatest when, relative to a recipient population, source populations have very different equilibrium trait distributions and contribute large numbers of migrants. Under such circumstances, the phenotypes measured in any population may be a poor guide to the selective forces acting on it. Migration effects must be accounted for before local selective effects can be estimated. (c) Population models assumed by different analytical approaches discussed in the text. Assumption of population independence implies no impact of either gene flow or history. 
This occurs when there is no gene flow and populations are either entirely unrelated (i) or influenced only by population-specific processes (ii), as might happen when selection acting on populations is so rapid and strong that ancestral states can be ignored. Analyses that incorporate only population history (iii) assume no gene flow, while analyses that incorporate only gene flow (iv) assume no population similarity through common ancestry.
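The displacement described in panel (b) can be illustrated with a toy recursion. This is a sketch under strong assumptions that are not taken from the article: each generation, selection moves the recipient population's trait mean a fraction `s` of the way toward its local equilibrium `x_eq`, after which a fraction `m` of the population is replaced by migrants with mean `y_src`. All names and parameter values are hypothetical.

```python
def next_mean(z, x_eq, y_src, s, m):
    """One generation: selection toward x_eq, then a fraction m of migrants."""
    after_selection = z + s * (x_eq - z)
    return (1.0 - m) * after_selection + m * y_src

def equilibrium_mean(x_eq, y_src, s, m):
    """Fixed point of next_mean, obtained by solving z = next_mean(z)."""
    return ((1.0 - m) * s * x_eq + m * y_src) / (1.0 - (1.0 - m) * (1.0 - s))

# Hypothetical values: recipient equilibrium higher than the source mean
x_eq, y_src = 10.0, 6.0
s, m = 0.2, 0.1

z = x_eq  # start at the equilibrium expected without gene flow
for _ in range(200):
    z = next_mean(z, x_eq, y_src, s, m)

z_star = equilibrium_mean(x_eq, y_src, s, m)
print("iterated mean:", z, "closed-form equilibrium:", z_star)
```

In this toy model the new equilibrium lies between the source mean and the local equilibrium, and it moves further from the local equilibrium as `m` grows or as the gap between `x_eq` and `y_src` widens, in line with the caption's qualitative statement.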
Figure 2.
Consequences of phylogenetic non-independence for inferring relationships between variables across populations. Consider four populations, with mean values for two variables (independent variable x and dependent response variable y) as shown at top right. Setting gene flow aside for the moment, if these populations are equally unrelated phylogenetically (a), data for them can be considered independent, and the relationship across all four populations is a positive correlation (b). However, imagine that populations 1–2 and 3–4 comprise two pairs of closely related populations (c). The high trait values shared by both 1 and 2 (and the low values shared by both 3 and 4) are unlikely to be independent, and instead reflect low divergence within each pair from a common ancestor with high and low trait values, respectively. Now the relationship between x and y is negative within each population pair (black lines in (d)), but positive when analysed across the ancestors of each population pair (red line). Each of these three relationships is phylogenetically independent. A different pattern of relationships among the same set of populations can generate diametrically opposing relationships between x and y, as shown in (e). Now the relationship within each population pair is positive (black fitted lines in (f), right), while the relationship across the ancestors of the two population pairs is negative. These issues pertain whether the populations are sampled in the wild or grown in a common garden or provenance trial.
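The scenario in panels (a) to (d) can be reproduced with a small numerical sketch. The trait values below are invented for illustration, since the figure's actual values are not given in the text, and the mean of each pair is used as a crude stand-in for the ancestral value, which is an assumption of this sketch and not a method described in the caption.

```python
import numpy as np

# Hypothetical population means (populations 1-4); pairs are (1, 2) and (3, 4)
x = np.array([4.0, 5.0, 1.0, 2.0])
y = np.array([5.0, 4.0, 2.0, 1.0])
pairs = [(0, 1), (2, 3)]

def slope(x0, y0, x1, y1):
    """Slope of the line through two points."""
    return (y1 - y0) / (x1 - x0)

# Treating all four populations as independent data points
r_all = np.corrcoef(x, y)[0, 1]

# Relationship within each closely related pair
within_slopes = [slope(x[i], y[i], x[j], y[j]) for i, j in pairs]

# Relationship across pair "ancestors", approximated by pair means
anc_x = [x[[i, j]].mean() for i, j in pairs]
anc_y = [y[[i, j]].mean() for i, j in pairs]
between_slope = slope(anc_x[0], anc_y[0], anc_x[1], anc_y[1])

print("correlation across all four populations:", r_all)
print("slopes within pairs:", within_slopes)
print("slope across pair ancestors:", between_slope)
```

With these values the pooled correlation is positive, both within-pair slopes are negative, and the slope across the two approximate ancestors is positive, matching the pattern the caption describes for panel (d).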
