Am J Hum Genet. 2004 Nov;75(5):771-89.
doi: 10.1086/425281. Epub 2004 Sep 22.

Statistical tests for admixture mapping with case-control and cases-only data

Giovanni Montana et al. Am J Hum Genet. 2004 Nov.

Abstract

Admixture mapping is a promising new tool for discovering genes that contribute to complex traits. This mapping approach uses samples from recently admixed populations to detect susceptibility loci at which the risk alleles have different frequencies in the original contributing populations. Although the idea for admixture mapping has been around for more than a decade, the genomic tools are only now becoming available to make this a feasible and attractive option for complex-trait mapping. In this article, we describe new statistical methods for analyzing multipoint data from admixture-mapping studies to detect "ancestry association." The new test statistics do not assume a particular disease model; instead, they are based simply on the extent to which the sample's ancestry proportions at a locus deviate from the genome average. Our power calculations show that, for loci at which the underlying risk-allele frequencies are substantially different in the ancestral populations, the power of admixture mapping can be comparable to that of association mapping but with a far smaller number of markers. We also show that, although "ancestry informative markers" (AIMs) are superior to random single-nucleotide polymorphisms (SNPs), random SNPs can perform quite well when AIMs are not available. Hence, researchers who study admixed populations in which AIMs are not available can perform admixture mapping with the use of modestly higher densities of random markers. Software to perform the gene-mapping calculations, "MALDsoft," is freely available on the Pritchard Lab Web site.
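The idea of testing how far locus-specific ancestry deviates from the genome average can be illustrated with a rough sketch. The code below computes a simple per-locus z-score from cases-only ancestry estimates. It is not the article's test statistic. The function name `ancestry_z_scores`, the input layout, and the standardization by the across-individual sample standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def ancestry_z_scores(ancestry):
    """Return one z-score per locus for a cases-only sample.

    ancestry[i][locus] is the estimated fraction of individual i's two
    chromosomes at that locus derived from population 1 (0, 0.5, or 1,
    or a posterior mean in between). Hypothetical input layout.
    """
    n_ind = len(ancestry)
    n_loci = len(ancestry[0])
    # Genome-average ancestry of each individual (mean over loci).
    genome_avg = [statistics.fmean(row) for row in ancestry]
    scores = []
    for locus in range(n_loci):
        # Deviation of local ancestry from each individual's genome average.
        dev = [ancestry[i][locus] - genome_avg[i] for i in range(n_ind)]
        mean_dev = statistics.fmean(dev)
        sd = statistics.stdev(dev)
        # Standardize the mean deviation; return 0.0 when there is no variance.
        scores.append(mean_dev / (sd / math.sqrt(n_ind)) if sd > 0 else 0.0)
    return scores
```

A locus where cases carry excess ancestry from one population yields a large positive score under this sketch, while loci that track the genome average score near zero.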

Figures

Figure 1
Schematic figure showing the mosaic structure of chromosomes in an admixed population. The shaded and unshaded boxes indicate chromosomal segments derived from different ancestral populations. If a susceptibility allele is at a higher frequency in the shaded population, then affected individuals will have increased ancestry from the shaded population at the locus of that gene (vertical line). Our method aims to detect this type of signal.
Figure 2
Conditional independence structure of the data along a single chromosome. The chromosome is composed of a series of segments, each derived from one of the contributing populations. The zs indicate the population of origin of each marker along the chromosome; the sequence of zs forms a Markov chain with jump rate r. The genotype data (the Xs) are generated by drawing an allele at random from the appropriate population frequencies, given the zs. The genetic map distance between markers 1 and 2 is denoted d1. The model for diploid unphased data is analogous.
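A minimal simulation sketch of the structure described in this legend, for a single haploid chromosome with two populations, is given below. Several details are assumptions rather than statements from the legend: the probability of at least one jump between adjacent markers is taken to be 1 - exp(-r * d), the origin after a jump is redrawn with probability `q` for population 1, and distances are in Morgans. The function name and argument layout are hypothetical.

```python
import math
import random


def simulate_haploid_chromosome(freqs, distances, r, q, rng):
    """Simulate ancestry states z and alleles x along one chromosome.

    freqs[k][m]: frequency of allele 1 at marker m in population k (k = 0, 1).
    distances[m]: genetic distance (Morgans) between markers m and m + 1.
    r: jump rate of the ancestry chain per Morgan (assumed units).
    q: probability that a segment derives from population 1 (assumed).
    rng: a random.Random instance, passed in for reproducibility.
    """
    n_markers = len(freqs[0])
    z = [1 if rng.random() < q else 0]
    for m in range(n_markers - 1):
        # Assumed: at least one jump occurs with probability 1 - exp(-r * d).
        if rng.random() < 1.0 - math.exp(-r * distances[m]):
            z.append(1 if rng.random() < q else 0)  # redraw population of origin
        else:
            z.append(z[-1])  # the current segment continues
    # Draw each allele from the frequencies of its population of origin.
    x = [1 if rng.random() < freqs[z[m]][m] else 0 for m in range(n_markers)]
    return z, x
```

With fixed allele differences between the populations (frequencies 0 and 1), the simulated alleles reveal the ancestry states exactly; with less differentiated markers, the states must be inferred probabilistically from the genotypes.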
Figure 3
Reconstruction of locus-specific ancestry for a single individual, using AIMs. The top plot shows the “true” simulated ancestry of a single individual (i.e., whether the individual has 0, 1, or 2 chromosomes inherited from population 1, as a function of position along a chromosome). The lower plots show the posterior mean estimates for this individual on the basis of marker data at different densities, as well as with and without known haplotype phase. These data were simulated under the assumption of an admixture time of 10 generations before the present.
Figure 4
Reconstruction of locus-specific ancestry for a single individual, using random SNPs with average FST=0.1 between the two ancestral populations. See the legend to figure 3 and the “Simulation Details” section for more information.
Figure 5
Accuracy of locus-specific ancestry estimation as a function of marker density. The X-axis shows the number of SNPs per cM, and the Y-axis shows the mean squared error (MSE) of the locus-specific ancestry estimate. The three lines correspond to an average FST between the ancestral populations of 0.1 (top line) and 0.2 (middle line) and to AIMs with δ=0.5 (bottom line). The data were treated as unphased. The values at zero density show the MSE when q(i) is known but there is no additional information about Z at the locus of interest. These data were simulated under the assumption of an admixture time of 10 generations before the present, with mean q(i)=0.2 (see the “Simulation Details” section for specifics).
Figure 6
Plots of average ancestry in a sample, as a function of chromosomal location. The gray lines plot the true values, and the black lines plot the estimated averages for cases (top), controls (middle), and the difference in the averages (bottom). The vertical dashed lines indicate the location of a simulated disease gene. Parameters: 800 cases, 800 controls, 200 learning samples, and 500 AIMs at a spacing of 2 cM.
Figure 7
Plots of the test-statistic values, as a function of chromosomal location. The gray line plots T1 (cases only), and the black line plots T2 (cases vs. controls). The vertical dashed line indicates the location of a disease gene. As is typical, the signal in this example is larger when the cases-only test is used. The genotype data are the same as those used in figure 6.
Figure 8
Simulated distributions of the test statistics under the null and alternative hypotheses for the cases-only (black line) and case-control (gray line) strategies. The dotted lines show the theoretical normal density. Parameters: 100 AIMs at a spacing of 2 cM, 350 cases, and 350 controls. See the “Simulation Details” section for further details on the simulations.
Figure 9
Distribution over replicate simulations of the most extreme value of the test statistic T1 in a 400-cM region with no disease loci. This type of simulated distribution can be used to quantify empirical genomewide significance for the most extreme signals observed in a data set.
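One common way to turn such a simulated distribution into an empirical genomewide p-value is sketched below. This is an illustrative assumption about how the quantification might be done, not a description of the article's software: `null_maxima` is taken to hold the most extreme statistic from each null replicate, and the add-one correction is a conventional choice.

```python
def empirical_genomewide_p(observed_max, null_maxima):
    """Estimate how often a null scan yields a maximum at least this extreme.

    observed_max: most extreme statistic seen in the real scan.
    null_maxima: most extreme statistic from each null replicate
                 (hypothetical input, e.g. one value per simulated scan).
    """
    exceed = sum(1 for m in null_maxima if m >= observed_max)
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (len(null_maxima) + 1)
```

With this correction, the smallest reportable p-value is 1 / (number of replicates + 1), so the number of replicates limits how small a genomewide p-value can be claimed.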
Figure 10
Mapping results for a simulated genome scan of 500 cases and 500 controls, with four true disease loci. The upper and lower plots show results for the cases-only and case-control tests, respectively. The four large upward peaks on each plot correspond to the four simulated disease loci; for most of the remainder of the genome, the test statistics lie within the dotted lines at ±1.96, corresponding to the central 95% of the null distribution.
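The ±1.96 band can be checked against the standard normal quantile function, and loci falling outside it can be flagged, as in the short sketch below. The helper names `null_band` and `loci_outside_band` are hypothetical, and the sketch assumes the test statistics are approximately standard normal under the null.

```python
from statistics import NormalDist


def null_band(coverage=0.95):
    """Return (lower, upper) bounds of the central region of N(0, 1)."""
    upper = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return -upper, upper


def loci_outside_band(stats, coverage=0.95):
    """Return indices of loci whose statistic lies outside the null band."""
    lower, upper = null_band(coverage)
    return [i for i, t in enumerate(stats) if t < lower or t > upper]
```

Because the band is pointwise, about 5% of null loci are expected to fall outside it by chance, which is why genomewide significance needs a stricter threshold than ±1.96.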

References

Electronic-Database Information

    1. Pritchard Lab Web site, http://pritch.bsd.uchicago.edu