Biometrics. 2004 Sep;60(3):589-97.
doi: 10.1111/j.0006-341X.2004.00207.x.

Two-stage designs for gene-disease association studies with sample size constraints


Jaya M Satagopan et al. Biometrics. 2004 Sep.

Abstract

Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.

To explore the association between a gene and a disease, case-control association studies are often used to identify polymorphisms (markers) that confer an increased risk of developing the disease. If a large number of markers are studied, genotyping all markers in every individual in the sample does not make optimal use of resources. Here, we propose an alternative two-stage approach to identify markers associated with a disease. In the first stage, all markers are genotyped in a fraction of the total sample. The most promising markers are then tested on the remaining individuals in the second stage. This approach can be cost-effective because markers that are unlikely to be associated are eliminated in the first stage. Using simulations, we show that, whether the markers are independent or correlated, the two-stage approach leads to a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of our approach is evaluated when a single marker is associated with the disease and when several markers are associated with the disease. In general, our simulations indicate that genotyping all markers in 50% of the individuals in the first stage, and then genotyping the most promising 10% of the markers, is nearly optimal in terms of power while reducing the total number of marker genotypings by 45%.
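The 45% figure quoted in the abstract can be checked with simple counting. The sketch below assumes that a "marker evaluation" means genotyping one marker on one subject, so a one-stage design costs m × n evaluations; the function and parameter names are illustrative and do not come from the paper.

```python
def marker_evaluations(m, n, stage1_fraction, carried_fraction):
    """Count marker evaluations for a one-stage and a two-stage design.

    m                -- number of markers
    n                -- total number of subjects
    stage1_fraction  -- share of subjects genotyped on all markers in Stage 1
    carried_fraction -- share of markers carried over to Stage 2
    (All names are hypothetical, chosen for this illustration.)
    """
    n1 = round(stage1_fraction * n)      # subjects used in Stage 1
    n2 = n - n1                          # remaining subjects, used in Stage 2
    m2 = round(carried_fraction * m)     # markers carried to Stage 2
    one_stage = m * n                    # every marker on every subject
    two_stage = m * n1 + m2 * n2         # all markers on n1, then m2 markers on n2
    return one_stage, two_stage


one_stage, two_stage = marker_evaluations(m=3000, n=1000,
                                          stage1_fraction=0.5,
                                          carried_fraction=0.1)
reduction = 1 - two_stage / one_stage
print(one_stage, two_stage, f"{reduction:.0%}")
```

With 50% of subjects in Stage 1 and 10% of markers carried over, the two-stage count is 0.5 + 0.1 × 0.5 = 0.55 of the one-stage count, which matches the 45% decrease stated in the abstract. The values m = 3000 and n = 1000 are taken from the figure captions; the ratio does not depend on them.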


Figures

Figure 1.
Power of a two-stage design for m independent markers as a function of the proportion of markers (i) carried over to Stage 2, for various values of T2/T1. There is a single disease locus (D = 1). The signal μ is chosen such that the one-stage design has 80% power to identify the disease locus. The sample size is n = 1000. Dashed lines correspond to m = 100 markers, and solid lines correspond to m = 3000 markers. The pairs of curves (for m = 100 and 3000), from bottom to top, correspond to the following values of T2/T1: 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, and 0.60, respectively. The cost fraction corresponding to each pair of curves is indicated in the figure. For example, the top pair of curves gives the power of a two-stage design for m = 100 (dashed line) and m = 3000 (solid line) when T2/T1 = 0.60, for various values of i (shown on the horizontal axis).
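Each curve in Figure 1 fixes T2/T1 and varies i, which implicitly fixes the share of subjects used in Stage 1. The sketch below assumes that T2/T1 is the ratio of two-stage to one-stage marker evaluations, so that T2/T1 = p + i(1 − p), where p is the Stage 1 sample fraction. The symbol p and the function name are introduced here for illustration and are not taken from the paper.

```python
def stage1_sample_fraction(cost_fraction, carried_fraction):
    """Solve cost_fraction = p + carried_fraction * (1 - p) for p.

    cost_fraction    -- assumed meaning of T2/T1 (two-stage / one-stage evaluations)
    carried_fraction -- proportion i of markers carried over to Stage 2
    Returns p, the share of subjects genotyped on all markers in Stage 1.
    """
    if not 0 <= carried_fraction < 1:
        raise ValueError("carried_fraction must lie in [0, 1)")
    if not carried_fraction <= cost_fraction <= 1:
        # The cost can never fall below i, since the carried markers
        # end up genotyped on every subject.
        raise ValueError("cost_fraction must lie in [carried_fraction, 1]")
    return (cost_fraction - carried_fraction) / (1 - carried_fraction)


# Under this assumption, T2/T1 = 0.55 with i = 0.10 implies half the sample in Stage 1.
print(round(stage1_sample_fraction(0.55, 0.10), 4))
```

Under this reading, moving right along a curve (larger i at fixed T2/T1) means fewer subjects in Stage 1, which is the trade-off the figure explores.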
Figure 2.
The maximum power of a two-stage design as a function of the proportion of marker evaluations (T2/T1) for m = 3000 (solid), 1000 (dotted), 500 (dashed), 200 (dash-dot), and 100 (long-dashed) independent markers, for sample size n = 1000. Power is shown for a single disease locus (D = 1). The signal μ is such that the corresponding one-stage designs have 80% power to identify the disease locus.
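A rough Monte Carlo sketch of how power curves like these might be estimated is given below. It is a simplified model, not the simulation procedure of the paper: it assumes independent markers with standard normal test statistics, a single disease marker whose full-sample statistic has mean μ, selection of the top-ranked markers in Stage 1, and "power" defined as the probability that the disease marker has the largest combined statistic. All function and parameter names are hypothetical.

```python
import math
import random


def two_stage_power(m, mu, p, i, reps=2000, seed=0):
    """Estimate the chance that the disease marker ranks first after Stage 2.

    m   -- number of independent markers (marker 0 is the disease marker)
    mu  -- mean of the disease marker's statistic on the full sample
    p   -- share of subjects used in Stage 1, in (0, 1]
    i   -- proportion of markers carried over to Stage 2
    """
    rng = random.Random(seed)
    keep = max(1, min(m, round(i * m)))        # markers carried to Stage 2
    w1, w2 = math.sqrt(p), math.sqrt(1.0 - p)  # weights by share of the sample
    hits = 0
    for _ in range(reps):
        # Stage 1: statistic on a fraction p of subjects has mean mu * sqrt(p).
        z1 = [rng.gauss(mu * w1 if j == 0 else 0.0, 1.0) for j in range(m)]
        top = sorted(range(m), key=lambda j: z1[j], reverse=True)[:keep]
        if 0 not in top:
            continue                           # disease marker was screened out
        # Stage 2: independent statistic on the remaining subjects,
        # combined with Stage 1 so the disease marker has mean mu overall.
        best_marker, best_z = None, -math.inf
        for j in top:
            z2 = rng.gauss(mu * w2 if j == 0 else 0.0, 1.0)
            z = w1 * z1[j] + w2 * z2
            if z > best_z:
                best_marker, best_z = j, z
        hits += best_marker == 0
    return hits / reps


# Illustrative call: 100 markers, half the sample in Stage 1, top 10% carried over.
print(two_stage_power(m=100, mu=3.5, p=0.5, i=0.1))
```

The value μ = 3.5 is arbitrary and is not calibrated to the 80% one-stage power used in the figures. Setting p = 1 makes Stage 2 uninformative, so the call reduces to a one-stage ranking of the markers that pass the cut.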

