Biometrics. 2004 Sep;60(3):589-97.

doi: 10.1111/j.0006-341X.2004.00207.x.

Two-stage designs for gene-disease association studies with sample size constraints

Jaya M Satagopan¹, E S Venkatraman, Colin B Begg

Affiliation

¹ Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA. satagopj@mskcc.org

PMID: 15339280
PMCID: PMC8985053
DOI: 10.1111/j.0006-341X.2004.00207.x

Abstract

Gene-disease association studies based on case-control designs are often used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples uses resources inefficiently. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In Stage 1, all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost-effective because markers unlikely to be associated with the disease are eliminated in the first stage. Using simulations, we show that, whether the markers are independent or correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated both when a single marker is associated with the disease and in the presence of multiple disease-susceptibility markers. As a general guideline, simulations over a wide range of parameter configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
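The 45% figure in the guideline can be checked with simple arithmetic. The sketch below assumes cost is counted as the number of (marker, individual) evaluations, so that a one-stage design costs m × n; the function name and parameter names are hypothetical and not taken from the paper.

```python
def evaluation_fraction(stage1_sample_frac: float, stage2_marker_frac: float) -> float:
    """Marker evaluations of a two-stage design relative to a one-stage design.

    Assumed cost model: a one-stage design costs m * n evaluations.
    A two-stage design costs m * (p * n) in Stage 1 plus
    (q * m) * ((1 - p) * n) in Stage 2, so the ratio is p + q * (1 - p).
    """
    p, q = stage1_sample_frac, stage2_marker_frac
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("both fractions must lie in [0, 1]")
    return p + q * (1.0 - p)


# Guideline from the abstract: all markers on 50% of individuals,
# then the top 10% of markers on the remaining 50%.
frac = evaluation_fraction(0.5, 0.1)
print(f"fraction of one-stage evaluations: {frac:.2f}")
print(f"reduction in evaluations: {1.0 - frac:.0%}")
```

Under this cost model the guideline uses 0.5 + 0.1 × 0.5 = 0.55 of the one-stage evaluations, which matches the 45% decrease reported in the abstract.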


Figures

**Figure 1.**
Power of a two-stage design for m independent markers as a function of the proportion of markers (i) carried over to Stage 2 for various values of T₂/T₁. There is a single disease locus (D = 1). The signal μ is chosen such that the one-stage design has 80% power to identify the disease locus. The sample size is n = 1000. Dashed lines correspond to m = 100 markers, and solid lines correspond to m = 3000 markers. The pairs of curves (for m = 100 and 3000), from bottom to top, correspond to the following values of T₂/T₁: 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, and 0.60. The cost fraction corresponding to each pair of curves is indicated in the figure. For example, the top pair of curves gives the power of a two-stage design for m = 100 (dashed line) and m = 3000 (solid line) when T₂/T₁ = 0.60 for various values of i (shown on the horizontal axis).

**Figure 2.**
The maximum power of a two-stage design as a function of the proportion of marker evaluations (T₂/T₁) for m = 3000 (solid), 1000 (dot), 500 (dash), 200 (dash-dot), and 100 (long dash) independent markers, for sample size n = 1000. Power is shown for a single disease locus (D = 1). The signal μ is such that the corresponding one-stage designs have 80% power to identify the disease locus.
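The kind of comparison plotted in these figures can be sketched with a small Monte Carlo simulation. This is an illustrative sketch under stated assumptions, not the authors' simulation code: each marker is assumed to yield a standard normal test statistic, the single disease marker (D = 1) has mean μ on the full sample, and "power" is taken to be the probability that the disease marker has the largest statistic. The function `simulate_power` and its parameters are hypothetical, and μ is left as an input rather than calibrated to 80% one-stage power.

```python
import numpy as np


def simulate_power(m=100, mu=3.5, stage1_frac=0.5, carry_frac=0.1,
                   n_rep=2000, seed=0):
    """Estimate (one_stage_power, two_stage_power) for m independent markers.

    Assumed model: marker 0 is the disease marker. A statistic computed on a
    fraction f of the sample has mean sqrt(f) * mu and unit variance.
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(round(carry_frac * m)))  # markers carried to Stage 2
    w1 = np.sqrt(stage1_frac)
    w2 = np.sqrt(1.0 - stage1_frac)

    signal = np.zeros(m)
    signal[0] = mu

    # Independent statistics from the Stage 1 and Stage 2 subsamples.
    z1 = rng.standard_normal((n_rep, m)) + w1 * signal
    z2 = rng.standard_normal((n_rep, m)) + w2 * signal
    # Combined full-sample statistic: mean mu for marker 0, variance 1.
    z_all = w1 * z1 + w2 * z2

    # One-stage design: every marker is ranked on the full sample.
    one_stage = float(np.mean(np.argmax(z_all, axis=1) == 0))

    # Two-stage design: carry the k markers with the largest Stage 1
    # statistics, then rank only those on the combined statistic.
    carried = np.argpartition(-z1, k - 1, axis=1)[:, :k]
    mask = np.zeros((n_rep, m), dtype=bool)
    np.put_along_axis(mask, carried, True, axis=1)
    z_carried = np.where(mask, z_all, -np.inf)
    two_stage = float(np.mean(np.argmax(z_carried, axis=1) == 0))

    return one_stage, two_stage


one, two = simulate_power()
print(f"one-stage power: {one:.3f}, two-stage power: {two:.3f}")
```

Varying `carry_frac` and `stage1_frac` while holding the cost fraction fixed gives a rough analogue of one curve in Figure 1, subject to the simplified normal model assumed here.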


