Discovery properties of genome-wide association signals from cumulatively combined data sets

Tiago V Pereira¹, Nikolaos A Patsopoulos, Georgia Salanti, John P A Ioannidis

PMID: 19808636
PMCID: PMC2800267
DOI: 10.1093/aje/kwp262

Meta-Analysis

Tiago V Pereira et al. Am J Epidemiol. 2009 Nov 15;170(10):1197-206. doi: 10.1093/aje/kwp262. Epub 2009 Oct 6.

Affiliation

¹ Laboratory of Genetics and Molecular Cardiology, Heart Institute (InCor), University of São Paulo Medical School, São Paulo, Brazil.

Abstract

Genetic effects for common variants affecting complex disease risk are subtle. Single genome-wide association (GWA) studies are typically underpowered to detect these effects, and combination of several GWA data sets is needed to enhance discovery. The authors investigated the properties of the discovery process in simulated cumulative meta-analyses of GWA study-derived signals allowing for potential genetic model misspecification and between-study heterogeneity. Variants with null effects on average (but also between-data set heterogeneity) could yield false-positive associations with seemingly homogeneous effects. Random effects had higher than appropriate false-positive rates when there were few data sets. The log-additive model had the lowest false-positive rate. Under heterogeneity, random-effects meta-analyses of 2-10 data sets averaging 1,000 cases/1,000 controls each did not increase power, or the meta-analysis was even less powerful than a single study (power desert). Upward bias in effect estimates and underestimation of between-study heterogeneity were common. Fixed-effects calculations avoided power deserts and maximized discovery of association signals at the expense of much higher false-positive rates. Therefore, random- and fixed-effects models are preferable for different purposes (fixed effects for initial screenings, random effects for generalizability applications). These results may have broader implications for the design and interpretation of large-scale multiteam collaborative studies discovering common gene variants.
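The contrast between fixed- and random-effects pooling described in the abstract can be illustrated with a minimal sketch. The abstract does not say which between-study variance estimator the authors used, so the DerSimonian-Laird estimator below is an assumption, as are all function names and the example numbers. Inputs are per-data-set log odds ratios and their variances.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance fixed-effect pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def dersimonian_laird_tau2(estimates, variances):
    """Moment estimate of between-study variance (assumed estimator)."""
    k = len(estimates)
    if k < 2:
        return 0.0  # heterogeneity cannot be estimated from one data set
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    c = total - sum(w * w for w in weights) / total
    return max(0.0, (q - (k - 1)) / c)


def random_effects(estimates, variances):
    """Random-effects pooled estimate, its variance, and the tau^2 used."""
    tau2 = dersimonian_laird_tau2(estimates, variances)
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total, tau2


def two_sided_p(estimate, variance):
    """Two-sided P value from a normal approximation."""
    z = abs(estimate) / math.sqrt(variance)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical log odds ratios and variances from three data sets
log_ors = [0.10, 0.45, 0.26]
variances = [0.01, 0.01, 0.01]

fe_est, fe_var = fixed_effect(log_ors, variances)
re_est, re_var, tau2 = random_effects(log_ors, variances)
print("fixed effects P:", two_sided_p(fe_est, fe_var))
print("random effects P:", two_sided_p(re_est, re_var))
```

With heterogeneous inputs, the random-effects variance includes the estimated τ² and is therefore larger than the fixed-effects variance, which gives a larger P value for the same pooled estimate. That is consistent with the trade-off the abstract reports between discovery power and false-positive rate.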

Figures

**Figure 1.**
Cumulative power comparison among 3 main true genetic models by random-effects calculations (dominant in A, B, and C; log additive in D, E, and F; and recessive in G, H, and I) in meta-analyses of up to 30 data sets (with an average of 2,000 participants, a range of 1,000–3,000 in each, and a case-control ratio of 1.0), combining data from common gene variants (minor allele frequency, f = 0.4) with modest effect sizes (odds ratio = 1.3). In each panel, power is given under the correct and under misspecified genetic models. Models of analysis: dominant = squares; log additive (per allele risk) = triangles; and recessive = circles. Power is calculated by the proportion of simulated meta-analyses that exceed the threshold of P < 10⁻⁷. The region of *power desert* is illustrated by open symbols in panels C, E, F, and I. τ², between-study variance.

**Figure 2.**
Median bias in summary effect sizes (logarithm of the odds ratio) for different true underlying modes of inheritance (dominant in A, B, and C; log additive in D, E, and F; and recessive in G, H, and I) and misspecified genetic models under random-effects calculations. Models of analysis: dominant = squares; log additive (per allele risk) = triangles; and recessive = circles. Bias was calculated as the median ratio of the detected effect size and the true effect size computed from the set of meta-analyses showing statistically significant signals at P < 10⁻⁷. τ², between-study variance.

**Figure 3.**
Cumulative power comparison among 3 main true genetic models by fixed-effects calculations (dominant in A, B, and C; log additive in D, E, and F; and recessive in G, H, and I) in meta-analyses of up to 30 data sets (with an average of 2,000 participants, a range of 1,000–3,000 in each, and a case-control ratio of 1.0), combining data from common gene variants (minor allele frequency, f = 0.4) with modest effect sizes (odds ratio = 1.3). In each panel, power is given under the correct and under misspecified genetic models. Models of analysis: dominant = squares; log additive (per allele risk) = triangles; and recessive = circles. Power is calculated by the proportion of simulated meta-analyses that exceed the threshold of P < 10⁻⁷. τ², between-study variance.

**Figure 4.**
Bias in between-study variance (τ²) estimates for the set of meta-analyses that showed statistically significant results (P < 10⁻⁷) under different true underlying genetic models (dominant in A; log additive in B; and recessive in C) with heterogeneity (τ² = 0.05). Models of analysis: dominant = squares; log additive (per allele risk) = triangles; and recessive = circles.
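The summary measures named in the captions above (power as the proportion of simulated meta-analyses with P < 10⁻⁷, and bias as the median ratio of detected to true effect size among significant meta-analyses) can be sketched as follows. The function names and the example values are hypothetical and are not taken from the paper.

```python
from statistics import median

THRESHOLD = 1e-7  # significance threshold stated in the figure captions


def empirical_power(p_values, threshold=THRESHOLD):
    """Proportion of simulated meta-analyses with P below the threshold."""
    if not p_values:
        raise ValueError("need at least one simulated meta-analysis")
    return sum(p < threshold for p in p_values) / len(p_values)


def median_bias_ratio(estimates, p_values, true_effect, threshold=THRESHOLD):
    """Median of detected/true log odds ratio among significant results.

    Returns None when no simulated meta-analysis reaches significance.
    """
    significant = [est for est, p in zip(estimates, p_values) if p < threshold]
    if not significant:
        return None
    return median(est / true_effect for est in significant)


# Hypothetical results from four simulated meta-analyses
p_values = [1e-9, 1e-3, 5e-8, 0.2]
estimates = [0.30, 0.20, 0.36, 0.10]  # summary log odds ratios
print(empirical_power(p_values))
print(median_bias_ratio(estimates, p_values, true_effect=0.25))
```

A ratio above 1 indicates upward bias: conditioning on significance selects the simulated meta-analyses whose estimates happened to be large.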

Grants and funding

UL1 RR025752/RR/NCRR NIH HHS/United States
