Genet Epidemiol. 2011 Apr;35(3):159-73.
doi: 10.1002/gepi.20564. Epub 2011 Jan 31.

Phenotype harmonization and cross-study collaboration in GWAS consortia: the GENEVA experience

Siiri N Bennett et al. Genet Epidemiol. 2011 Apr.

Abstract

Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (G*E) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.
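The harmonization steps named in the abstract (identify a common phenotype, check feasibility per study, prepare a common definition, apply a study-specific algorithm) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two studies, their field names (`smk_code`, `ever_smoked`, `smokes_now`), the codings, and the choice of smoking status as the phenotype are all hypothetical and do not come from GENEVA.

```python
# Hypothetical sketch of phenotype harmonization across two studies.
# Study names, field names, and codings are invented for illustration.

# Step 1-3: the common definition agreed on by the collaborating studies.
COMMON_LEVELS = ("never", "former", "current")


def harmonize_study_a(record):
    """Study A (hypothetical) stores one coded field: 0, 1, or 2."""
    mapping = {0: "never", 1: "former", 2: "current"}
    # Returns None when the code is missing or not in the mapping.
    return mapping.get(record.get("smk_code"))


def harmonize_study_b(record):
    """Study B (hypothetical) stores two yes/no questionnaire items."""
    ever = record.get("ever_smoked")
    now = record.get("smokes_now")
    if ever is None:
        return None
    if not ever:
        return "never"
    if now is None:
        # Ever-smoker with unknown current status cannot be classified.
        return None
    return "current" if now else "former"


# Step 4: one algorithm per study, all targeting the same definition.
HARMONIZERS = {"A": harmonize_study_a, "B": harmonize_study_b}


def harmonize(study, record):
    """Map one study-specific record onto the common definition."""
    value = HARMONIZERS[study](record)
    assert value is None or value in COMMON_LEVELS
    return value


def feasibility(study, records):
    """Fraction of a study's records that map to the common definition."""
    if not records:
        return 0.0
    mapped = sum(harmonize(study, r) is not None for r in records)
    return mapped / len(records)
```

A low `feasibility` value for one study would be a signal, in this toy setup, that the common definition is too detailed for that study's instrument and may need to be coarsened before cross-study analysis.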


Figures

Figure 1. Phenotype harmonization roles and responsibilities in GENEVA

Figure 2. Power of a case-control study to detect a gene-environment interaction (departure from a multiplicative odds model) when the binary exposure is measured perfectly or via a good proxy with 77% specificity and 99% sensitivity (roughly analogous to self-reported versus measured overweight status). This figure illustrates several points: a) large sample sizes are needed to detect gene-environment interactions; b) even modest misclassification can greatly decrease the power of tests for gene-environment interaction (and the relative decrease is greater for rare exposures); yet c) a large study using the proxy can have greater power than a smaller study using the perfect measure. This last point is important when the perfect measure is prohibitively expensive or only available on a small fraction of samples, while the good measure is relatively inexpensive or already available on many samples. Power calculations were performed using the methods described in Lindstrom et al. (2009), assuming a rare disease (prevalence 1 in 1,000), no main effect for the binary genetic factor (with 20% prevalence), an odds ratio of 1.5 for the exposure, an interaction odds ratio of 1.35, and a Type I error rate of 5×10⁻⁸.
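One way to see why misclassification hurts more for rare exposures is to look at how much of the proxy-positive group is truly exposed. The sketch below is not the power calculation of Lindstrom et al. (2009); it only applies the standard formulas for observed prevalence and positive predictive value to the sensitivity and specificity quoted in the caption. The true exposure prevalences (10% and 50%) are assumptions chosen for illustration.

```python
# Illustrative sketch only: how a proxy with fixed sensitivity and specificity
# behaves at different true exposure prevalences. This is NOT a power
# calculation and does not reproduce the figure.

SENSITIVITY = 0.99  # value quoted in the Figure 2 caption
SPECIFICITY = 0.77  # value quoted in the Figure 2 caption


def observed_prevalence(true_prev, sens=SENSITIVITY, spec=SPECIFICITY):
    """Fraction classified as exposed: true positives plus false positives."""
    return sens * true_prev + (1.0 - spec) * (1.0 - true_prev)


def positive_predictive_value(true_prev, sens=SENSITIVITY, spec=SPECIFICITY):
    """Fraction of proxy-positive subjects who are truly exposed."""
    return sens * true_prev / observed_prevalence(true_prev, sens, spec)


if __name__ == "__main__":
    # Assumed true prevalences, chosen only to contrast rare vs. common.
    for prev in (0.10, 0.50):
        print(
            f"true prevalence {prev:.2f}: "
            f"observed {observed_prevalence(prev):.3f}, "
            f"PPV {positive_predictive_value(prev):.3f}"
        )
```

Under these assumptions the proxy-positive group is far more diluted with unexposed subjects when the exposure is rare, which is consistent with the caption's point that the relative loss is greater for rare exposures.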

