Estimation and testing of genotype and haplotype effects in case-control studies: comparison of weighted regression and multiple imputation procedures
- PMID: 16496312
- DOI: 10.1002/gepi.20142
Abstract
A popular approach for testing and estimating genotype and haplotype effects associated with a disease outcome is to conduct a population-based case/control study, in which haplotypes are not directly observed but may be inferred probabilistically from unphased genotype data. A variety of methods exist to analyse the resulting data while accounting for the uncertainty in haplotype assignment, but most focus on the issue of testing the global null hypothesis that no genotype or haplotype effects exist. A more interesting question, once a region of disease association has been identified, is to estimate the relevant genotypic or haplotypic effects and to perform tests of complex null hypotheses such as the hypothesis that some loci, but not others, are associated with disease. Here I examine the assumptions behind, and the performance of, two classes of methods for addressing this question. The first is a weighted regression approach in which posterior probabilities of haplotype assignments are used as weights in a logistic regression analysis, generating a test based on either a weighted pseudo-likelihood, or a weighted log-likelihood. The second is a multiple imputation approach using either an improper procedure in which the posterior probabilities are used to generate replicate imputed data sets, or a proper data augmentation procedure. I compare these approaches to a simple expectation substitution (haplotype trend regression) approach. In simulations, all methods gave unbiased parameter estimation but the weighted pseudo-likelihood, expectation substitution and multiple imputation methods had superior confidence interval coverage. For the weighted pseudo-likelihood and expectation substitution methods it was necessary to estimate posterior haplotype assignment probabilities using the combined case/control data, whereas for the multiple imputation approaches it was necessary to estimate these probabilities in the case and control groups separately. 
Overall, multiple imputation was the easiest approach to implement in standard statistical software and to extend to more complex models such as those that include gene-gene or gene-environment interactions.
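As an illustration of how results from replicate imputed data sets are typically combined, the sketch below pools per-imputation estimates using Rubin's rules. This is an assumption for illustration only: the abstract does not state which combining rule was used, and the function name `pool_rubin` and the numeric inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    Uses Rubin's rules: the pooled estimate is the mean of the estimates,
    and the total variance is the mean within-imputation variance plus
    (1 + 1/m) times the between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios for one haplotype, one per imputed data set,
# each with its squared standard error from a logistic regression fit.
log_or = [0.40, 0.50, 0.60]
se2 = [0.04, 0.04, 0.04]
pooled, se = pool_rubin(log_or, se2)
```

Each entry of `log_or` would come from fitting an ordinary logistic regression to one imputed data set, which is why this approach fits easily into standard statistical software.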