Modeling Informatively Missing Genotypes in Haplotype Analysis
- PMID: 20052310
- PMCID: PMC2801447
- DOI: 10.1080/03610920802696588
Abstract
Missing genotypes are common in practical genetic studies. Most existing statistical methods, including those for haplotype analysis, assume that genotypes are missing at random; that is, at a given marker, different genotypes and different alleles are missing with the same probability. In our previous work, we demonstrated that violating this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model that simultaneously characterizes missing data patterns across a set of two or more biallelic markers, and we proved that, under this model, haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers. In this study, we extend that work to multi-allelic markers and observe a similar finding. Simulation studies of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias in haplotype frequency estimates caused by incorrect assumptions about the missing data mechanism. Finally, we illustrate the utility of our method by applying it to a real data set from a study of scleroderma.
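The bias described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' missing data model: it considers a single biallelic marker instead of a haplotype, assumes Hardy-Weinberg equilibrium, and uses hypothetical names (`hwe_genotype_freqs`, `naive_allele_freq`) and made-up missingness rates. It computes the large-sample value of the naive allele frequency estimate, which simply discards missing genotypes, and shows that this value equals the true frequency only when every genotype is missing with the same probability.

```python
def hwe_genotype_freqs(p):
    """Frequencies of genotypes (AA, Aa, aa) under Hardy-Weinberg
    equilibrium, where p is the frequency of allele A (assumption)."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def naive_allele_freq(p, miss):
    """Large-sample value of the naive estimate of freq(A) when missing
    genotypes are dropped.

    miss = (m_AA, m_Aa, m_aa): hypothetical probabilities that each
    genotype goes missing.
    """
    # Probability that each genotype is both present and observed.
    observed = [g * (1.0 - m) for g, m in zip(hwe_genotype_freqs(p), miss)]
    total = sum(observed)
    # Allele A is carried twice by AA and once by Aa.
    return (observed[0] + 0.5 * observed[1]) / total


true_p = 0.3

# Same missing rate for every genotype: the naive estimate is unbiased.
print(naive_allele_freq(true_p, (0.10, 0.10, 0.10)))

# AA genotypes are missing more often: the naive estimate falls below 0.3.
print(naive_allele_freq(true_p, (0.30, 0.05, 0.05)))
```

With equal rates the missingness factor cancels from numerator and denominator, so the estimate recovers the true frequency. With genotype-specific rates it does not cancel, which is the kind of bias that a model of the missing data mechanism is meant to reduce.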