Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies
- PMID: 19931040
- PMCID: PMC2790566
- DOI: 10.1016/j.ajhg.2009.11.004
Abstract
We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10⁻⁷ significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages.
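The abstract compares call sets by the share of uncalled genotypes and by concordance between calls, without stating how either is computed. A minimal sketch of one common way to compute both is given here, assuming genotypes are encoded as allele counts 0, 1, or 2 with `None` for an uncalled genotype. The function names and the encoding are illustrative assumptions and are not taken from the BEAGLE or BEAGLECALL software.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1, or 2 copies of a reference allele; None = uncalled.
Genotype = Optional[int]


def uncalled_rate(calls: Sequence[Genotype]) -> float:
    """Return the fraction of genotypes that were left uncalled."""
    if not calls:
        raise ValueError("no genotypes supplied")
    return sum(g is None for g in calls) / len(calls)


def concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Return the fraction of identical calls among genotypes called in both sets."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call sets must cover the same genotypes")
    # Only genotypes called by both methods can be compared.
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    if not both:
        raise ValueError("no genotypes were called in both sets")
    return sum(a == b for a, b in both) / len(both)


# Hypothetical calls for five samples at one marker (not data from the study).
original_calls = [0, 1, 2, None, 1]
new_calls = [0, 1, 1, 2, None]

print(uncalled_rate(original_calls))
print(concordance(original_calls, new_calls))
```

Concordance here is restricted to genotypes called by both methods, so it should be read alongside the uncalled rate: a method can raise its concordance simply by declining to call difficult genotypes.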