Appropriate data cleaning methods for genome-wide association study
- PMID: 18695938
- DOI: 10.1007/s10038-008-0322-y
Abstract
Genome-wide association studies (GWAS) using a large number of single nucleotide polymorphisms (SNPs) have been successfully applied to identify genetic variants underlying common diseases. However, genotyping with the new array technologies often produces spurious results that can distort GWAS analyses. Data cleaning to exclude spurious genotyping results is therefore of paramount importance. In this study, we investigated the criteria required for appropriate cleaning of data from 389 unrelated healthy Japanese samples genotyped with the GeneChip Human Mapping 500K Array Set for GWAS. The samples were randomly divided into two groups, and the allele frequencies of the two groups were compared for each SNP as a quasi-case-control study. Because both groups came from the same population, no true association was expected. The observed results were then filtered by four parameters (SNP call rate; confidence score from the Bayesian Robust Linear Model with Mahalanobis distance (BRLMM) genotype-calling algorithm; Hardy-Weinberg equilibrium; and minor allele frequency) and assessed for deviation from the null hypothesis. We found that appropriate data cleaning could be achieved using these four parameters. Our findings offer a means of obtaining appropriately cleaned data for GWAS.
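The quasi-case-control design and the four filters can be sketched in Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' pipeline: the genotype coding (0/1/2 copies of allele "B", None for a no-call), the assumption that a lower BRLMM confidence score means a more confident call, every threshold default, the 1-df chi-square tests for Hardy-Weinberg equilibrium and allelic association, and all function names are hypothetical choices made for illustration. Because both halves are drawn from the same samples, every SNP follows the null hypothesis, so after cleaning roughly a fraction α of SNPs should reach p < α; a clear excess suggests genotyping artifacts that the filters missed.

```python
import math
import random

# Assumed genotype coding: 0, 1 or 2 copies of allele "B"; None = no call.

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def mask_low_confidence(calls, scores, max_score):
    """Turn calls whose confidence score exceeds max_score into no-calls.
    Assumes lower scores mean more confident calls; check the caller's docs."""
    return [g if s <= max_score else None for g, s in zip(calls, scores)]

def call_rate(calls):
    return sum(g is not None for g in calls) / len(calls)

def allele_b_freq(calls):
    called = [g for g in calls if g is not None]
    return sum(called) / (2 * len(called)) if called else 0.0

def hwe_pvalue(calls):
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    called = [g for g in calls if g is not None]
    n = len(called)
    p = allele_b_freq(called)
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic or empty: nothing to test
    observed = [called.count(k) for k in (0, 1, 2)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_1df_pvalue(stat)

def allelic_test_pvalue(group1, group2):
    """Allelic chi-square test on the 2x2 table of A/B allele counts."""
    table = []
    for grp in (group1, group2):
        called = [g for g in grp if g is not None]
        b = sum(called)
        table.append([2 * len(called) - b, b])  # [A count, B count]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    if 0 in rows or 0 in cols:
        return 1.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / total
            stat += (table[i][j] - e) ** 2 / e
    return chi2_1df_pvalue(stat)

def passes_qc(calls, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """SNP-level filters; the default thresholds are illustrative only."""
    if call_rate(calls) < min_call_rate:
        return False
    p = allele_b_freq(calls)
    if min(p, 1.0 - p) < min_maf:
        return False
    return hwe_pvalue(calls) >= min_hwe_p

def quasi_case_control(snps, scores=None, max_score=0.5, seed=0, **qc):
    """Split samples at random into two halves and test each SNP passing QC.
    snps[k] holds SNP k's genotypes in the same sample order for every SNP;
    scores[k], if given, holds the matching per-call confidence scores."""
    n = len(snps[0])
    order = list(range(n))
    random.Random(seed).shuffle(order)
    half1, half2 = order[: n // 2], order[n // 2:]
    pvals = []
    for k, calls in enumerate(snps):
        if scores is not None:
            calls = mask_low_confidence(calls, scores[k], max_score)
        if passes_qc(calls, **qc):
            pvals.append(allelic_test_pvalue([calls[i] for i in half1],
                                             [calls[i] for i in half2]))
    return pvals

def simulate_null_snps(n_snps, n_samples, seed=1):
    """Hypothetical toy data: HWE genotypes with no group difference."""
    rng = random.Random(seed)
    snps = []
    for _ in range(n_snps):
        p = rng.uniform(0.05, 0.5)
        snps.append([int(rng.random() < p) + int(rng.random() < p)
                     for _ in range(n_samples)])
    return snps

pvals = quasi_case_control(simulate_null_snps(1000, 200))
frac = sum(p < 0.05 for p in pvals) / len(pvals)
print(f"{len(pvals)} SNPs passed QC; fraction with p < 0.05: {frac:.3f}")
```

With simulated null data, `frac` should land near 0.05. On real array data, one could tighten each threshold in turn and watch whether the excess of small p-values shrinks, which mirrors the idea of judging cleaning criteria against the null. When genotype counts are small, an exact Hardy-Weinberg test is usually preferred over the chi-square approximation used here.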
Similar articles
- Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples. BMC Bioinformatics. 2008 Aug 12;9 Suppl 9:S17. doi: 10.1186/1471-2105-9-S9-S17. PMID: 18793462. Free PMC article.
- Evaluating the performance of Affymetrix SNP Array 6.0 platform with 400 Japanese individuals. BMC Genomics. 2008 Sep 22;9:431. doi: 10.1186/1471-2164-9-431. PMID: 18803882. Free PMC article.
- Genome-wide association study of panic disorder in the Japanese population. J Hum Genet. 2009 Feb;54(2):122-6. doi: 10.1038/jhg.2008.17. Epub 2009 Jan 23. PMID: 19165232.
- Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet. 2020 Nov;36(11):857-867. doi: 10.1016/j.tig.2020.07.006. Epub 2020 Aug 6. PMID: 32773169. Free PMC article. Review.
- A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006 Oct;7(10):781-91. doi: 10.1038/nrg1916. PMID: 16983374. Review.
Cited by
- Unraveling Genomic Regions Controlling Root Traits as a Function of Nitrogen Availability in the MAGIC Wheat Population WM-800. Plants (Basel). 2022 Dec 14;11(24):3520. doi: 10.3390/plants11243520. PMID: 36559632. Free PMC article.
- Genome-wide association mapping for resistance to leaf, stem, and yellow rusts of common wheat under field conditions of South Kazakhstan. PeerJ. 2020 Aug 31;8:e9820. doi: 10.7717/peerj.9820. eCollection 2020. PMID: 32944423. Free PMC article.
- Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet. 2011 Jan;Chapter 1:Unit1.19. doi: 10.1002/0471142905.hg0119s68. PMID: 21234875. Free PMC article.
- Genome-wide association study identifies TNFSF15 and POU2AF1 as susceptibility loci for primary biliary cirrhosis in the Japanese population. Am J Hum Genet. 2012 Oct 5;91(4):721-8. doi: 10.1016/j.ajhg.2012.08.010. Epub 2012 Sep 20. PMID: 23000144. Free PMC article.
- Genome-Wide Association Mapping in the Global Diversity Set Reveals New QTL Controlling Root System and Related Shoot Variation in Barley. Front Plant Sci. 2016 Jul 19;7:1061. doi: 10.3389/fpls.2016.01061. eCollection 2016. PMID: 27486472. Free PMC article.