J Hum Genet. 2008;53(10):886-893.
doi: 10.1007/s10038-008-0322-y. Epub 2008 Aug 12.

Appropriate data cleaning methods for genome-wide association study

Taku Miyagawa et al. J Hum Genet. 2008.

Abstract

Genome-wide association studies (GWAS) using a large number of single nucleotide polymorphisms (SNPs) have successfully been applied to identify genetic variants of common diseases. However, genotyping using the new array technologies is often associated with spurious results that could unfavorably affect analyses of GWAS. Consequently, data cleaning is of paramount importance in excluding spurious genotyping results. In this study, we investigated the criteria required for the appropriate cleaning of 389 unrelated healthy Japanese samples analyzed using the GeneChip Human Mapping 500K Array Set for GWAS. The samples were randomly subdivided into two groups, and the allele frequencies in the groups were compared for individual SNPs as a quasi-case-control study. Then, observed results were filtered by four parameters (SNP call rate, confidence score obtained using the Bayesian Robust Linear Model with Mahalanobis genotype-calling algorithm, Hardy-Weinberg equilibrium, and minor allele frequency) and assessed for deviation from the null hypothesis. We found that appropriate data cleaning could be achieved using these four parameters. Our findings offer an avenue for obtaining appropriate data from GWAS.
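
As an illustration of the filtering step described in the abstract, the following is a minimal sketch of how three of the four parameters (SNP call rate, Hardy-Weinberg equilibrium, and minor allele frequency) might be applied to per-SNP genotype counts. It is not the authors' implementation. The function names and the threshold values (`min_call_rate`, `min_maf`, `min_hwe_p`) are hypothetical placeholders, since the abstract does not state the cutoffs used. The BRLMM confidence score is assumed to have been applied upstream at genotype-calling time, so that low-confidence genotypes are already counted as missing calls.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes (AA, AB, BB).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf_1df(stat)


def passes_filters(n_aa, n_ab, n_bb, n_samples,
                   min_call_rate=0.95,  # hypothetical threshold
                   min_maf=0.05,        # hypothetical threshold
                   min_hwe_p=0.001):    # hypothetical threshold
    """Return True if a SNP survives all three quality filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / n_samples
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For example, `passes_filters(100, 200, 100, n_samples=400)` describes a fully called SNP in exact Hardy-Weinberg proportions and returns `True`, whereas a SNP with no heterozygotes, such as `passes_filters(200, 0, 200, n_samples=400)`, fails the Hardy-Weinberg filter.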
