Data quality control in genetic case-control association studies

Carl A Anderson¹, Fredrik H Pettersson, Geraldine M Clarke, Lon R Cardon, Andrew P Morris, Krina T Zondervan

PMID: 21085122
PMCID: PMC3025522
DOI: 10.1038/nprot.2010.116

Carl A Anderson et al. Nat Protoc. 2010 Sep;5(9):1564-73.

doi: 10.1038/nprot.2010.116. Epub 2010 Aug 26.

Affiliation

¹ Genetic and Genomic Epidemiology Unit, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. carl.anderson@sanger.ac.uk

Abstract

This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.
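As a rough illustration of the two failure-rate assessments named in the abstract, the sketch below computes the missing-call rate per individual and per SNP from a small genotype matrix. This is not the PLINK procedure itself: the toy data, the 0/1/2 genotype coding with `None` for a failed call, and the function names are all assumptions made for the example.

```python
# Genotypes coded as minor-allele counts (0, 1, 2); None marks a failed call.
# Rows are individuals, columns are SNPs. Toy data, purely illustrative.
genotypes = [
    [0, 1, 2, None],
    [1, None, 2, None],
    [0, 0, 1, 2],
    [2, 1, None, 0],
]


def missing_rate_per_individual(matrix):
    """Fraction of SNPs with no genotype call, for each individual (row)."""
    return [sum(g is None for g in row) / len(row) for row in matrix]


def missing_rate_per_snp(matrix):
    """Fraction of individuals with no genotype call, for each SNP (column)."""
    n_individuals = len(matrix)
    return [sum(g is None for g in col) / n_individuals for col in zip(*matrix)]


ind_miss = missing_rate_per_individual(genotypes)
snp_miss = missing_rate_per_snp(genotypes)
print(ind_miss)
print(snp_miss)
```

In practice these rates would be read from the software's own output rather than recomputed; the sketch only makes the two definitions concrete.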

Figures

**Figure 1**
Genotype failure rate vs. heterozygosity across all individuals in the study. Shading indicates sample density and dashed lines denote QC thresholds.

**Figure 2**
Ancestry clustering based on genome-wide association data. HapMap3 reference samples: CEU (red), CHB+JPT (purple) and YRI (green). GWA samples: black crosses. 11 cases and 19 controls with a second principal component score less than 0.072 (grey dashed line) were marked for removal.

**Figure 3**
Histogram of missing data rate across all individuals passing ‘per-individual’ QC. The dashed vertical line represents the threshold (3%) at which SNPs were removed from further analysis due to an excess failure rate.
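The thresholds shown in the three figures could be applied along the lines of the following sketch. Only the 3% SNP missing-rate threshold and the 0.072 cutoff on the second principal component come from the captions. The function names, the use of a mean ± `n_sd` standard deviation rule for heterozygosity, and the example values of `max_missing` and `n_sd` are assumptions for illustration, as is the choice of strict inequalities.

```python
from statistics import mean, stdev


def heterozygosity_rate(row):
    """Proportion of heterozygous calls (coded 1) among non-missing genotypes."""
    called = [g for g in row if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else float("nan")


def flag_samples(missing, het, max_missing, n_sd):
    """Indices of individuals whose missing rate exceeds max_missing or whose
    heterozygosity lies more than n_sd standard deviations from the mean."""
    mu, sd = mean(het), stdev(het)
    return [
        i
        for i, (m, h) in enumerate(zip(missing, het))
        if m > max_missing or abs(h - mu) > n_sd * sd
    ]


def snps_to_remove(snp_missing, threshold=0.03):
    """Indices of SNPs whose missing rate exceeds the threshold (3% in Figure 3)."""
    return [j for j, m in enumerate(snp_missing) if m > threshold]


def ancestry_outliers(pc2_scores, cutoff=0.072):
    """Indices of samples with a second-PC score below the cutoff (Figure 2)."""
    return [i for i, s in enumerate(pc2_scores) if s < cutoff]


# Toy per-individual summaries; max_missing and n_sd are arbitrary here.
missing = [0.01, 0.10, 0.02, 0.01, 0.02, 0.01]
het = [0.30, 0.31, 0.29, 0.30, 0.45, 0.30]
flagged = flag_samples(missing, het, max_missing=0.03, n_sd=1.5)
print(flagged)
```

With so few toy samples a single outlier inflates the standard deviation, which is why a small `n_sd` is used here; a real data set would call for thresholds chosen by inspecting plots such as Figure 1.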

References

1. Zondervan KT, Cardon LR. Designing candidate gene and genome-wide case-control association studies. Nat Protoc. 2007;2:2492.
2. Teo YY, et al. A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics. 2007;23:2741.
3. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661.
4. Clayton DG, et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nat Genet. 2005;37:1243.
5. Marchini J, Howie B, Myers SR, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906.
