Extremely low-coverage sequencing and imputation increases power for genome-wide association studies

PMID: 22610117
PMCID: PMC3400344
DOI: 10.1038/ng.2283

Bogdan Pasaniuc et al. Nat Genet. 2012 May 20;44(6):631-5.

Affiliation

¹ Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA. bpasaniu@hsph.harvard.edu

Abstract

Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1-0.5×) captures almost as much of the common (>5%) and low-frequency (1-5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.

Figures

**Figure 1**
Genotype imputation accuracy as a function of coverage in 1000 Genomes Project simulations. Solid lines show common SNPs (MAF > 5%); dashed lines show low-frequency SNPs (MAF < 5%).

**Figure 2**
Observed versus expected association −log10 p-values at 103,977 SNPs across the genome in simulated null data sets over 909 samples of the combined data set. We observe an r² of 0.64 between p-values computed in typed versus imputed data, similar to simulations of association statistics at imputed versus genotyping calls (Supplementary Note). Results for the alternative hypothesis of association can be found in the Supplementary Note.

**Figure 3**
Genotype imputation accuracy in IHCS whole-exome data as a function of coverage. Illumina 1M genotype calls were used as a gold standard, restricting to 6,070 SNPs in 10 distinct 5 Mb regions (total of 50 Mb) of the genome (see main text). Dotted lines denote results attained in 1000 Genomes simulations on the same SNP set.

**Figure 4**
Coverage (and corresponding number of samples) for a fixed budget of $300,000. (a) Effective sample size in sequencing-based GWAS as a function of the number of samples and resulting coverage. Cost assumptions: $30 per sample for preparation and $133 per 1× of sequencing (see main text). (b) Ratio of expected association statistic (effective sample size) in sequencing-based GWAS versus array-based GWAS at $400 per sample, as a function of sample preparation and sequencing costs. Expected association statistics for sequencing-based GWAS are based on the optimum coverage and number of samples (assuming an arbitrarily large number of samples is available) subject to the budget constraint. The optimum coverage and number of samples vary at different points on the graph (not shown). The black dot denotes a $30 sample preparation cost and $133 per 1×.
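
The budget arithmetic behind Figure 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The cost figures ($300,000 budget, $30 preparation, $133 per 1× of sequencing, $400 per array sample) come from the caption above. Everything else is an assumption made for illustration: the function names, the approximation that effective sample size equals the number of samples times imputation r², the reuse of the abstract's empirical r² of 0.71 at 0.24× coverage, and the treatment of array genotypes as perfectly accurate (r² = 1).

```python
import math

# Cost figures taken from the Figure 4 caption.
BUDGET = 300_000        # total budget in dollars
PREP_COST = 30          # sample preparation cost per sample
COST_PER_1X = 133       # sequencing cost per 1x coverage per sample
ARRAY_COST = 400        # array genotyping cost per sample


def samples_within_budget(budget, prep_cost, cost_per_1x, coverage):
    """Largest number of samples that can be sequenced at `coverage`."""
    per_sample = prep_cost + cost_per_1x * coverage
    return math.floor(budget / per_sample)


def effective_sample_size(n_samples, r2):
    """Assumed approximation: power scales like n_samples * r2."""
    return n_samples * r2


# Illustrative inputs: coverage and r2 are the empirical values quoted in
# the abstract (0.24x off-target coverage, mean r2 of 0.71).
coverage = 0.24
r2_sequencing = 0.71

n_seq = samples_within_budget(BUDGET, PREP_COST, COST_PER_1X, coverage)
n_array = BUDGET // ARRAY_COST

eff_seq = effective_sample_size(n_seq, r2_sequencing)
eff_array = effective_sample_size(n_array, 1.0)  # assumption: arrays have r2 = 1

print(f"sequenced samples: {n_seq}, effective: {eff_seq:.0f}")
print(f"array samples:     {n_array}, effective: {eff_array:.0f}")
print(f"ratio: {eff_seq / eff_array:.2f}")
```

Under these assumptions the same budget pays for 4,844 low-coverage samples versus 750 array samples. The resulting ratio is only a rough illustration of the "several times" claim in the abstract, since the paper optimizes coverage and sample count jointly rather than fixing coverage at 0.24×.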
