Am J Hum Genet. 2017 Oct 5;101(4):539-551.
doi: 10.1016/j.ajhg.2017.08.012. Epub 2017 Sep 21.

Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies

Christian Benner et al. Am J Hum Genet.

Abstract

During the past few years, various novel statistical methods have been developed for fine-mapping with the use of summary statistics from genome-wide association studies (GWASs). Although these approaches require information about the linkage disequilibrium (LD) between variants, there has not been a comprehensive evaluation of how estimation of the LD structure from reference genotype panels performs in comparison with that from the original individual-level GWAS data. Using population genotype data from Finland and the UK Biobank, we show here that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided. We also show, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies. We conclude by providing software tools and by recommending practices for sharing LD information to more efficiently exploit summary statistics in genetics research.

Figures

Figure 1
Schematic of Fine-Mapping Causal Variants in Trait-Associated Genomic Regions by Using GWAS Summary Statistics and LD Information. Ideally, LD information is computed from the original GWAS data. LD information can, however, be obtained from a reference genotype panel when the original GWAS data are not available. An important open question is how large a reference genotype panel should be to nearly achieve the optimal fine-mapping performance given by the original GWAS data.
Figure 2
Fine-Mapping the APOE Region Associated with LDL-C. Results are shown for 3,078 variants with a MAF above 1% and covering 1 Mb of the genome. Variants identified by a standard conditional analysis are highlighted in yellow. All other variants are colored with respect to their LD (absolute value of Pearson correlation) with the lead variant rs7412. (A) Negative log₁₀ p values for each variant from an LDL-C GWAS on 15,626 individuals from the FINRISK study. (B) Bayes factor (log₁₀) for assessing the causality of each variant by a FINEMAP analysis using the summary statistics from the LDL-C GWAS and the LD information from the original genotype data. (C) Bayes factor (log₁₀) for assessing the causality of each variant by a FINEMAP analysis using the summary statistics from the LDL-C GWAS and the LD information from the reference genotypes of 99 Finns in the 1000GP.
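The caption above measures LD as the absolute value of the Pearson correlation with the lead variant. A minimal sketch of that calculation follows, assuming genotypes are coded as allele dosages (0, 1, 2) per individual. The function names and the toy variant labels are hypothetical and are not taken from the paper or from FINEMAP.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length genotype dosage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a monomorphic variant")
    return sxy / math.sqrt(sxx * syy)


def ld_with_lead(genotypes, lead):
    """Absolute correlation of every other variant with the lead variant.

    genotypes: dict mapping a variant label to its list of dosages.
    lead: label of the lead variant (hypothetical labels in this sketch).
    """
    lead_dosages = genotypes[lead]
    return {
        name: abs(pearson_r(dosages, lead_dosages))
        for name, dosages in genotypes.items()
        if name != lead
    }
```

Taking the absolute value makes the measure independent of which allele is counted, which is why a perfectly anti-correlated variant receives the same LD value as a perfectly correlated one.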
Figure 3
Fine-Mapping the APOE Region Associated with LDL-C by Using Shrinkage Estimation of Correlations from the Finnish 1000GP Panel with 99 Individuals. Bayes factors (log₁₀) are shown from a FINEMAP analysis of 3,078 variants with a MAF above 1% and covering 1 Mb of the genome. GWAS summary statistics were computed with 15,626 individuals from the FINRISK study. Variants identified by a standard conditional analysis are highlighted in yellow. All other variants are colored with respect to their LD (absolute value of Pearson correlation) with the lead variant rs7412. (A) The same constant shrinkage factor of 0.80 was used for all correlations. (B) The same constant shrinkage factor of 0.25 was used for all correlations. (C) Recombination distance was used to define the shrinkage factor for each pair of variants.
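The two kinds of shrinkage named in the caption can be illustrated with a short sketch. It rests on assumptions not stated in this record: that a shrinkage factor multiplies each off-diagonal correlation while the diagonal stays at 1, and that a distance-based factor decays exponentially with genetic distance. The `decay` parameter and both function names are hypothetical, and the exact estimator used in the paper may differ.

```python
import math


def shrink_constant(corr, factor):
    """Multiply every off-diagonal correlation by one constant factor.

    Assumption: the diagonal is left at 1 so the result is still a
    correlation matrix.
    """
    n = len(corr)
    return [
        [1.0 if i == j else factor * corr[i][j] for j in range(n)]
        for i in range(n)
    ]


def shrink_by_distance(corr, positions_cm, decay):
    """Shrink each pair by a factor that depends on genetic distance.

    Assumption: factor = exp(-decay * |distance in cM|), a hypothetical
    form chosen only to show that distant pairs are shrunk more.
    """
    n = len(corr)
    shrunk = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(1.0)
            else:
                distance = abs(positions_cm[i] - positions_cm[j])
                row.append(math.exp(-decay * distance) * corr[i][j])
        shrunk.append(row)
    return shrunk
```

A constant factor treats all pairs alike, whereas a distance-based factor leaves nearby pairs almost untouched and pulls distant pairs toward zero, where sampling noise from a small panel is most likely to create spurious correlation.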
Figure 4
Fine-Mapping Accuracy on Simulated Data. In simulations with Finnish data, genotype data over 100 GWAS regions on 5,363 individuals from NFBC1966 were used for phenotype generation. In UKBB simulations, genotype data on 82,199 individuals covering the ABO region were used for phenotype generation. Each dataset included five causal SNPs with effect sizes that resulted in statistical power of 0.5 with 5,363 individuals at a significance level of 5 × 10⁻⁸. Results with different LD information are shown in plots of the number of selected causal SNPs (true positives) against the number of selected non-causal SNPs (false positives); the list of SNPs was ranked by their posterior probability of being causal. Reference genotype panels (solid line) are compared with the original genotype data (dashed line) with respect to the achieved partial area under the curve (pAUC). pAUCs and curves are averaged over the simulated datasets. (A) Accuracy with NFBC1966 summary statistics from a GWAS on 5,363 individuals and LD information either from the original genotype data or from a subset of the reference genotype data on FINRISK individuals. (B) Accuracy with UKBB summary statistics from a GWAS on 5,363 individuals and LD information either from the original GWAS data or from a subset of UKBB individuals not included in the GWAS. (C) Accuracy with UKBB summary statistics from a GWAS on 50,000 individuals and LD information either from the original GWAS data or from a subset of UKBB individuals not included in the GWAS.
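The accuracy curve described above (true positives against false positives, with SNPs ranked by posterior probability) and its partial area can be sketched as follows. This is an illustration under assumptions: ties in posterior probability are broken by input order, the area is left unnormalized, and the false-positive cutoff `max_fp` is a hypothetical parameter, since the cutoff and any normalization used in the paper are not given in this record.

```python
def tp_fp_curve(posteriors, is_causal):
    """Walk down the SNP list ranked by posterior probability.

    Returns a list of (false positives, true positives) points, starting
    at (0, 0) and adding one point per selected SNP.
    """
    order = sorted(
        range(len(posteriors)), key=lambda i: posteriors[i], reverse=True
    )
    tp, fp = 0, 0
    points = [(0, 0)]
    for i in order:
        if is_causal[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


def partial_auc(points, max_fp):
    """Unnormalized area under the curve for false positives up to max_fp."""
    area = 0.0
    for (fp0, tp0), (fp1, tp1) in zip(points, points[1:]):
        if fp0 >= max_fp:
            break
        width = min(fp1, max_fp) - fp0  # 0 for a true-positive step
        area += width * (tp0 + tp1) / 2.0
    return area
```

Because each step adds either one true positive or one false positive, the area is simply the number of causal SNPs already selected, summed over each false positive admitted before the cutoff.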
Figure 5
Effect of Reference-Panel Size and GWAS Sample Size on Fine-Mapping Performance. Results are shown for a pair of variants (MAF of 2%) of which one is causal and the other is non-causal and whose correlation is 0.37. The effect size of the causal variant is such that the statistical power with 15,626 individuals is approximately 0.5 at a significance level of 5 × 10⁻⁸. The probability of the true causal configuration is plotted on the y axis. The x axis shows the estimated correlation of the variants from a reference genotype panel. The central 95% probability interval (dashed line) of the sampling distribution is shown for different reference genotype panels. (A) GWAS summary statistics were computed with 15,626 individuals. (B) GWAS summary statistics were computed with 50,000 individuals.
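To give a feel for how the sampling interval of an estimated correlation narrows as the reference panel grows, the sketch below uses the Fisher z-transformation. This is a textbook approximation that assumes roughly bivariate normal data, which genotype dosages at a MAF of 2% are not, so it is only indicative and is not claimed to be the method behind the figure. The function name is hypothetical.

```python
import math


def correlation_interval(rho, panel_size, z_crit=1.959964):
    """Approximate central 95% interval for a sample correlation.

    Uses the Fisher z approximation: atanh(r) is treated as normal with
    mean atanh(rho) and standard deviation 1 / sqrt(panel_size - 3).
    """
    if panel_size <= 3:
        raise ValueError("panel_size must be greater than 3")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    center = math.atanh(rho)
    half_width = z_crit / math.sqrt(panel_size - 3)
    return math.tanh(center - half_width), math.tanh(center + half_width)


if __name__ == "__main__":
    # Panel sizes mentioned in this record: 99 (Finns in 1000GP) and 1,000.
    for size in (99, 1000):
        low, high = correlation_interval(0.37, size)
        print(f"panel of {size}: [{low:.3f}, {high:.3f}]")
```

The interval width shrinks roughly in proportion to one over the square root of the panel size, which is the intuition behind needing a larger panel as the GWAS sample, and hence the precision of the summary statistics, grows.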
