Nat Commun. 2017 May 26;8:15606.
doi: 10.1038/ncomms15606.

Whole genome sequencing and imputation in isolated populations identify genetic associations with medically-relevant complex traits

Lorraine Southam et al. Nat Commun. 2017.

Abstract

Next-generation association studies can be empowered by sequence-based imputation and by studying founder populations. Here we report ∼9.5 million variants from whole-genome sequencing (WGS) of a Cretan-isolated population, and show enrichment of rare and low-frequency variants with predicted functional consequences. We use a WGS-based imputation approach utilizing 10,422 reference haplotypes to perform genome-wide association analyses and observe 17 genome-wide significant, independent signals, including replicating evidence for association at eight novel low-frequency variant signals. Two novel cardiometabolic associations are at lead variants unique to the founder population sequences: chr16:70790626 (high-density lipoprotein levels beta -1.71 (SE 0.25), P=1.57 × 10⁻¹¹, effect allele frequency (EAF) 0.006); and rs145556679 (triglycerides levels beta -1.13 (SE 0.17), P=2.53 × 10⁻¹¹, EAF 0.013). Our findings add empirical support to the contribution of low-frequency variants in complex traits, demonstrate the advantage of including population-specific sequences in imputation panels and exemplify the power gains afforded by population isolates.
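
The effect sizes and standard errors quoted in the abstract relate to the P values through a Wald statistic, z = beta / SE. The following is a minimal sketch of that relationship, assuming a two-sided test against a standard normal distribution; the function name `wald_p_value` is illustrative and is not part of METACARPA. Because the published beta and SE values are rounded and the reported P values come from a meta-analysis that accounts for sample overlap, the result should be of the same order of magnitude as the reported P values rather than identical to them.

```python
import math


def wald_p_value(beta, se):
    """Two-sided P value for a Wald test of beta = 0.

    Assumes the statistic z = beta / se follows a standard normal
    distribution under the null hypothesis.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # For a standard normal, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Rounded values as quoted in the abstract.
    for label, beta, se in [
        ("chr16:70790626 (HDL)", -1.71, 0.25),
        ("rs145556679 (TG)", -1.13, 0.17),
    ]:
        print(f"{label}: z = {beta / se:.2f}, P = {wald_p_value(beta, se):.2e}")
```

Using `math.erfc` rather than `1 - cdf` avoids loss of precision for the very small tail probabilities that arise at genome-wide significance.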

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Flowchart of study design.
The HELIC cohorts were prephased, imputed and analysed separately by cohort and array, and finally meta-analysed. The variant numbers reported here are total regardless of MAF. Imputed variants are for chromosomes 1–22.
Figure 2. Variant sharing and functional annotation.
(a) SNP density per kbp and percentage of total per functional class, based on 9,554,503 variants identified in the HELIC MANOLIS 4× WGS data of 249 samples (MAC≥2). Error bars indicate standard error of the mean; the dashed red line indicates average density genome-wide. (b) Variant overlap between 498 HELIC MANOLIS, 7,582 UK10K and 2,184 1000 Genomes Project reference panel haplotypes, by MAF category. Numerical values are given in Supplementary Tables 1 and 2.
Figure 3. Functional enrichment of variants private to the MANOLIS sequences when compared to variants shared with UK10K and/or 1000 Genomes.
Enrichment and depletion of functional classes of variants private to the MANOLIS cohort are observed among rare and low-frequency variants (MAF≤5%), while no significant enrichment is detected among common variants in any functional class. Numerical values are listed in Supplementary Table 4.
Figure 4. False-positive rate and meta-analysis power in the presence of sample overlap using METACARPA.
(a) Empirical false-positive rate as a function of sample overlap in 1,000 repeats of a meta-analysis of two studies including 2,000 samples each, at a significance threshold of 5 × 10⁻⁸. (b) Empirical power of the four tests implemented in METACARPA as a function of sample overlap in the same simulation setting. Power is calculated as the discovery rate of a SNP explaining 1% of a standard normal phenotype under the same simulation scenario (for example, a MAF of 1% and an effect size of 0.705, or a MAF of 20% and an effect size of 0.176). (c) Compared accuracy of Digby's estimate of tetrachoric correlation and Pearson's correlation for a true (dashed line) 25% overlap under a polygenic burden, with 10,000 SNPs affecting a quantitative trait with 20% heritability. Estimates of correlation for both methods are calculated over 300 genome-wide simulations. The black line indicates the median, shaded rectangles represent the interquintile ranges.
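
Two quantities named in the Figure 4 caption can be illustrated with short calculations. The sketch below is not taken from METACARPA; the function names are hypothetical. The first function assumes an additive model under Hardy-Weinberg equilibrium, where a SNP with minor allele frequency `maf` and per-allele effect `beta` explains 2·maf·(1−maf)·beta² of the variance of a unit-variance phenotype. The second implements Digby's approximation to the tetrachoric correlation from the odds ratio of a 2×2 table, r = (OR^(3/4) − 1) / (OR^(3/4) + 1). How the 2×2 table is built from the two studies' summary statistics is left out here, since the caption does not describe it.

```python
def variance_explained(maf, beta):
    """Fraction of variance of a unit-variance phenotype explained by one SNP.

    Assumption: additive genetic model and Hardy-Weinberg equilibrium,
    so the genotype variance is 2 * maf * (1 - maf).
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def digby_tetrachoric(a, b, c, d):
    """Digby's approximation to the tetrachoric correlation.

    a, b, c, d are the counts of a 2x2 table [[a, b], [c, d]].
    Empty off-diagonal cells are rejected rather than corrected
    (a simplification made for this sketch).
    """
    if b == 0 or c == 0:
        raise ValueError("off-diagonal cells must be non-zero")
    odds_ratio = (a * d) / (b * c)
    w = odds_ratio ** 0.75
    return (w - 1.0) / (w + 1.0)


if __name__ == "__main__":
    # The two example scenarios given in the caption.
    for maf, beta in [(0.01, 0.705), (0.20, 0.176)]:
        print(f"MAF={maf}, beta={beta}: {variance_explained(maf, beta):.4f}")
    # A made-up table, for illustration only.
    print(f"Digby estimate: {digby_tetrachoric(40, 10, 10, 40):.3f}")
```

Both caption scenarios evaluate to roughly 0.01, which matches the stated 1% of phenotypic variance up to rounding of the effect sizes.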
Figure 5. Association results for chr16:70790626 and rs145556679 and lipid levels.
(a) Heterozygotes for chr16:70790626 exhibit significantly lower HDL levels than homozygotes (Wald test METACARPA P=1.57 × 10⁻¹¹). (b) Heterozygotes for rs145556679 exhibit significantly lower TG (Wald test METACARPA P=2.53 × 10⁻¹¹) and VLDL (Wald test METACARPA P=2.90 × 10⁻¹¹) levels than homozygotes. (c) Regional association plot for chr16:70790626. (d) To determine if the signals are detected without MANOLIS sequences in the reference panel, we conducted imputation using a combined UK10K+1000 Genomes reference panel; the regional plot shows that the chr16:70790626 signal is captured with a different lead variant and a decrease in significance. (e) Regional association plot for rs145556679. (f) Regional association plot for rs145556679 using a combined UK10K+1000 Genomes reference panel; the same signal is captured with a different lead variant and a decrease in association strength. LocusZoom was used to create the regional plots (http://csg.sph.umich.edu/locuszoom/).
