Nat Commun. 2023 Mar 9;14(1):1287.
doi: 10.1038/s41467-023-36864-8.

Imputation-powered whole-exome analysis identifies genes associated with kidney function and disease in the UK Biobank



Matthias Wuttke et al. Nat Commun.

Abstract

Genome-wide association studies have discovered hundreds of associations between common genotypes and kidney function but cannot comprehensively investigate rare coding variants. Here, we apply a genotype imputation approach to whole exome sequencing data from the UK Biobank to increase sample size from 166,891 to 408,511. We detect 158 rare variants and 105 genes significantly associated with one or more of five kidney function traits, including genes not previously linked to kidney disease in humans. The imputation-powered findings derive support from clinical record-based kidney disease information, such as for a previously unreported splice allele in PKD2, and from functional studies of a previously unreported frameshift allele in CLDN10. This cost-efficient approach boosts statistical power to detect and characterize both known and novel disease susceptibility variants and genes, can be generalized to larger future studies, and generates a comprehensive resource (https://ckdgen-ukbb.gm.eurac.edu/) to direct experimental and clinical studies of kidney disease.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Imputation quality in 10,000 validation samples and 2,191,400 variants.
a Boxplots of the squared correlation (R²) between sequenced genotypes and imputed dosages. Variants were binned by their minor allele count (MAC; 2–10) or minor allele frequency (MAF; ≥0.0001) in the 156 K reference panel, giving a mean of 157 K (range 27–630 K) variants per bin. Boxes span the first to third quartile, the horizontal line marks the median, and whiskers extend to 1.5 times the interquartile range. b Mean genotype concordance (number of matching genotypes/total number of genotypes) between hard-call imputed genotypes and sequenced data for homozygous reference (HomRef), homozygous alternative (HomAlt), and heterozygous (Het) calls. Variants were binned by their MAC in the 156 K reference panel, giving a mean of 1.5 × 10⁹ (range 3 × 10⁸ to 6 × 10⁹) hard calls per bin. c Mean squared correlation between sequenced genotypes and imputed dosages using a reference panel of 156 K individuals (this study) and a reference panel of 50 K individuals (Barton et al.). Variants were binned by MAC or MAF in their respective reference panel.
Fig. 2. Details for carriers of the PKD2 p.Phe472*fs variant.
a Kidney-related ICD-10 diagnoses for each carrier, color-coded (red, cystic kidney disease codes; blue, CKD codes). Carrier age is annotated in the columns; the median age was 60 years (range 41–67). b Boxplot of eGFRcrea for non-carriers (left) and carriers (right) of the p.Phe472*fs (4:88046737:TC:T) frameshift variant; eGFRcrea in carriers ranged from 25 to 108 ml/min/1.73 m² (mean 65, SD 28). Boxes span the first to third quartile, the horizontal line marks the median, and whiskers extend to 1.5 times the interquartile range.
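The boxplot convention used in panel b (and in Fig. 1a) can be made concrete with a short sketch. It assumes the common Tukey reading of the caption, in which whiskers end at the most extreme observation within 1.5 times the interquartile range of the box and points beyond that are drawn individually. The quartile method (linear interpolation) and the toy values are assumptions, not the study's data.

```python
from statistics import quantiles


def boxplot_stats(values: list[float]) -> dict[str, object]:
    """Quartiles, Tukey-style whiskers, and outliers for a single box."""
    # method="inclusive" interpolates linearly between order statistics
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # invented values
    print(boxplot_stats(toy))
```

For the toy input, the quartiles are 3, 5, and 7, the whiskers end at 1 and 8, and 100 is reported as an outlier.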
Fig. 3. Associated genes across phenotypes.
Circular heatmap of genes significantly associated with at least one phenotype of interest. Genes are arranged radially, with one band per phenotype, and grouped by chromosome. Color indicates effect size and direction. Significant gene-phenotype pairs (p < 6.8 × 10⁻⁷) are marked with a small black box. Effect size color is shown only for nominally significant (p < 0.05) gene-phenotype associations. Binary trait effect sizes are scaled by 10% (range −2 to 2). Two-sided p-values were obtained from linear regression models of mask variant risk allele dosage on phenotypes.
Fig. 4. Validation of imputation-based signals using data from whole exome sequencing of all corresponding UK Biobank participants.
Scatter plots of −log₁₀-transformed p-values (a) and effect sizes (b) for single-variant (ExWAS) analyses, and of −log₁₀-transformed p-values (c) and effect sizes (d) for gene-based test (GBT) analyses, comparing association statistics from partially imputed (x-axis) and fully sequenced (y-axis) data. Single-variant results are color-coded by minor allele frequency (MAF); GBT results are color-coded by p-value (blue, below the significance threshold of 6.7 × 10⁻⁷; red, above the threshold). r denotes the Pearson correlation coefficient. For panel a, two-sided p-values were obtained from linear mixed-effects models (REGENIE) of effect allele dosage on phenotypes. For panel c, two-sided p-values were obtained from linear regression models of mask variant risk allele dosage on phenotypes.
Fig. 5. Expression of associated genes from gene-level analyses in kidney cell types.
Heatmaps showing the cell type-specific expression of genes associated with eGFRcrea, eGFRcys, urea, UACR, and urate. Expression z-scores are based on single-cell RNA-seq data from Stewart et al. Fifteen non-immune kidney cell types are grouped into nephron (9 cell types), endothelium (4), and stroma (2). Within each heatmap, genes are ordered by their maximum expression z-score across the 15 cell types.
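Ordering genes by their maximum z-score across cell types can be sketched as below. The caption does not state how the z-scores were derived; this sketch assumes each gene's expression is standardized across the cell types (population standard deviation) and that genes are sorted from highest to lowest maximum. Gene names and values are hypothetical, and only three cell types are used instead of 15.

```python
from statistics import mean, pstdev


def zscore_per_gene(expr: dict[str, list[float]]) -> dict[str, list[float]]:
    """Standardize each gene's expression across cell types (assumed scheme)."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with identical expression in every cell type gets z = 0 throughout
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def order_by_max_z(z: dict[str, list[float]]) -> list[str]:
    """Genes sorted by their maximum z-score, highest first."""
    return sorted(z, key=lambda gene: max(z[gene]), reverse=True)


if __name__ == "__main__":
    toy = {"GENE_B": [2, 3, 4], "GENE_A": [1, 1, 10], "GENE_C": [5, 5, 5]}
    print(order_by_max_z(zscore_per_gene(toy)))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A gene expressed strongly in a single cell type (GENE_A) rises to the top, which is the kind of cell type specificity the heatmaps are meant to highlight.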
Fig. 6. Phenome-wide association study for kidney genes.
Phenome-wide association study of genes identified in our single-variant ExWAS or gene-based analyses with other phenotypes in the UK Biobank. For each gene, the odds ratio or beta of the UK Biobank phenotype with the smallest gene-based association p-value is displayed. Genes with multiple significant phenotype associations are marked with an asterisk (*). Full results, including the numbers of biologically independent samples for all UK Biobank phenotypes shown, are available in Supplementary Data 5. Only UK Biobank associations with p < 5 × 10⁻⁸ (as reported by the respective studies) and more than 5 cases (binary traits) or more than 5 individuals (quantitative traits) were considered. a Binary traits: odds ratios are shown as center points, with error bars representing 95% confidence intervals. b Quantitative traits: betas are shown as center points, with error bars representing 95% confidence intervals, from a linear regression model adjusting for age, sex, and age × sex.
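The interval estimates and inclusion rule in this caption can be written out directly. The sketch below assumes Wald-type 95% intervals (estimate ± 1.96 standard errors, with odds ratios obtained by exponentiating a log-odds coefficient and its interval); the caption does not specify the interval method, and the function names and coefficients are hypothetical.

```python
from math import exp
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def beta_ci(beta: float, se: float) -> tuple[float, float]:
    """Wald 95% confidence interval for a regression coefficient."""
    return beta - Z_975 * se, beta + Z_975 * se


def odds_ratio_ci(log_or: float, se: float) -> tuple[float, float, float]:
    """Odds ratio with a 95% interval, exponentiated from the log-odds scale."""
    low, high = beta_ci(log_or, se)
    return exp(log_or), exp(low), exp(high)


def include_association(p: float, n_cases_or_individuals: int) -> bool:
    """Caption's filter: p < 5e-8 and more than 5 cases (binary traits)
    or more than 5 individuals (quantitative traits)."""
    return p < 5e-8 and n_cases_or_individuals > 5


if __name__ == "__main__":
    print(odds_ratio_ci(0.4, 0.1))      # invented log-odds coefficient and SE
    print(beta_ci(0.5, 0.1))            # invented beta and SE
    print(include_association(1e-9, 6)) # True
```

Exponentiating the interval endpoints, rather than adding ± 1.96 SE to the odds ratio itself, is why odds ratio error bars are asymmetric around the center point.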
Fig. 7. Claudin-10b wt and fs: subcellular distribution and FRET efficiency.
Upper panels: co-transfection of YFP (red)- and CFP (cyan)-tagged Cldn10b wt, fs, or fs(SA), respectively. Most wt–wt contacts were long and smooth, whereas most fs–fs contacts were short and interrupted (dashed or 'ragged'), likely reflecting claudin-10b within vesicles close to the plasma membrane rather than truly contact-enriched claudin-10b. Cldn10b wt co-localized with claudin-10b fs in cell–cell contacts (merge); however, these contacts resembled those observed in cells expressing only Cldn10b fs. As previously described (Alzahrani et al., 2021), Cldn10b fs(SA) did not insert into the plasma membrane but was retained in intracellular compartments (endoplasmic reticulum). Co-expressed Cldn10b wt was unaffected by the presence of Cldn10b fs(SA), and cell–cell contacts had a wt-like appearance. Bars: 5 µm. Lower panel: FRET efficiency, an indicator of cis-interaction, was highest when only Cldn10b wt was present (a; gray triangles; n = 46 from m = 5 independent transfections). When only Cldn10b fs was present (blue diamonds; n = 41, m = 5), or when Cldn10b fs was combined with Cldn10b wt (b; squares; YFP-wt–CFP-fs, red symbols, n = 34, m = 4; YFP-fs–CFP-wt, yellow symbols, n = 43, m = 4), FRET was highly significantly lower (a vs. b: p = 3.96 × 10⁻¹²). When Cldn10b fs(SA) was combined with Cldn10b wt (c; green circles; n = 38, m = 4), FRET efficiency was highly significantly lower than under all other conditions (a vs. c: p = 2.16 × 10⁻⁸; b vs. c: p < 10⁻¹⁶). Statistics: ANOVA with Tukey's post hoc test. Different transfections per condition are indicated by different shades of the symbols. Red lines indicate mean FRET efficiency ± SEM.
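The summary statistics named in the lower panel can be sketched with the Python standard library: mean ± SEM per condition, and the one-way ANOVA F statistic that precedes the Tukey post hoc comparisons. This is an illustrative sketch with invented FRET values and hypothetical function names; it does not compute p-values or the Tukey test itself (SciPy offers `scipy.stats.f_oneway` and, in recent versions, `scipy.stats.tukey_hsd` for those steps).

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups: dict[str, list[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    toy_fret = {  # invented efficiencies, not the study's measurements
        "wt": [0.30, 0.32, 0.28],
        "fs+wt": [0.15, 0.17, 0.16],
        "fs(SA)+wt": [0.05, 0.06, 0.04],
    }
    for condition, values in toy_fret.items():
        print(condition, mean_sem(values))
    print(one_way_anova_f(toy_fret))
```

A large F only indicates that at least one condition differs; the pairwise contrasts reported in the caption (a vs. b, a vs. c, b vs. c) come from the post hoc test.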

