Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Oct;56(10):2078-2092.
doi: 10.1038/s41588-024-01904-6. Epub 2024 Sep 10.

Variants in tubule epithelial regulatory elements mediate most heritable differences in human kidney function

Affiliations

Variants in tubule epithelial regulatory elements mediate most heritable differences in human kidney function

Gabriel B Loeb et al. Nat Genet. 2024 Oct.

Abstract

Kidney failure, the decrease of kidney function below a threshold necessary to support life, is a major cause of morbidity and mortality. We performed a genome-wide association study (GWAS) of 406,504 individuals in the UK Biobank, identifying 430 loci affecting kidney function in middle-aged adults. To investigate the cell types affected by these loci, we integrated the GWAS with human kidney candidate cis-regulatory elements (cCREs) identified using single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq). Overall, 56% of kidney function heritability localized to kidney tubule epithelial cCREs and an additional 7% to kidney podocyte cCREs. Thus, most heritable differences in adult kidney function are a result of altered gene expression in these two cell types. Using enhancer assays, allele-specific scATAC-seq and machine learning, we found that many kidney function variants alter tubule epithelial cCRE chromatin accessibility and function. Using CRISPRi, we determined which genes some of these cCREs regulate, implicating NDRG1, CCNB1 and STC1 in human kidney function.

PubMed Disclaimer

Conflict of interest statement

Competing Interests

V.S., A.Y.C., J.D., S.S., and R.R. are employees at GSK. J.F.R. has consulted for Maze Therapeutics, and cofounded stealth companies funded via BridgeBio and 459AM. All other authors declare no competing interests.

Figures

Extended Data Fig. 1:
Extended Data Fig. 1:. Kidney function loci affect chronic kidney disease risk.
Graph of variant effect sizes for kidney function index variants on eGFRcr-cys and chronic kidney disease (CKD) diagnoses from GWAS. The Pearson correlation coefficient (r) and P value for the two-sided Pearson correlation is indicated.
Extended Data Fig. 2:
Extended Data Fig. 2:. Binding motifs of kidney transcription factors are enriched in kidney cCREs.
The binding motifs of kidney transcription factors that are enriched in kidney cCREs. P values are calculated relative to GC-content matched genomic background using a one-sided binomial test.
Extended Data Fig. 3:
Extended Data Fig. 3:. Kidney function variants are within kidney cCREs.
a-c) The association between eGFRcr-cys and variants near TFEB, DDX1 and HOXD8 are indicated. The y-axis indicates the P value from the eGFRcr-cys GWAS computed using whole-genome linear regression. The variant with the strongest association at each locus is indicated by the purple diamond and the LD between the top variant and other variants at the locus are indicated by color. The chromatin accessibility from bulk kidney ATAC-seq and cCREs for each locus are indicated. The ATAC-seq y-axis is in reads per million.
Extended Data Fig. 4:
Extended Data Fig. 4:. Gene activity scores of kidney marker genes.
Gene activity scores—measures of integrated chromatin accessibility around genes that correlate with expression—for established marker genes of each cell type.
Extended Data Fig. 5:
Extended Data Fig. 5:. Kidney tubule epithelial cCREs account for most kidney function SNP heritability.
a-c) Fold enrichment of SNP heritability for eGFRcr, eGFRcys and serum urea levels within cell type-specific and ubiquitously accessible cCREs calculated using stratified LD score regression. Error bars denote jackknife standard errors computed with 200 equally-sized blocks of SNPs. Standard errors were used to calculate z-scores and one-sided Bonferroni-corrected P values (Bonferroni-corrected for the 11 cell types). * Padj<0.05. e) Fraction of eGFRcr-cys SNP heritability explained by kidney podocyte and tubule epithelial cell cCREs (blue), coding exons (beige) and the remainder of the human genome (gray) calculated using stratified LD score regression.
Extended Data Fig. 6:
Extended Data Fig. 6:. Kidney function variants are within tubule cCREs.
a-c) The associations between eGFRcr-cys and variants near TFEB, DDX1 and HOXD8 are indicated. In the top boxes, the y-axis indicates the P value from the eGFRcr-cys GWAS computed using whole-genome linear regression. The variant with the strongest association at each locus is indicated by the purple diamond and the LD between the top variant and other variants at the locus are indicated by color. Chromatin accessibility from scATAC-seq for the indicated kidney cell types are plotted below.
Extended Data Fig. 7:
Extended Data Fig. 7:. Functionally informed fine-mapping and target gene prioritization identifies causal kidney function variants and genes.
a) Top: For fine-mapping performed without functional annotations, the fold enrichment of fine-mapped variants from kidney function loci that cause nonsynonymous changes in a coding sequence, are evolutionarily conserved, or that lie within cCREs common to all tubule epithelial cells is shown. Variants are stratified by posterior inclusion probability. Bottom: Comparison of the number of fine-mapped variants (PIP>0.3) from kidney function loci identified with or without functional annotations. b) Variants from kidney function loci (2,999,203) are plotted by the posterior inclusion probability that was calculated when fine-mapping was done with or without functional annotations. Variants are colored by nonsynonymous, conserved, and tubule epithelial cell cCRE annotations. The x=y line is plotted. c) Top: Precision and recall of prioritized genes for kidney function fine-mapped variants in tubule epithelial cell cCREs (PIP>0.3) was evaluated based on the benchmark kidney function gene set (Supplementary Table 7). The precision and recall of genes prioritized by nearest gene, PoPS, or the intersection of PoPS and nearest gene is shown. Bottom: Comparison of precision and recall between genes prioritized in this study (using the intersection of PoPS and nearest gene) or recent studies of genetic determinants of kidney function, evaluated using the benchmark kidney function gene set. The number of benchmark genes which overlapped prioritized gene loci and were used for evaluation were 13 genes for this study, 19 genes for Liu et al., 4 genes for Stanzick et al. d) Venn diagram of genes prioritized in this study or prior studies of genetic determinants of kidney function,. e) Top: Functionally informed fine-mapping at the SHROOM3 locus. The functionally informed posterior inclusion probability is plotted for variants, which are colored according to the posterior inclusion probability estimated without functional annotations. Bottom: Association of variants with eGFRcr-cys is plotted on the left y-axis as the P value from the eGFRcr-cys GWAS computed using whole-genome linear regression. Color of variants indicates r2 to rs4859682, the variant indicated with the purple diamond.
Extended Data Fig. 8:
Extended Data Fig. 8:. Chromatin accessibility allelic imbalance and machine learning identify cell type-specific effects of variants on chromatin accessibility.
a) Schematic illustrating alleles that differ in their similarity to the HNF4A transcription factor binding motif. b) The fraction of cCREs at which the variant more closely matched to the binding motif of the indicated transcription factor is associated with increased chromatin accessibility. Motif identification is performed using FIMO. P values were calculated using a one-sided binomial test and corrected with the Benjamini-Hochberg method. * Padj < 0.05. c) The Pearson correlation between ChromKid-predicted and experimentally measured chromatin accessibility calculated on a held-out chromosome (chromosome 11) graphed relative to the number of cells within the training set for each cell type. Across cell types, correlation increased with greater cell number. d) ROC curves for the prediction of proximal tubule chromatin accessibility (CAAI), displaying the false positive rate (x-axis) versus true positive rate (y-axis) of cell-type specific predictions from ChromKid. The AUCs for ChromKid predictions for each cell type are shown in the legend. e,f) Cell type-specific variant effect prediction is not driven by the number of cells used for training. ROC curves for the prediction of proximal tubule chromatin accessibility imbalance direction (e) or chromatin accessibility allelic imbalance (f), displaying the false positive rate (x-axis) versus true positive rate (y-axis) of cell-type specific predictions from a version of ChromKid which was trained on the same number of cells per cell type (1272, the number of T cells present in the full dataset). Each curve shows prediction and recall for the ChromKid predictions for the indicated cell type at predicting proximal tubule CAAI. The AUCs for ChromKid predictions for each cell type are shown in the legend. f) Kidney function fine-mapped variants, which are fine-mapped without functional annotations exhibit larger predicted effects on accessibility than other variants within tubule epithelial cCREs. ChromKid-generated predictions of CAAI at kidney function fine-mapped variants within tubule cCREs stratified by PIP calculated without functional annotations (n=207919, 459, 45, and 43 for PIP 0–0.05, 0.05–0.3, 0.3–0.7, 0.7–1.0 respectively). Predicted CAAI is shown as the max0.5-refref+alt across tubule epithelial cell types. Box center line indicates the median, boundaries indicate the 1st and 3rd quartile and whiskers indicate the most extreme data point within 1.5 times the interquartile range. Statistical significance was calculated using a two-sided Mann-Whitney U test and P values were Bonferroni corrected for the three tests.
Extended Data Fig. 9:
Extended Data Fig. 9:. Silencing kidney function cCREs to identify kidney function genes.
a) The fraction of non-promoter tubule epithelial cCREs containing kidney function fine-mapped variants (PIP>0.3) which are accessible in cultured primary human tubule epithelial cells and indicated cell lines. b) Expression of marker genes in proximal tubule, distal tubule or parietal epithelial cells within cultured primary human tubule epithelial cells as measured by scRNA-seq. Marker genes of tubule cells, proximal tubule, distal tubule, and parietal epithelial cells are indicated on the y-axis. c) Histograms of the number of guide constructs expressed per cell (top) and number of cells expressing each guide construct (bottom). d) Effect of transcriptional start site-targeting guides on gene expression. Expression of indicated genes in cells that do not express (Control) or do express a guide (Guide) targeting the transcription start site. P values were calculated using a two-sided quasi-likelihood F test with a negative binomial generalized log-linear model and Bonferroni corrected for the total number of tested guide vector-gene pairs (1503). e) Effect of positive control enhancer guides on gene expression. Expression of indicated genes in cells that do not express (Ctrl) or do express a guide (Guide 1 or Guide 2) targeting a previously identified enhancer. P values were calculated using a two-sided quasi-likelihood F test with a negative binomial generalized log-linear model and Bonferroni corrected for the total number of tested guide vector-gene pairs (1503).
Fig. 1:
Fig. 1:. Kidney cCREs are enriched for kidney function heritability.
a) Manhattan plot for the estimated GFR calculated from serum creatinine and cystatin C levels (eGFRcr-cys) genome wide association study (GWAS) using whole-genome linear regression performed within the UK Biobank. The 577 significant loci are colored to indicate whether evidence from both creatinine and cystatin C GWAS support a role in kidney function (430/577 loci). Labels at the top indicate the closest gene to the index variant. The y-axis is a −log10(P) scale up to 30, beyond which it switches to a −log10[−log10(P)] scale. Dotted line indicates P=5×10−8. b) The 577 eGFRcr-cys index variants (P<5×10−8) plotted by the estimated size of their effect on eGFR estimated with either creatinine (eGFRcr) or cystatin C (eGFRcys). Variants colored orange affect both eGFRcr and eGFRcys (Padj<0.05, Bonferroni adjusted for the 577 variants) in the same direction. c) ATAC-seq of human kidney cortex identifies kidney noncoding cCREs. The contribution of kidney cCREs to human traits is evaluated by calculating the enrichment of trait heritability within kidney cCREs using stratified LD score regression. Kidney image from BioRender.com. d) Fold enrichment of the SNP heritability of 16 human traits within kidney cCREs calculated with stratified LD score regression across all autosomes. Error bars denote jackknife standard errors computed with 200 equally-sized blocks of SNPs. Standard errors were used to calculate zscores and one-sided Bonferroni-corrected P values (Bonferroni-corrected for the 16 traits). * Padj<0.05. e) Several kidney cCREs lie in introns of PDILT 5’ to UMOD, which encodes a Mendelian kidney disease gene. rs77924615 is associated with kidney function (eGFRcr-cys), is in a kidney cCRE, and has low LD with all other variants at the locus. Degree of LD to rs77924615 is indicated by variant color. The y-axis for the association plot indicates the P value from the eGFRcr-cys GWAS computed using whole-genome linear regression. The y-axis for ATAC-seq indicates reads per million.
Fig. 2:
Fig. 2:. Single-cell ATAC-seq of human kidneys identifies cell type-specific cCREs.
a) Schematic of the nephron, colored by major epithelial cell types. Nephron image adapted from BioRender.com. b) scATAC-seq UMAP of 34,240 kidney cells from 3 donors. c) Motifs of transcription factors expressed in the kidney are enriched in cCREs of specific kidney cell types. The enrichment of transcription factor motifs in cCREs with increased accessibility in a given cell type was evaluated by a one-sided hypergeometric test. P values were Bonferroni corrected for the 870 tested transcription factor motifs. Chromatin accessibility around the genes encoding bolded transcription factors is shown in d-f. d-e) Chromatin accessibility maps for two cell type-specific transcription factor genes: HNF4A, expressed in the proximal tubule and TFAP2B, expressed in the loop of Henle and distal tubule. The y-axis indicates reads per million.
Fig. 3:
Fig. 3:. Kidney tubule cCREs account for the majority of SNP heritability of kidney function.
a) Clustering of cCREs distinguishes cell type-specific and ubiquitously accessible cCREs. Rows represent cell types, each vertical line represents one cCRE, and each column represents a cCRE cluster with accessibility in the indicated cell type. The color scale indicates the log2(accessibility for each cell type from pseudobulk scATAC-seq in reads per million). A total of 526,273 cCREs are depicted. IC, intercalated cell. b-c) Fold enrichment of eGFRcr-cys and albuminuria SNP heritability within cell type-specific and ubiquitously accessible cCREs calculated with stratified LD score regression. Error bars denote jackknife standard errors computed with 200 equally-sized blocks of SNPs. Standard errors were used to calculate z-scores and one-sided Bonferroni corrected P values (Bonferroni corrected for the 11 cell types). * Padj<0.05. d) Fraction of eGFRcr-cys SNP heritability explained by kidney tubule epithelial cell cCREs (blue), coding exons (beige), and the remainder of the human genome (gray) calculated with stratified LD score regression. e) rs77924615, a variant associated with eGFRcr-cys, lies within a tubule epithelial-specific cCRE (gray box). The y-axis of the association plot indicates the P value from the eGFRcr-cys GWAS computed using whole-genome linear regression. The y-axes of the scATAC-seq plots indicate reads per million.
Fig. 4:
Fig. 4:. Identification of causal kidney function variants using functionally informed finemapping.
a) Workflow of functionally informed fine-mapping of variants affecting eGFRcr-cys. GWAS for eGFRcr-cys identified 577 loci. 430 of these loci had consistent effects on eGFRcr and eGFRcys. Fine-mapping of these loci with PolyFun incorporated annotations including kidney tubule and podocyte cCREs, evolutionary conservation and gene annotations. PIP, posterior inclusion probability. b) Comparison of the number of fine-mapped variants from kidney function loci identified with or without functional annotations. Variants are separated into those with moderate posterior inclusion probability (0.3–0.7) and high posterior inclusion probability (>0.7). c) Fold enrichment of variants with specific annotations among fine-mapped kidney function variants. Enrichment for variants that cause nonsynonymous changes in a coding sequence, are evolutionarily conserved, or that lie within cCREs that are common to all tubule epithelial cells is plotted. Variants are separated into those with high (>0.7, 300 variants), moderate (0.3–0.7, 487 variants) and low (0.05–0.3, 5,334 variants) posterior inclusion probabilities. d) Functionally informed fine-mapping at the SHROOM3 locus using PolyFun. The functionally informed posterior inclusion probability is plotted for variants that are colored according to the posterior inclusion probability estimated without functional annotations. Chromatin accessibility from scATAC-seq is plotted below. Gray boxes indicate tubule epithelial cCREs containing kidney function fine-mapped variants and the scATAC-seq y-axis indicates reads per million.
Fig. 5:
Fig. 5:. Chromatin accessibility allelic imbalance and machine learning identify cell typespecific effects of variants on chromatin accessibility.
a) Workflow to detect chromatin accessibility allelic imbalance (CAAI). scATAC-seq reads were compared at heterozygous alleles to measure the effect of variants on chromatin accessibility. Kidney image from BioRender.com. b) Enrichment of transcription factor-binding motifs at 2,305 proximal tubule CAAI variants relative to 16,135 non-CAAI variants with equivalent proximal tubule accessibility. Motif disruption was determined using FIMO, enrichment was calculated using a one-sided hypergeometric test, and P values were corrected with the Benjamini-Hochberg method. All motifs in the figure have a Padj<0.05. c) Schematic of ChromKid, a convolutional neural network trained to predict cell type-specific chromatin accessibility for 10 kidney cell types from DNA sequence. ChromKid uses a 1344bp DNA sequence input to generate a prediction of the quantitative chromatin accessibility at the center of the input region for each kidney cell type. d) Measured versus ChromKid-predicted proximal tubule chromatin accessibility on chromosome 1, which was withheld from training. Each point represents measured and predicted chromatin accessibility for a 192bp region. The Pearson correlation coefficient (r) is shown. e) Receiver operating characteristic (ROC) curves for the direction of proximal tubule chromatin accessibility imbalance, displaying the false positive rate (x-axis) versus true positive rate (y-axis) of cell-type specific predictions from ChromKid. The areas under the curves (AUCs) for ChromKid predictions for each cell type are shown in the legend. ChromKid predictions of variant effects in each cell type are compared to proximal tubule CAAI. f) Kidney function fine-mapped variant rs12881545 (PIP=0.34, indicated by a vertical line) maps to a tubule cCRE. CAAI analysis revealed increased chromatin accessibility at the reference allele relative to the alternate allele in the proximal tubule. ChromKid also predicted that the reference allele would show increased chromatin accessibility at this cCRE in the proximal tubule. Statistical significance was calculated with a two-sided binomial test.
Fig. 6:
Fig. 6:. Genetic variants affecting kidney function alter tubule epithelial cis-regulatory element function.
a) Kidney function fine-mapped variants exhibit larger predicted effects on accessibility than other variants within tubule epithelial cCREs. ChromKid-generated predictions of CAAI at kidney function fine-mapped variants within tubule cCREs stratified by PIP (n=206899, 1369, 132, and 67 for PIP 0–0.05, 0.05–0.3, 0.3–0.7, 0.7–1.0 respectively). Predicted CAAI is shown as the max0.5-refref+alt across tubule epithelial cell types. Box center indicates the median, boundaries indicate 1st and 3rd quartile and whiskers indicate the most extreme data point within 1.5 times the interquartile range. Statistical significance was calculated using the two-sided MannWhitney U test and Bonferroni corrected for the three tests. b-d) Left: Plots of scATAC-seq data near kidney function fine-mapped variants, which are depicted by vertical lines. cCREs (gray boxes) containing a kidney function fine-mapped variant were selected for functional testing. The y-axis indicates reads per million. Top right: ChromKid predicted proximal tubule chromatin accessibility for both alleles at these cCREs. Predicted chromatin accessibility for the reference allele is depicted in dark green, and for the alternate allele is depicted in light green. Bottom right: Activity of enhancers in human primary tubule epithelial cells. Enhancer activity of the reference allele is depicted in dark green, and of the alternate allele is depicted in light green. Data are plotted as the mean values +/− SD (n=3 biological replicates). Presented results are representative of three independent experiments. Statistical significance was based on a two-sided t test.
Fig. 7:
Fig. 7:. Silencing kidney function cCREs to identify kidney function genes.
a) Schematic of the approach to map genes regulated by kidney function regulatory elements using CRISPRi-mediated silencing. cCREs were targeted in human primary tubule epithelial cells with CRISPRi. Gene expression and guide expression were measured by scRNA-seq. Created with BioRender.com. b) UMAP of scRNA-seq for 24,563 primary tubular epithelial cells from 4 donors used for CRISPRi-mediated silencing of kidney function cCREs. c) Volcano plot of differentially expressed genes in cells with expression of guides targeting kidney cCREs. P values were calculated using a two-sided quasi-likelihood F test with a negative binomial generalized log-linear model and Bonferroni corrected for the total number of tested cCRE-gene pairs (725). d) Top: Plots of proximal tubule chromatin accessibility from scATAC-seq data with the CRISPRitargeted cCRE depicted with a black square. Bottom: Pseudobulk gene expression of the indicated gene in cells expressing guide construct 1, guide construct 2, or not expressing guides (control) targeting the indicated regulatory element. P values were calculated using a two-sided quasi-likelihood F test with a negative binomial generalized log-linear model and Bonferroni corrected for the total number of tested guide vector-gene pairs (1503).

Update of

References

    1. Dantzler WH Comparative Physiology of the Vertebrate Kidney. (Springer Science & Business Media, 2012).
    1. Cockwell P & Fisher L-A The global burden of chronic kidney disease. The Lancet 395, 662–664 (2020). - PubMed
    1. Akrawi DS et al. Heritability of End-Stage Renal Disease: A Swedish Adoption Study. Nephron 138, 157–165 (2018). - PubMed
    1. Arpegård J et al. Comparison of heritability of Cystatin C- and creatinine-based estimates of kidney function and their relation to heritability of cardiovascular disease. J. Am. Heart Assoc. 4, e001467 (2015). - PMC - PubMed
    1. Raggi P et al. Heritability of renal function and inflammatory markers in adult male twins. Am. J. Nephrol. 32, 317–323 (2010). - PMC - PubMed

Methods-only references

    1. Corces MR et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017). - PMC - PubMed
    1. Smith JP et al. PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments. NAR Genomics Bioinforma. 3, lqab101 (2021). - PMC - PubMed
    1. Heinz S et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010). - PMC - PubMed
    1. Granja JM et al. ArchR is a scalable so[ware package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021). - PMC - PubMed
    1. Wilson PC et al. The single-cell transcriptomic landscape of early human diabetic nephropathy. Proc. Natl. Acad. Sci. 116, 19619–19625 (2019). - PMC - PubMed