Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jan;57(1):193-205.
doi: 10.1038/s41588-024-01965-7. Epub 2025 Jan 2.

Coupling metabolomics and exome sequencing reveals graded effects of rare damaging heterozygous variants on gene function and human traits

Affiliations

Coupling metabolomics and exome sequencing reveals graded effects of rare damaging heterozygous variants on gene function and human traits

Nora Scherer et al. Nat Genet. 2025 Jan.

Abstract

Genetic studies of the metabolome can uncover enzymatic and transport processes shaping human metabolism. Using rare variant aggregation testing based on whole-exome sequencing data to detect genes associated with levels of 1,294 plasma and 1,396 urine metabolites, we discovered 235 gene-metabolite associations, many previously unreported. Complementary approaches (genetic, computational (in silico gene knockouts in whole-body models of human metabolism) and one experimental proof of principle) provided orthogonal evidence that studies of rare, damaging variants in the heterozygous state permit inferences concordant with those from inborn errors of metabolism. Allelic series of functional variants in transporters responsible for transcellular sulfate reabsorption (SLC13A1, SLC26A1) exhibited graded effects on plasma sulfate and human height and pinpointed alleles associated with increased odds of diverse musculoskeletal traits and diseases in the population. This integrative approach can identify new players in incompletely characterized human metabolic reactions and reveal metabolic readouts informative of human traits and diseases.

PubMed Disclaimer

Conflict of interest statement

Competing interests: C.W., S.M., Y.X., R.G. and K.E. are employees of and own shares in Maze Therapeutics. A.K. reports a sponsored research collaboration agreement with Maze Therapeutics. The other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Overview of the study design.
Schematic representation of the gene-based rare variant aggregation study with plasma and urine metabolite levels using WES data of 4,737 participants from the GCKD study and their follow-up analyses. MAF, minor allele frequency; PC, principal component; PheWAS, phenome-wide association study.
Fig. 2
Fig. 2. Overview of the 192 identified gene–metabolite associations across plasma and urine and their corresponding pathways.
a, Significant associations with plasma metabolites are shown on the outermost band (red; shading reflects effect direction), with genes ordered by chromosomal location across the genome. Associations with urine metabolites are shown on the middle band (blue; shading reflects effect direction). Gene–metabolite associations are based on rare variant aggregation testing from both masks. The ones labeled in gray were already reported in previous rare variant studies–,, whereas the ones labeled in black are considered new. White spaces indicate that no significant association was detected in a given matrix. For all associations detected in both matrices, effect directions are consistent. The inner band represents the superpathway of the associated metabolite. GPE, glycero-3-phosphoethanolamine; GPI, glycosylphosphatidylinositol; DC, dicarboxylic acid. b, The UpSet plot shows the number of identified gene–metabolite associations by mask and matrix, color coded by the respective metabolite superpathway. Right, horizontal bar plot represents the total number of associations identified by mask and matrix. The proportion of lipids is markedly higher among associations detected with plasma metabolites as compared with urine. Top left, vertical bar plot shows the number of shared associations by mask and matrix, while the sets among which the associations are shared are indicated below each column. While the majority of associations are detected by both masks, especially the less-stringent HI_mis mask provides many mask-specific findings in both plasma and urine. The group of metabolites detected in both plasma and urine is dominated by amino acids. Part. charact., partially characterized.
Fig. 3
Fig. 3. Independent validation of findings using orthogonal approaches.
a, Gene-based testing of significantly associated, available plasma metabolites among ≥261,661 UKB participants (y axis) using the same mask and only including QVs available in both the GCKD study and the UKB provided very similar effect sizes in the two studies (x axis). Bars represent corresponding standard errors (SE), symbol color reflects the −log10 (P value), and the size depicts the cumulative minor allele frequency of all QVs within a gene. In the UKB, gene-based burden tests were performed as implemented in REGENIE (Methods). cMAF, cumulative MAF. b, Plasma levels of the proteins encoded by 17 of the 73 significant, metabolite-associated genes were measured in the UKB (N ≥ 44,108). Among the available gene-level summary statistics, 15 genes showed cis associations with plasma protein levels with an association P value < 1 × 10−5 based on a ‘ptvraredmg’ mask, which is similar to the masks used in the GCKD study. Genes are shown on the y axis. Effect sizes and the corresponding standard errors of the cis associations with plasma protein levels based on these summary statistics are displayed on the x axis. Symbol color reflects the −log10 (P value). For all cis associations, the direction of effect sizes was negative, consistent with LoF variants resulting in lower plasma protein levels. c, Single-variant effect sizes on levels of a given plasma metabolite in the GCKD study (x axis) were very similar to those in the INTERVAL study (Bomba et al.) (y axis) for all QVs involved in significant gene–metabolite associations in the GCKD study that were also available in the INTERVAL study and showed an association P value < 0.1 in both studies. Horizontal bars indicate standard errors of effect sizes in the GCKD study (not available for the INTERVAL summary statistics). The depicted 200 associations involved 35 unique genes and 75 unique metabolites. Symbol color indicates the INTERVAL association P value. Gray lines represent identity (dotted) and the linear regression line (dashed). The strong correlation of effect sizes supports reproducibility. d, Summary statistics for 89 of 128 plasma gene–metabolite associations were available in the INTERVAL study (Bomba et al.). Effect sizes of gene–metabolite associations on the aggregated level in the GCKD study (x axis) were very similar to those in the INTERVAL study (y axis; Methods and Supplementary Table 8), despite differences in masks and aggregation unit. Horizontal error bars indicate the standard errors of effect sizes in the GCKD study. Standard errors were not available in the summary statistics from Bomba et al. Symbol shape indicates the corresponding mask used by Bomba et al. (LOF, high-confidence LoF variants; MLOF, LoF and missense variants combined; CODING, all rare exonic variants, splice sites and variants residing in untranslated regions). Color reflects the association P value in the INTERVAL study. Gray lines represent the identity (dotted) and the linear regression line (dashed).
Fig. 4
Fig. 4. Methionine sulfone is a direct SLC6A19 substrate in vitro.
a, Transport of l-isoleucine and methionine sulfone in CHO cells overexpressing SLC6A19 (SLC6A19 cells) and its chaperone CLTRN versus mock cells. Substrate transport activity was measured by fluorescence increase with a membrane potential dye. The x axis represents increasing concentrations of substrate, and the y axis represents fold over no-substrate-driven signal. Data were generated from four biological replicates and are represented as mean ± s.d. EC50, half-maximum effective concentration. b, Effect of cinromide on transport of l-isoleucine and methionine sulfone in cells overexpressing SLC6A19 and CLTRN. l-isoleucine and methionine sulfone transport was abrogated by cinromide, a specific inhibitor of SLC6A19. The x axis represents increasing concentrations of substrate, and the y axis represents activity of maximal substrate-driven signal. Data were generated in the same experiment described in a.
Fig. 5
Fig. 5. Differences in urine metabolite levels between male and female carriers of QVs in X chromosomal TMLHE reflect a dose–response effect.
Top, plots represent covariate-adjusted urine levels of N6,N6,N6-trimethyllysine after inverse normal transformation (INT; y axis) among male (left) and female (right) noncarriers and carriers of QVs in TMLHE based on the HI_mis mask (x axis). Symbol color and shape indicate a variant’s driver status and consequence, respectively. The boxes range from the 25th percentile to the 75th percentile of metabolite levels, the median is indicated by a line, and whiskers end at the last observed value within 1.5 × (interquartile range) of the box. Among men hemizygous for a QV in TMLHE, the levels of the substrate N6,N6,N6-trimethyllysine were markedly higher than in heterozygous women, reflecting more severe impairment of the encoded enzyme’s function in hemizygous men. P values correspond to the sex-specific burden tests based on driver variants, with P values based on all QVs in parentheses (Supplementary Table 10 and the Supplementary Methods). Metabolites’ formulas were taken from https://commons.wikimedia.org/. Bottom, plots represent urine levels of covariate-adjusted hydroxy-N6,N6,N6-trimethyllysine after inverse normal transformation (y axis) across male (left) and female (right) noncarriers and carriers of QVs in TMLHE based on the HI_mis mask (x axis). Because hydroxy-N6,N6,N6-trimethyllysine is the product of trimethyllysine dioxygenase, the enzyme encoded by TMLHE, LoF QVs lead to decreased metabolite levels more strongly among men than among women. Effect sizes in men and women were significantly different (P value = 3 × 10−4; Supplementary Methods). A schematic depiction of the well-studied reaction catalyzed by trimethyllysine dioxygenase and of its substrate N6,N6,N6-trimethyllysine and product hydroxy-N6,N6,N6-trimethyllysine is included. Multi-hetero, multi-heterozygous.
Fig. 6
Fig. 6. Altered metabolite levels are a readout of impaired KYNU and PAH function: converging evidence from three approaches.
a,b, Three panels are shown for 8-methoxykynurenate levels associated with KYNU (a) and for phenylalanine levels associated with PAH (b) that visualize evidence from three complementary approaches. Left, covariate-adjusted inverse normal-transformed levels of the metabolite (y axis) among noncarriers (N = 4,589 and 4,562) and carriers (N = 25 and 151) of QVs in in the respective gene (x axis). Units correspond to standard deviations. The boxes range from the 25th percentile to the 75th percentile of metabolite levels, the median is indicated by a line, and whiskers end at the last observed value within 1.5 × (interquartile range) of the box. Middle, distribution of the ln-transformed secretion flux of the metabolite in mmol per day into urine (y axis) from minimum-norm quadratic programming (QP) simulations based on 569 and 567 microbiome-personalized WBMs without and with simulated knockout of KYNU and PAH, respectively (x axis). a, Right, multiple-reaction monitoring (m/z 220.0 → 174.1) chromatograms of the diluted urine of a child with a homozygous loss of KYNU function (patient), the heterozygous mother and the healthy father (maternal uniparental isodisomy). The signal at 12.5 min representing 8-methoxykynurenate is strongly enhanced in the patient sample. Chromatograms are normalized to urine creatinine concentrations; y axes are normalized to the intensity of the signal in the patient’s chromatograms. b, Right, UV–visible chromatograms (570 nm) of amino acids (post-column derivatization with ninhydrin) in serum samples of a child with a homozygous loss of PAH function (c.1199+1G>C), the compound heterozygous father who additionally carries a mild mutation (c.1180G>C) and the heterozygous mother. The signal at 85.5 min represents phenylalanine (Phe). The signal at 106 min is the internal standard (IS). The reference range for phenylalanine concentrations in children is 38–137 µmol l−1 and in adults is 26–91 µmol l−1.
Fig. 7
Fig. 7. Impact of functional QVs in SLC13A1 and SLC26A1 on height, musculoskeletal traits and fractures supports the role of plasma sulfate as an intermediate readout.
a, Schematic representation of the sulfate reabsorption mechanism involving NaS1 encoded by SLC13A1 at the apical membrane and SAT1 encoded by SLC26A1 at the basolateral membrane of epithelial cells. Figure created with https://www.biorender.com. b, Scatterplot shows the relation between the effect sizes of six QVs on plasma sulfate levels in the GCKD study (x axis) and on standing height in the UKB (N ≥ 466,907) (y axis). Effect sizes correspond to single-variant association tests under additive modeling with inverse normal-transformed traits. Symbol color and shape indicate the gene (shades of red, SLC13A1; shades of blue, SLC26A1) and consequence of the QV. Symbol size represents the association P value with respect to height. The black line is the linear regression line through the data points. Variant effect sizes for plasma sulfate levels are clearly correlated with the ones for standing height (Pearson correlation r = 0.70, allelic series). c, The volcano plot shows odds ratios (x axis) and −log10 (P values) (y axis) for associations of the six QVs with musculoskeletal diseases and fractures in the UKB (N ≥ 468,279), based on a Firth regression (Methods and Supplementary Table 17). Only clinical traits for which at least two carriers were identified among both individuals with and without disease are included in the plot. Symbol color indicates the QV and whether the corresponding P value was nominally significant (P value < 0.05). Symbol size corresponds to the number of QV carriers with disease. While both increased and decreased odds of disease were observed when associations were not significant, increased odds for musculoskeletal diseases and fractures dominated for significant associations.
Fig. 8
Fig. 8. Impact of different genotypes encoding the NaS1 p.Arg272Cys substitution on height and musculoskeletal traits and fractures.
a, The box plot shows differences in age- and sex-specific z scores for standing height (y axis; Methods) across UKB participants heterozygous and homozygous for the p.Arg272Cys-encoding allele (x axis). The boxes range from the 25th percentile to the 75th percentile of z scores, the median is indicated by a line, and whiskers end at the last observed value within 1.5 × (interquartile range) of the box. A dose–response effect is observable between heterozygous individuals (N = 289, median = −0.13, het + hom ref) and individuals carrying NaS1 p.Arg272Cys as well as SAT1 p.Leu348Pro (N = 4, median = −1.13, het + het) and one person homozygous for p.Arg272Cys (z score = −3.51, hom alt + hom ref). b, Association between the NaS1 p.Arg272Cys substitution with musculoskeletal diseases and fractures from the UKB (N ≥ 468,279), for which at least two carriers were identified among both individuals with and without disease (y axis). Numbers in parentheses indicate the number of p.Arg272Cys carriers with a respective disease. Odds ratios and their corresponding 95% confidence intervals (CI; x axis) are based on Firth regression (Methods). The symbol color reflects the −log10 (P value). Only associations with P value < 0.05 are shown.
Extended Data Fig. 1
Extended Data Fig. 1. Comparison of gene-metabolite associations based on the main analysis masks LoF_mis/HI_mis with a LoF only mask in the GCKD study.
(a) Effect size of gene-metabolite associations using burden tests based on the LoF_mis/HI_mis masks (y-axis) vs. those based on the LoF mask (x-axis) (Supplementary Table 4). Error bars represent the standard errors of the effect sizes. Symbol shape indicates the corresponding mask. Color reflects the comparison P-value between effect sizes based on the test statistic Z (see Supplementary Methods), indicating that the effect sizes between the LoF_mis/HI_mis and the LoF mask differ significantly (P-value < 0.05/178 adjusted for the number of associations available based on the LoF mask) just for one association of DPYD with uracil. Gray symbols reflect the 57 gene-metabolite pairs for which no gene-based test could be performed based on the LoF mask. Gray lines represent the identity (dotted) and the linear regression line (dashed). (b) -log10(P-value) of gene-metabolite associations using burden tests based on the LoF_mis/HI_mis masks (y-axis) vs. those based on the LoF mask (x-axis). Symbol shape indicates the corresponding mask. Color reflects the availability of gene-metabolite pairs based on the LoF mask. The gray dotted line represents the identity line.
Extended Data Fig. 2
Extended Data Fig. 2. Comparison of P-values of significant gene-metabolite associations between burden and SKAT tests in the GCKD study.
The -log10(P-value) of gene-metabolite associations based on the burden test (x-axis) vs. those based on the SKAT test (y-axis) (Supplementary Table 4). Symbol shape indicates the corresponding mask. Color reflects the effect size provided by the burden test. Gray lines represent the identity (dotted) and the linear regression line (dashed).
Extended Data Fig. 3
Extended Data Fig. 3. Comparison of effect sizes across strata of eGFR and sex in the GCKD study.
(a) Effect size of gene-metabolite associations among individuals with eGFR >45 mL/min/1.73 m2 (N = 2,528, x-axis) vs. those with eGFR <=45 mL/min/1.73 m2 (N = 2,185, y-axis) (Supplementary Table 5). Error bars represent the standard errors of the effect sizes. Symbol shape indicates the corresponding matrix. Color reflects the comparison P-value between both strata, indicating that effect sizes between individuals with high and low eGFR did not significantly differ for any gene-metabolite association, defined as P-value < 0.05/128 for plasma and P-value < 0.05/107 for urine. Gray lines represent the identity (dotted) and the linear regression line (dashed). (b) The effect size of gene-metabolite associations among men (N = 2,837, x-axis) and women (N = 1,876, y-axis). Error bars represent the standard errors of the effect sizes. Symbol shape indicates the corresponding matrix. Color reflects the comparison P-value between both strata. Gene-metabolite associations with effect sizes that significantly differ between men and women (P-value < 0.05/128 for plasma or <0.05/107 for urine) are labeled and are exclusively observed for the X-chromosomal TMLHE gene, where hemizygous men show more extreme metabolite levels than heterozygous women. Gray lines represent the identity (dotted) and the linear regression line (dashed).
Extended Data Fig. 4
Extended Data Fig. 4. Driver variants show a more severe impact on metabolite levels compared with non-drivers in terms of consequence and effect size.
(a) The bar plot represents the absolute frequency (y-axis) of each of the QVs’ consequences with their proportions noted next to them, separately for driver and non-driver variants (x-axis). Whereas driver variants contain more splicing, stop-gain and frameshift variants, the proportion of missense variants is higher among non-driver variants (Fisher’s exact test: P-value = 1.3e-6). (b) The swarm plot shows differences in absolute effect sizes for QVs (y-axis) across the five different consequence classes (x-axis). The color reflects the variant status (driver versus non-driver variant). Horizontal lines represent the median of the absolute effect sizes separately for driver and non-driver variants. The median effect of driver variants on metabolite levels increases when ordering the consequence classes with respect to severity (missense, start/stop-lost, frameshift, stop-gain, splicing). (c) The swarm plot shows differences in absolute effect sizes for QVs (y-axis) across groups of variants by minor allele count (MAC) bins (x-axis). Color reflects variant status (driver versus non-driver). Horizontal lines represent the median of the absolute effect sizes separately for driver and non-driver variants. The median effect of driver variants on metabolite levels increases with decreasing MAC bin, supporting that ultra-rare variants tend to have larger effects than those observed more frequently. In case one gene was significantly associated with levels of more than one metabolite, only the QVs from the strongest gene-metabolite associations are included (for only one matrix and only one mask) in each panel, to prevent counting variants multiple times, resulting in 1,885 QVs that were included in each panel.
Extended Data Fig. 5
Extended Data Fig. 5. Comparison and integration of rare and common variant association signals with metabolite levels within the same locus.
(a) The scatter plot shows the absolute effect size (y-axis) of association signals with metabolite levels based on aggregating rare variants within a gene using burden tests and based on the common variant with the lowest significant individual association P-value within the same locus (±500 kb around the gene) under additive modeling (Supplementary Methods), across different cumulative minor allele frequencies (cMAF, for aggregated rare variants) and minor allele frequencies (MAF, for common variants) (x-axis). Colors indicate whether the corresponding association signal is based on shared rare or common variants or whether it is unique to the rare variant screen. The shape represents the matrix of the corresponding metabolite. The absolute effect size tends to increase with decreasing MAF/cMAF. (b) Effect size of 157 gene-metabolite associations with conditioning on associated common variants within the gene region (x-axis) vs. without conditioning on common variants (y-axis) (Supplementary Table 9). Error bars represent the standard errors of the effect sizes. Symbol shape indicates the corresponding matrix. Color reflects the comparison P-value between both analyses, indicating that effect sizes between (un)conditional analyses did not significantly differ for any gene-metabolite association, defined as P-value < 0.05/157. Gray lines represent the identity (dotted) and the linear regression line (dashed).
Extended Data Fig. 6
Extended Data Fig. 6. Metabolite levels by QV carrier status for significantly associated genes with more than one homozygous carrier.
Urine metabolite levels after inverse normal transformation and covariate-adjustment are shown on the y-axis, among non-carriers and heterozygous and homozygous carriers of QVs in the HI_mis mask on the x-axis. Symbol color and shape indicate a variant’s carrier status and consequence, respectively. Carriers of multiple heterozygous QVs are denoted by an asterisk. Boxes range from the 25th to the 75th percentile of metabolite levels, the median is indicated by a line, and whiskers end at the last observed value within 1.5*(interquartile range) away from the box. The median levels of ribonate (N = 4,618) (a), glycocholate (N = 3,753) (b), and (N(1) + N(8))−acetylspermidine (N = 4,619) (c) are all more extreme for the homozygous than the heterozygous carriers, reflecting a dose-response effect.
Extended Data Fig. 7
Extended Data Fig. 7. Illustration of microbiome-personalized whole-body models of in silico knockout modeling.
(a) Generation of whole-body models (WBMs) of human metabolism: Generic reconstructions of human metabolism of Recon3D are used and pruned by organ-specific data and anatomical information, to form differentiated, organ-specific male and female models of human metabolism. The models are derived from Thiele et al. 2020, PMID: 32463598. (b) Identify gene-reaction relations: For each Gene G, it is essential to identify every reaction carried out by the corresponding enzyme/transporter across all organs. The corresponding reactions RG={rG1,,rGn} are used for gene-knockout modeling in the linear programming (LP) and quadratic programming (QP) knockout methodologies. (c) Generation of microbiome-personalized whole-body models: Utilizing abundance-data of metagenomic samples, personalized gut microbiome community models are generated via combining the individually present genome-scale reconstructions of microbes in a common lumen compartment. The lumen compartment is then integrated with the whole-body model, thereby creating an individualized host-microbiome model. (d) LP gene-knockout modeling: The system of mass balance equations at steady state, as defined by the stoichiometric matrix S and constraints for each flux, forms a convex space of possible solutions for a vector v of fluxes of a WBM. The sum of fluxes of reactions in RG (expressed as SG=vG1++vGn) is maximized, and this maximal value is subsequently imposed as a constraint in the wild-type model. Setting all fluxes corresponding to the reactions in RG to zero forms the knockout model. The maximum flux of a biomarker vGM is computed in both models and compared with each other. (e) QP gene-knockout modeling: Each microbiome-personalized whole-body model spans up its individual wild-type solution space under steady state and constraint settings. The solution of the quadratic objective of the flux vector under steady state and constraints for each reaction of the WBM forms the wild-type solution for Harvey/Harvetta, denoted as vWT=(vWT,1,,vWT,n). Setting each flux of reaction in RG to zero and repeating the procedure yields the QP solution after knockout, vKO. The unique QP solutions across all samples allows for screening for differences within each reaction of the WBM to find possible biomarkers, and for the computation of effect sizes for each reaction. The visualization of generic reconstructions of Recon3D of Extended Data Fig. 7 was generated using a screenshot of ReconMap3 (https://www.vmh.life/#reconmap) and further edited with Inkscape (https://inkscape.org/). The 3D visualizations in Extended Data Fig. 7 and Extended Data Fig. 7 were produced using Python (https://www.python.org/), while the 2D images with the coordinate systems were created using LaTeX (https://www.latex-project.org/).The remaining subfigures, along with the final composition of the complete figure, were created using BioRender.com, integrating all previously mentioned elements. Figure created with BioRender.com.
Extended Data Fig. 8
Extended Data Fig. 8. Elevated urine levels of 3-hydroxykynurenine and xanthurenate are a readout of impaired KYNU function: converging evidence from three approaches.
Three panels are shown for 3-hydroxykynurenine (a) and xanthurenate (b) each: the left panel represents inverse-normal transformed, covariate-adjusted urine levels of the respective metabolite (y-axis) among non-carriers and carriers of QVs in KYNU (x-axis). Units correspond to standard deviations. The boxes range from the 25th to the 75th percentile of metabolite levels, the median is indicated by a line, and whiskers end at the last observed value within 1.5*(interquartile range) away from the box. The middle panel represents the distribution of the ln-transformed urinary secretion flux of the respective metabolite in mmol/day into urine (y-axis) from min-norm simulations based on solutions of 569 microbiome-personalized whole-body models without and with simulated knockout of KYNU (x-axis). The right panel shows multiple reaction monitoring (MRM, m/z 225.0 → 162.1, 206.0 → 160.1) chromatograms of the diluted urines of a child with a homozygous, autosomal-recessively inherited loss of KYNU function (patient), the mother and the father. The signals at 3.9 min (3-hydroxykynurenine) and 9.5 min (xanthurenate) are strongly enhanced in the patient sample. Chromatograms are normalized to urine creatinine concentrations; y-axes are normalized to the intensity of the signals in the patient’s chromatograms. All three independent approaches arrive at the conclusion that elevated levels of 3-hydroxykynurenine and xanthurenate in urine are a readout of impaired KYNU function.
Extended Data Fig. 9
Extended Data Fig. 9. Contribution of individual QVs in SLC26A1 to their gene-based association signal with plasma sulfate levels.
The symbols visualize the -log10(P-value) (y-axis) with regard to plasma sulfate levels for the successive aggregation of the most influential QVs in SLC26A1 with respect to the forward selection procedure (Bomba et al, PMID: 35568032, Methods) based on burden tests for the mask LoF_mis. The number of QVs aggregated for burden testing is shown on the x-axis. Symbol shape indicates the variant’s consequence. The symbol color and size reflect the effect size and the P-value of the variant based on its single-variant association test. The gray dashed lines represent the significance threshold (-log10(0.05)), the total -log10(P-value) of the aggregate variant test including all QVs in SLC26A1 for the mask LoF_mis, and the -log10(lowest P-value) that can be reached by aggregating only the driver variants from the forward selection procedure. Summary statistics shown on the right refer to the burden tests aggregating all QVs and only driver variants. For the latter, a clear association of SLC26A1 with plasma sulfate levels is observed.
Extended Data Fig. 10
Extended Data Fig. 10. Impact of different genotypes encoding NaS1 p.Arg12* and SAT1 p.Leu348Pro on height and musculoskeletal traits and fractures.
The boxplots on the left show differences in age- and sex-specific z-scores for standing height (y-axis, Methods) across individuals in the UKB heterozygous and homozygous for the NaS1 p.Arg12*-encoding allele (a) and for the SAT1 p.Leu348Pro-encoding allele (b) (x-axis). The boxes range from the 25th to the 75th percentile of z-scores, the median is indicated by a line, and whiskers end at the last observed value within 1.5*(interquartile range) away from the box. Heterozygous individuals carrying only NaS1 p.Arg12* (N = 2,460) and SAT1 p.Leu348Pro (N = 3,096), respectively, are depicted in the ‘het’ category. Individuals carrying a variant at two different DNA positions are shown in the category ‘het + X’ (N = 15 for NaS1 p.Arg12* and N = 20 for SAT1 p.Leu348Pro). For the NaS1 p.Arg12* stop-gain variant, multi-heterozygous individuals, who additionally carry the NaS1 p.Trp48* stop-gain variant, are indicated with differently shaped symbols, emphasizing that carrying two stop-gain variants in NaS1 seems to lead to a more severe phenotype. The forest plots on the right show associations between the NaS1 p.Arg12* (c) and SAT1 p.Leu348Pro (d) carrier status with musculoskeletal diseases and fractures from the UKB (N ≥ 468,279), for which at least 2 carriers were identified among individuals with and without disease (y-axis). Number in parentheses indicate the number of carriers with a given disease. Odds ratios and their corresponding 95% confidence interval (x-axis) are based on Firth regression (Methods). The symbol color reflects the -log10(P-value). Only associations with P-value < 0.05 are shown.

References

    1. Schlosser, P. et al. Genetic studies of urinary metabolites illuminate mechanisms of detoxification and excretion in humans. Nat. Genet.52, 167–176 (2020). - PMC - PubMed
    1. Schlosser, P. et al. Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine. Nat. Genet.55, 995–1008 (2023). - PMC - PubMed
    1. Suhre, K. et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature477, 54–60 (2011). - PMC - PubMed
    1. Kettunen, J. et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat. Genet.44, 269–276 (2012). - PMC - PubMed
    1. Shin, S.-Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet.46, 543–550 (2014). - PMC - PubMed

LinkOut - more resources