Nat Genet. 2021 Feb;53(2):185-194.
doi: 10.1038/s41588-020-00757-z. Epub 2021 Jan 18.

Genetics of 35 blood and urine biomarkers in the UK Biobank

Nasa Sinnott-Armstrong et al. Nat Genet. 2021 Feb.

Erratum in

  • Author Correction: Genetics of 35 blood and urine biomarkers in the UK Biobank.
    Sinnott-Armstrong N, Tanigawa Y, Amar D, Mars N, Benner C, Aguirre M, Venkataraman GR, Wainberg M, Ollila HM, Kiiskinen T, Havulinna AS, Pirruccello JP, Qian J, Shcherbina A; FinnGen; Rodriguez F, Assimes TL, Agarwala V, Tibshirani R, Hastie T, Ripatti S, Pritchard JK, Daly MJ, Rivas MA. Nat Genet. 2021 Nov;53(11):1622. doi: 10.1038/s41588-021-00956-2. PMID: 34608296. Free PMC article. No abstract available.

Abstract

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

Conflict of interest statement

Competing financial interests

The Board of Trustees of the Leland Stanford Junior University filed U.S. Provisional Application “Methods for diagnosis of polygenic diseases and phenotypes from genetic variation” (Serial No. 62/852,738) describing this work. J.K.P., M.A.R., N.S.-A., and Y.T. are designated as inventors of the patent. M.A.R. is on the SAB of 54Gene and the Computational Advisory Board for Goldfinch Bio and has advised BioMarin, Third Rock Ventures, MazeTx and Related Sciences. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Figures

Figure 1. Schematic overview of the study.
We prepared a dataset of 35 serum and urine biomarkers from 363,228 individuals in UK Biobank. We analyzed the genetic basis of these biomarkers, assessed their relationship to medically relevant phenotypes, and generated predictive models of disease outcomes from genome-wide data.
Figure 2. Genetics of 35 biomarkers.
(top left inset) Correlation plot of phenotypic (lower triangular matrix) and genetic (upper triangular matrix) effects between the 35 lab phenotypes, estimated using LD Score regression. The absolute heritability estimates with standard errors are in Supplementary Table 11a. (main panel) Fuji plot of lab phenotypes across the six categories provided by UK Biobank, with genetic variant associations shown for LD-independent variants with meta-analysis p < 5 × 10⁻⁹. Large-effect protein-truncating and protein-altering variants (labeled when abs(beta) ≥ 0.1 standard deviation [SD]) are annotated with the category of association displayed (colored fill boxes) and highlighted if the loci were not previously reported in the comparison studies (Methods). Pleiotropic and trait-specific associations are shown by different-sized circles. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing.
Figure 3. Summary of fine-mapped associations across 35 biomarker traits.
(a) FINEMAP analysis summary. (top) The number of identified distinct association signals (color gradient from green to blue) in each region with at least one genome-wide significant (UK Biobank meta-analysis p < 5 × 10⁻⁹) association and the number of regions are shown, such as a single signal at 33 regions and two to forty signals at 5,330 regions across 35 traits. (bottom) The number of identified candidate causal variants in the credible set with ≥ 99% posterior probability (color gradient from green to blue) and the number of signals are shown; for example, 2,547 signals were mapped to a single variant in the credible set across 35 traits. (b) Breakdown of the number of fine-mapped associations with posterior probability greater than 0.95 or 0.99 across all biomarkers. Orange, posterior greater than 0.99; green, posterior between 0.95 and 0.99. The total variance explained for each trait is shown and is also listed in Supplementary Table 14b. (c) Allelic series showing combined missense, non-coding, and rare copy number variants at the SLCO1B1/SLCO1B3 locus for total bilirubin levels. Copy number variants are annotated below the axis, and SNPs and short indels are annotated above the axis. (d) Pleiotropic effects of fine-mapped rare coding (rs114303452, left) and common non-coding (rs59950280, right) variants at the HGFAC locus. Darker colors of purple indicate more significant associations. The p-values were from two-sided tests and were not corrected for multiple hypothesis testing. The error bars represent standard deviations.
Figure 4. Causal inference, transferability of polygenic risk scores, and complex trait association in polygenic risk tails.
(a) Mendelian randomization estimates of causal links between biomarkers (blue nodes) and selected complex traits (red nodes). Association arrows are drawn based on effect direction (red decreasing, blue increasing). Associations were adjusted using a 5% FDR cutoff across all tests (Methods, Supplementary Table 16). Edge width is proportional to the absolute causal effect size (log odds per standard deviation). (b) Summary of prediction accuracy of the snpnet polygenic scores across traits, evaluated on a held-out test set in White British individuals as well as four other populations in UK Biobank. (c) (x-axis) Biomarker polygenic risk scores for the top 1%, top 10%, bottom 1%, and bottom 10% of individuals and their association with different diseases in UK Biobank, represented as the odds ratio of the disease in this group relative to the 40–60% quantiles. Traits without rows did not have any outcomes with FDR-adjusted significant associations.
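The tail comparison in Figure 4c can be illustrated with a minimal sketch. It assumes a plain, unadjusted odds ratio computed from case counts; the study's actual estimates may come from a covariate-adjusted model, and the names `odds` and `tail_odds_ratio` are hypothetical helpers introduced here for illustration only.

```python
def odds(cases, total):
    """Odds of disease in a group: cases divided by non-cases."""
    controls = total - cases
    if controls == 0:
        raise ValueError("odds are undefined when a group has no controls")
    return cases / controls


def tail_odds_ratio(scores, has_disease, lower, upper,
                    ref_lower=0.40, ref_upper=0.60):
    """Unadjusted odds ratio of disease for individuals whose polygenic
    score rank lies in the [lower, upper) quantile range, relative to the
    reference range (40-60% by default, as in the figure caption).

    Assumption: no covariate adjustment; this is a simplification.
    """
    n = len(scores)
    # Indices of individuals sorted from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    def group(lo, hi):
        # round() avoids off-by-one errors from floating-point products.
        members = order[round(lo * n):round(hi * n)]
        cases = sum(1 for i in members if has_disease[i])
        return cases, len(members)

    return odds(*group(lower, upper)) / odds(*group(ref_lower, ref_upper))
```

For example, `tail_odds_ratio(scores, has_disease, 0.99, 1.00)` would compare the top 1% of scores with the 40–60% reference group, and `tail_odds_ratio(scores, has_disease, 0.00, 0.10)` would do the same for the bottom 10%.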
Figure 5. Multiple regression with biomarker polygenic scores improves prevalent and incident disease prediction.
(a) (x-axis) Quantiles of polygenic risk score, spaced to linearly represent the mean of the corresponding bin of scores. (y-axis) Prevalence of chronic kidney disease (n = 2,780 cases and n = 89,409 total, defined by verbal questionnaire and hospital in-patient record ICD code data) within each quantile bin of the polygenic risk score. Error bars represent the standard error around each measurement, and individuals evaluated are held-out European-ancestry individuals in UK Biobank. (b) ROC curve with AUC for chronic kidney disease, comparing the snpnet-derived polygenic score to a multi-PRS model trained across biomarkers as well. Individuals evaluated are held-out European-ancestry individuals in UK Biobank. (c) AUC-ROC estimates for prediction of 10 disease outcomes in a held-out test set of the UK Biobank. Diabetes was run using both a strict definition (excluding from control individuals with HbA1c < 39) and the complete sample (Methods). (d) Hazard ratios for the incidence of type 2 diabetes (n = 17,519), chronic kidney disease (n = 3,058), myocardial infarction (n = 7,913), heart failure (n = 13,965), gout (n = 1,936), gallstones (n = 11,629), and cirrhosis (n = 845) in FinnGen using the standard single-disease PRS trained on UK Biobank using snpnet versus the multi-PRS including both biomarker PRSs and the trait PRS. The strict definition of type 2 diabetes is shown. Error bars represent 95% confidence intervals and points represent mean hazard ratio estimates.

Comment in

  • Biomarker genetics.
    Allison SJ. Nat Rev Nephrol. 2021 Apr;17(4):222. doi: 10.1038/s41581-021-00400-y. PMID: 33514940. No abstract available.
