Nat Genet. 2025 Feb;57(2):345-357.
doi: 10.1038/s41588-024-02052-7. Epub 2025 Jan 24.

Integrative proteogenomic analysis identifies COL6A3-derived endotrophin as a mediator of the effect of obesity on coronary artery disease

Satoshi Yoshiji et al. Nat Genet. 2025 Feb.

Abstract

Obesity strongly increases the risk of cardiometabolic diseases, yet the underlying mediators of this relationship are not fully understood. Given that obesity strongly influences circulating protein levels, we investigated proteins mediating the effects of obesity on coronary artery disease, stroke and type 2 diabetes. By integrating two-step proteome-wide Mendelian randomization, colocalization, epigenomics and single-cell RNA sequencing, we identified five mediators and prioritized collagen type VI α3 (COL6A3). COL6A3 levels were strongly increased by body mass index and increased coronary artery disease risk. Notably, the carboxyl terminus product of COL6A3, endotrophin, drove this effect. COL6A3 was highly expressed in disease-relevant cell types and tissues. Finally, we found that body fat reduction could reduce plasma levels of COL6A3-derived endotrophin, indicating a tractable way to modify endotrophin levels. In summary, we provide actionable insights into how circulating proteins mediate the effects of obesity on cardiometabolic diseases and prioritize endotrophin as a potential therapeutic target.

Conflict of interest statement

Competing interests: J.B.R. has served as an advisor to GlaxoSmithKline and Deerfield Capital. J.B.R.’s institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline, and Biogen for projects unrelated to this research. J.B.R. is the CEO of 5 Prime Sciences (www.5primesciences.com), which provides research services for biotech, pharma and venture capital companies for projects unrelated to this research. T.L., Y.C. and V.F. are employees of 5 Prime Sciences. The remaining authors declare no competing interests.

Figures

Fig. 1. Study overview and summary.
To identify proteins mediating the effects of obesity on cardiometabolic diseases, we employed a two-step MR approach. In step 1, we assessed the impact of BMI on 4,907 plasma proteins using two-sample MR, through which we identified 1,213 proteins influenced by BMI, termed ‘BMI-driven proteins’. In step 2, we evaluated the effects of these BMI-driven proteins on cardiometabolic diseases through additional two-sample MR analyses. Subsequent work included follow-up analyses of COL6A3 and an evaluation of the potential actionability of this protein and other identified mediators. Created using BioRender.com.
Fig. 2. MR step 1: estimating the causal effects of BMI on plasma protein levels.
a, Flow diagram outlining MR step 1. b, Volcano plot illustrating the effects of BMI on each plasma protein from MR analyses using the IVW method. The x axis represents beta estimates, and the y axis represents −log10(P) values from MR results. The P values were obtained using the random-effects IVW method (two-sided test). Red dots represent proteins that passed all tests, including significance with Bonferroni correction (P < 0.05/4,907), as well as tests for heterogeneity, directional horizontal pleiotropy, reverse causation and directional concordance with body fat percentage. Gray dots represent proteins that failed any of these tests. c, MR scatter plot showing the effects of BMI on plasma levels of COL6A3 using the IVW (primary analysis; red regression line), weighted median (blue regression line) or MR-Egger slope (purple regression line) methods. Note that the MR-Egger slope (purple: β = 0.32) overlaps with the IVW slope (red: β = 0.32). Error bars represent the 95% CI for each variant’s effect estimate. d, Directional consistency between MR results for the effects of BMI on plasma proteins and MR results for the effects of body fat percentage on plasma protein levels using the IVW method. The x axis denotes beta estimates from MR results, and r denotes Pearson’s correlations. P values were obtained using two-sided Pearson’s correlation test.
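The random-effects IVW estimate and the Bonferroni threshold named in this caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `ivw_random_effects`, the multiplicative random-effects form (standard error inflated by the square root of Cochran's Q over its degrees of freedom, floored at 1) and the normal approximation for the two-sided P value are assumptions made for illustration.

```python
import math

def ivw_random_effects(beta_exposure, beta_outcome, se_outcome):
    """Illustrative IVW estimate from per-variant summary statistics.

    Weighted regression of variant-outcome effects on variant-exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    beta = num / den
    se_fixed = math.sqrt(1.0 / den)

    # Cochran's Q measures heterogeneity between variant-specific estimates.
    q = sum(w * (by - beta * bx) ** 2
            for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    df = len(beta_exposure) - 1

    # Assumed multiplicative random-effects model: never shrink the SE.
    inflation = max(1.0, q / df) if df > 0 else 1.0
    se_random = se_fixed * math.sqrt(inflation)

    # Two-sided P value under a normal approximation (an assumption here).
    z = beta / se_random
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se_random, p_value

# Bonferroni threshold for the 4,907 proteins tested in step 1.
BONFERRONI_THRESHOLD = 0.05 / 4907
```

With synthetic inputs in which every variant's outcome effect equals 0.32 times its exposure effect, the function returns a slope of 0.32, and a protein would count as significant only if its P value fell below `BONFERRONI_THRESHOLD` (about 1.02 × 10⁻⁵).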
Fig. 3. Step 2 MR: estimating the causal effects of BMI-driven proteins on cardiometabolic diseases.
a, Flow diagram of the MR step 2 analyses. b, Forest plots showing the effects of BMI-driven proteins on four cardiometabolic diseases (CAD, ischemic stroke, cardioembolic stroke and type 2 diabetes). The MR analyses were conducted using the largest available GWAS of coronary artery disease (181,522 cases and 1,165,690 controls), ischemic stroke (62,100 cases and 1,234,808 controls), cardioembolic stroke (10,804 cases and 1,234,808 controls) and type 2 diabetes (80,154 cases and 853,816 controls). C-COL6A3, C-terminal COL6A3. P values were obtained using the random-effects IVW method (two-sided test). Error bars represent the 95% CI for effect estimates. c, LocusZoom plots of (left) the pQTL for C-terminal COL6A3 and (right) CAD in the 500-kb region surrounding the lead cis-pQTL, rs11677932. PP.H4, posterior probability of having the shared causal variant (hypothesis H4 in colocalization). Panel a was created using BioRender.com.
Fig. 4. Baseline COL6A3 levels and cumulative incidence of CAD.
Multivariable Cox proportional-hazards regression analysis in 35,100 individuals (2,969 cases and 32,131 controls) from the UK Biobank. Q1 (blue) represents the lowest 25% group, Q2 (green) the 26–50% group, Q3 (orange) the 51–75% group and Q4 (red) the highest quartile group (76–100%, from the 75th percentile to the maximum value) based on baseline plasma COL6A3 levels. The center lines represent effect estimates in each group, and the shaded areas around the lines represent 95% CIs.
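The grouping into Q1–Q4 described in this caption can be sketched in Python as follows. The function name `quartile_labels`, the use of inclusive quantile interpolation and the rule that a value exactly on a cut point falls into the lower group are assumptions for illustration; the caption does not specify how boundaries were handled.

```python
import bisect
import statistics

def quartile_labels(levels):
    """Assign each baseline protein level to Q1 (lowest) through Q4 (highest)."""
    # Three cut points: the 25th, 50th and 75th percentiles of the sample.
    cuts = statistics.quantiles(levels, n=4, method="inclusive")
    # bisect_left counts the cut points strictly below the value,
    # so a value equal to a cut point stays in the lower group.
    return [f"Q{bisect.bisect_left(cuts, x) + 1}" for x in levels]

# Hypothetical baseline levels, for illustration only.
example_levels = [1, 2, 3, 4, 5, 6, 7, 8]
example_groups = quartile_labels(example_levels)
```

The resulting labels would then serve as the categorical exposure in a survival model such as the Cox regression the caption describes.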
Fig. 5. Follow-up analyses for COL6A3.
a, Schematic illustration of the proposed relationship between obesity, COL6A3, endotrophin and CAD. Obesity leads to increased production of COL6A3, whose C terminus is cleaved into an active form termed endotrophin, which increases the risk of CAD. b, Schematic diagram of COL6A3 (UniProt ID: P12111). COL6A3 comprises a short collagenous region (gray line between N1 and C1) flanked by multiple von Willebrand factor type A modules, specifically N1–N10 in the N-terminal region and C1–C2 in the C-terminal region. In addition, COL6A3 contains three unique C-terminal domains (C3–C5) that are absent from other collagen type VI families. The most C-terminal domain, C5, is cleaved into soluble endotrophin. The two amino acid sequences of COL6A3 targeted by the aptamers to measure COL6A3 levels are as follows: the N-terminal-binding aptamer targets the amino acid sequence 26–1036 (uncleaved section), whereas the C-terminal aptamer targets the amino acid sequence 3108–3165 (cleaved section). The figure has been modified from previously published figures. c, MR analysis of the effects of C-terminal and N-terminal COL6A3 on the risk of CAD. d, MR for the effects of BMI and body fat percentage on COL6A3 stratified by C- and N-terminal COL6A3. e, MR for the effects of body fat compartments on COL6A3 stratified by C- and N-terminal COL6A3. We used MRI-derived GWAS on abdominal subcutaneous adipose tissue, visceral adipose tissue and gluteofemoral adipose tissue from 40,032 individuals in the UK Biobank, reported by Agrawal et al. The two-sample MR method was as described in the step 1 MR analysis. In c–e, P values were obtained using the random-effects IVW method (two-sided test). Error bars represent the 95% CI for effect estimates. Panel a was created using BioRender.com.
Fig. 6. Epigenetic profile of the lead cis-pQTL for C-terminal COL6A3.
a, LocusZoom plot of the pQTL for C-terminal COL6A3 in the 1-Mb region surrounding the lead cis-pQTL from deCODE, rs11677932. The y axis on the left represents the −log10 P value from the two-sided Z test. The yellow shaded region (chr2:237305312–237325312; GRCh38) is enlarged in b. b, ATAC-seq (red), H3K4me3 ChIP–seq (blue) and H3K27ac ChIP–seq (green) data for adipose tissue, coronary artery, aorta, thoracic artery and tibial artery. These data are publicly available through ENCODE and RegulomeDB. c, rs11677932 is predicted to affect the binding of TF MEF2B. ENCODE accession ID: ENCSR782UOT; target: BORCS8-MEF2B, MEF2B.
Fig. 7. Single-cell sequencing analyses of COL6A3.
a,b, COL6A3 expression patterns in adipose tissues (a) and coronary arteries (b). UMAP plots are colored by COL6A3 expression (left) and cell type annotation (right). We obtained single-cell transcriptomic data for human adipose tissue from a published dataset (SCP1376 at https://singlecell.broadinstitute.org/) and data for coronary arteries from a published dataset (GSE131780 at the Gene Expression Omnibus database, https://www.ncbi.nlm.nih.gov/geo/). ASPC, adipose stem and progenitor cells; LEC, lymphatic endothelial cells; NK, natural killer cells; DC, dendritic cells; HSC, hematopoietic stem cells.
Fig. 8. Effects of fat mass and lean mass on proteins and cardiometabolic diseases.
a,b, We performed multivariable MR using fat mass (left) and lean mass (right) as exposures and plasma protein levels of the seven protein mediators (a) and cardiometabolic diseases (b) as outcomes. P values were obtained using the random-effects IVW method (two-sided test). Error bars represent the 95% CI for effect estimates.
Extended Data Fig. 1. Filtering flowchart for step 1 MR.
(a) Flowchart for filtering proteins in step 1 MR. (b) A histogram showing the number of pQTL variants that were associated with more than one protein in cis and thus removed before step 2 MR. Further details can be found in Methods.
Extended Data Fig. 2. LocusZoom plot for cis-pQTL of COL6A3 and coronary artery disease.
(a) LocusZoom plot of the pQTL for C-terminal COL6A3 in the 500-kb region surrounding the lead cis-pQTL from deCODE (rs11677932), which used the SomaScan v4 assay. The y axis on the left represents −log10(P) from the two-tailed Z test. (b) pQTL for C-terminal COL6A3 in the same region from the UK Biobank, which used the Olink Explore 3072 assay. rs1050785 was the lead cis-pQTL in this dataset and was in LD (r² = 0.73) with rs11677932. (c) Coronary artery disease GWAS from Aragam et al. in the same region.
Extended Data Fig. 3. Sex-stratified analyses of C-terminal COL6A3 (C-COL6A3).
(a) Sex-stratified GWAS of plasma levels of C-terminal COL6A3 (sex-stratified pQTL). Only variants with P < 0.05 (two-sided Z-test) are presented. The horizontal dashed and gray lines represent −log10(P) corresponding to P < 5 × 10⁻⁸ (genome-wide significance) and P < 1 × 10⁻⁵ (suggestive significance), respectively. (b) Scatter plots showing the sex-stratified two-sample MR results for the effect of BMI on plasma levels of COL6A3 using the inverse-variance weighted method (primary analysis; red regression line), weighted median (blue regression line), or MR-Egger slope methods (purple regression line). The P values were obtained using the random-effects inverse-variance weighted method (two-tailed test). The error bars represent the 95% CI for effect estimates. (c) Sex-stratified step 1 MR results for the effect of BMI on C-terminal COL6A3 levels (left panel) and step 2 MR results for the effect of C-terminal COL6A3 on CAD risk (right panel). The error bars represent the 95% CI for effect estimates. (d) Sex-stratified LocusZoom plots for cis-pQTL of C-terminal COL6A3 from the UK Biobank (left panel) and coronary artery disease GWAS from Aragam et al. (right panel). The y axis on the left represents −log10(P) from the two-tailed Z test. C-COL6A3 = C-terminal COL6A3.
Extended Data Fig. 4. COL6A3 expression profile in human tissues in GTEx v.8.
COL6A3 expression levels in 49 human tissues from GTEx v.8 are shown on a log transcripts per million plus one (TPM + 1) scale. Violin plots illustrate the distribution of expression levels, with boxes showing the interquartile range (IQR) and horizontal lines indicating the median expression level. Whiskers represent the maximum and minimum values within 1.5 times the IQR from the first and third quartiles.
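The log(TPM + 1) scale and the whisker rule in this caption can be sketched in Python as follows. The function names, the choice of base-10 logarithm and the inclusive quantile interpolation are assumptions for illustration; the caption states only "log" and the 1.5 × IQR rule.

```python
import math
import statistics

def log_tpm_plus_one(tpm_values):
    """Transform TPM values to log10(TPM + 1); the base is an assumption."""
    return [math.log10(v + 1.0) for v in tpm_values]

def tukey_whiskers(values):
    """Return the lowest and highest observations within 1.5 x IQR of the box."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)

# Hypothetical values, for illustration only.
example_logs = log_tpm_plus_one([0, 9, 99, 999])
example_whiskers = tukey_whiskers([1, 2, 3, 4, 5, 100])
```

Adding one before taking the logarithm keeps tissues with zero expression on the plot, and observations beyond the fences (100 in the example) lie outside the whiskers.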

References

    1. Powell-Wiley, T. M. et al. Obesity and cardiovascular disease: a scientific statement from the American Heart Association. Circulation 143, e984–e1010 (2021).
    2. Czech, M. P. Insulin action and resistance in obesity and type 2 diabetes. Nat. Med. 23, 804–814 (2017).
    3. Zaghlool, S. B. et al. Revealing the role of the human blood plasma proteome in obesity using genetic drivers. Nat. Commun. 12, 1279 (2021).
    4. Goudswaard, L. J. et al. Effects of adiposity on the human plasma proteome: observational and Mendelian randomisation estimates. Int. J. Obes. 45, 2221–2229 (2021).
    5. Zheng, J. et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat. Genet. 52, 1122–1131 (2020).