[Preprint]. 2023 Feb 21:2023.02.20.23286186.
doi: 10.1101/2023.02.20.23286186.

Identifying COPD subtypes using multi-trait genetics


Andrey Ziyatdinov et al. medRxiv.


Abstract

Chronic obstructive pulmonary disease (COPD) has a simple physiological diagnostic criterion but a wide range of clinical characteristics. The mechanisms underlying this variability in COPD phenotypes are unclear. To investigate the potential contribution of genetic variants to phenotypic heterogeneity, we examined the association of variants associated with lung function, COPD, and asthma at the genome-wide level with other phenotypes, using phenome-wide association results derived from the UK Biobank. Our clustering analysis of the variant-phenotype association matrix identified three clusters of genetic variants with different effects on white blood cell counts, height, and body mass index (BMI). To assess the potential clinical and molecular effects of these groups of variants, we investigated the association between cluster-specific genetic risk scores and phenotypes in the COPDGene cohort. We observed differences in steroid use, BMI, lymphocyte counts, chronic bronchitis, and differential gene and protein expression across the three genetic risk scores. Our results suggest that multi-phenotype analysis of obstructive lung disease-related risk variants may identify genetically driven phenotypic patterns in COPD.


Figures

Figure 1. Analysis pipeline for COPD subtype inference using a multi-trait genetic approach.
At step 1, the multi-trait Z-score matrix derived from UK Biobank data is thinned to a smaller set of traits driven by pre-selected variants. Each selected trait is then split into two meta-traits, one holding its positive Z-scores and one holding its negative Z-scores. At step 2, the matrix of processed Z-scores is decomposed into the product of two weight matrices, W and H, each with K columns, where K is the number of clusters. Finally, at step 3, the weight matrix for variants, H, is used to build K weighted genetic risk scores (GRSs) from individual-level data in the validation cohort (COPDGene).
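Steps 1 and 3 of the caption above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the orientation of the matrices (variants in rows, traits in columns) and the use of allele dosages as genotype input are all assumptions made for illustration.

```python
def split_signed_traits(z_scores):
    """Split every trait column into a positive and a negative meta-trait.

    z_scores: list of rows, assumed here to be variants x traits.
    Returns a nonnegative matrix with two columns per original trait.
    """
    out = []
    for row in z_scores:
        new_row = []
        for value in row:
            new_row.append(max(0.0, value))   # positive meta-trait
            new_row.append(max(0.0, -value))  # negative meta-trait (sign flipped)
        out.append(new_row)
    return out


def weighted_grs(dosages, variant_weights):
    """Build one weighted genetic risk score per cluster.

    dosages: individuals x variants (hypothetical allele dosages, 0..2).
    variant_weights: variants x K cluster weights.
    Returns individuals x K scores.
    """
    n_clusters = len(variant_weights[0])
    return [
        [sum(d * w[k] for d, w in zip(person, variant_weights))
         for k in range(n_clusters)]
        for person in dosages
    ]


z = [[1.5, -2.0], [-0.5, 0.0]]
print(split_signed_traits(z))  # [[1.5, 0.0, 0.0, 2.0], [0.0, 0.5, 0.0, 0.0]]
```

Splitting by sign keeps the input nonnegative, which a nonnegative factorization requires, while still recording the direction of each association.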
Figure 2. Distribution of trait weights across the three inferred clusters.
We selected the top 15 traits with the largest contribution to the cluster inference. Nonnegative matrix factorization (NMF) clustering constructs two matrices, W and H, from the Z-score association matrix so that Z ≈ WHᵀ, where H is a matrix of trait weights whose number of columns equals the number of clusters. The top traits are those with normalized weights (each column scaled to sum to one) larger than 3% for at least one cluster. The figure shows the weights for each trait in each of the three inferred clusters. Red bars correspond to the contribution of the positive Z-score submatrix, and blue bars to that of the negative Z-score submatrix.
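The factorization and the trait selection rule in this caption can be sketched in Python as follows. The source does not say which NMF algorithm or software was used; the multiplicative updates below (Lee and Seung style, minimizing squared error) are one common choice and are an assumption, as are all function names, the iteration count and the random initialization.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(z, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix z (n x m) as W H^T, W: n x k, H: m x k."""
    rng = random.Random(seed)
    n, m = len(z), len(z[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    zt = transpose(z)
    for _ in range(n_iter):
        # W <- W * (Z H) / (W H^T H), element-wise
        num, den = matmul(z, h), matmul(w, matmul(transpose(h), h))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        # H <- H * (Z^T W) / (H W^T W), element-wise
        num, den = matmul(zt, w), matmul(h, matmul(transpose(w), w))
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def frobenius_error(z, w, h):
    approx = matmul(w, transpose(h))
    return sum((a - b) ** 2 for ra, rb in zip(z, approx) for a, b in zip(ra, rb)) ** 0.5


def normalize_columns(h):
    """Scale each column to sum to one (assumes no all-zero column)."""
    sums = [sum(col) for col in zip(*h)]
    return [[v / s for v, s in zip(row, sums)] for row in h]


def top_traits(h, names, threshold=0.03):
    """Keep traits whose normalized weight exceeds the threshold in any cluster."""
    return [name for name, row in zip(names, normalize_columns(h)) if max(row) > threshold]
```

With `h` holding trait weights, `top_traits(h, names)` applies the 3% rule from the caption. The pure-Python matrix helpers keep the sketch dependency-free and are only practical for small matrices.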
Figure 3. Marginal effects of GRSs on selected traits in the validation COPDGene dataset.
Point estimates and 95% confidence intervals obtained from marginal models are displayed for the cluster-specific GRSs (GRS1…3; from dark blue to pink) and the unweighted GRS (GRS0; black). For comparison purposes, all GRSs were rescaled to unit variance. We selected traits representing different COPD phenotypic groups: height, weight, forced expiratory flow at 25–75% of forced vital capacity (FEF25–75), visual emphysema score (Emphysema), eosinophil count (Eosinophils), steroid treatment (Steroids), upper third/lower third emphysema ratio (Emphysema ratio), diffusing capacity for carbon monoxide (DLCO), and coronary artery disease (CAD).
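As a rough illustration of how a point estimate and 95% confidence interval of this kind could be obtained, the Python sketch below rescales a score to unit variance and fits a simple linear regression of a trait on that score. The caption does not describe the marginal models in detail, so the absence of covariates, the normal-approximation interval (1.96 standard errors) and the function names are all assumptions.

```python
import math
import statistics


def scale_to_unit_variance(values):
    """Divide by the sample standard deviation so the result has variance 1."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


def marginal_effect(grs, trait):
    """Slope of trait on grs from simple least squares, with an approximate 95% CI."""
    n = len(grs)
    mean_x, mean_y = statistics.fmean(grs), statistics.fmean(trait)
    sxx = sum((x - mean_x) ** 2 for x in grs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(grs, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom (intercept and slope)
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(grs, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, (beta - 1.96 * se, beta + 1.96 * se)
```

Because every score is divided by its own standard deviation, the slopes for different scores are expressed per standard deviation of the score and can be compared on a common axis.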
Figure 4. Contribution of cluster-specific GRSs
Out of 240 COPDGene phenotypes tested for association with genetic risk scores, a total of 47 phenotypes showed a statistically significant (Phet) improvement of model fit at an FDR of 0.1 when comparing the marginal GRS0 model against a full model including GRS0 and all of GRS1–3. The bar plots represent the relative contributions of GRS1, GRS2, and GRS3, measured as Z-scores derived from the full model, for these 47 phenotypes, highlighting which of the three GRSs conveys the improved fit. Phenotypes are ordered by Phet. Red dashed lines indicate the stringent Bonferroni significance threshold accounting for a total of 723 tests.
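The two multiple-testing thresholds mentioned in this caption can be sketched in Python. The caption states an FDR of 0.1 and a Bonferroni correction for 723 tests, but it does not name the FDR procedure or the base significance level; the Benjamini-Hochberg step-up procedure and a base level of 0.05 are assumptions here, as are the function names and the example p-values.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return a list of booleans, True where the test is rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank r such that p_(r) <= (r / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical p-values, for illustration only
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5, 0.2]))  # [True, True, True, False, False]
```

Under these assumptions, `bonferroni_threshold(723)` is about 6.9e-5, which is considerably stricter than the FDR criterion applied to the same p-values.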

