Multicenter Study
Thorax. 2017 Nov;72(11):998-1006.
doi: 10.1136/thoraxjnl-2016-209846. Epub 2017 Jun 21.

Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts

Peter J Castaldi et al. Thorax. 2017 Nov.

Abstract

Background: COPD is a heterogeneous disease, but there is little consensus on specific definitions for COPD subtypes. Unsupervised clustering offers the promise of 'unbiased' data-driven assessment of COPD heterogeneity. Multiple groups have identified COPD subtypes using cluster analysis, but there has been no systematic assessment of the reproducibility of these subtypes.

Objective: We performed clustering analyses across 10 cohorts in North America and Europe to assess the reproducibility of (1) correlation patterns of key COPD-related clinical characteristics and (2) clustering results.

Methods: We studied 17 146 individuals with COPD, applying identical methods to a common set of COPD-related characteristics in every cohort (FEV1, FEV1/FVC, FVC, body mass index, modified Medical Research Council score, asthma and cardiovascular comorbid disease). Correlation patterns between these clinical characteristics were assessed by principal components analysis (PCA). Cluster analysis was performed using k-medoids and hierarchical clustering, and concordance of clustering solutions was quantified with normalised mutual information (NMI), a metric that ranges from 0 to 1, with higher values indicating greater concordance.
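NMI can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes natural-log entropies and normalisation by the geometric mean of the two entropies, sqrt(H(A)·H(B)). Other NMI variants (arithmetic-mean or max normalisation) exist, and the abstract does not say which one the study used. Because cluster labels are arbitrary, two solutions that group people identically score 1 even if their label names differ.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a cluster labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information between two labellings of the same individuals."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), n_xy in Counter(zip(a, b)).items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI normalised by the geometric mean of the entropies (assumed variant)."""
    if len(a) != len(b) or not a:
        raise ValueError("both labellings must cover the same, non-empty set of individuals")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate single-cluster case: two single-cluster solutions count as concordant.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)
```

For example, nmi([0, 0, 1, 1], [1, 1, 0, 0]) is 1 up to floating-point rounding, because the partitions are identical after relabelling, while nmi([0, 0, 1, 1], [0, 1, 0, 1]) is 0, because knowing one labelling says nothing about the other.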

Results: The reproducibility of COPD clustering subtypes across studies was modest (median NMI range 0.17-0.43). For methods that excluded individuals who did not clearly belong to any cluster, agreement was better but still suboptimal (median NMI range 0.32-0.60). Continuous representations of COPD clinical characteristics derived from PCA were much more consistent across studies.
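One way to make the PCA comparison concrete is to match the same principal component across cohorts and measure how similar its loading vectors are. The sketch below is an illustration under stated assumptions, not the study's stated procedure: it scores similarity with the absolute cosine similarity (a component and its negation describe the same axis) and summarises a component by the median over all cohort pairs, mirroring the median summaries reported for NMI. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median
from typing import Sequence


def loading_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Absolute cosine similarity of two loading vectors, in [0, 1].

    Principal components are only defined up to sign, so the absolute value
    makes the comparison insensitive to that arbitrary choice.
    """
    if len(u) != len(v):
        raise ValueError("loading vectors must cover the same features")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("loading vectors must be non-zero")
    return min(1.0, abs(dot) / norm)  # min() guards against rounding just above 1


def median_pairwise_similarity(loadings_by_cohort: Sequence[Sequence[float]]) -> float:
    """Median similarity of one component's loadings over all pairs of cohorts."""
    if len(loadings_by_cohort) < 2:
        raise ValueError("need loadings from at least two cohorts")
    return median(loading_similarity(u, v) for u, v in combinations(loadings_by_cohort, 2))
```

A median near 1 would indicate that the component weights the clinical features in nearly the same way in every cohort.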

Conclusions: Identical clustering analyses across multiple COPD cohorts showed modest reproducibility. COPD heterogeneity is better characterised by continuous disease traits coexisting in varying degrees within the same individual, rather than by mutually exclusive COPD subtypes.

Keywords: COPD epidemiology.

Conflict of interest statement

Competing interests: Over the past 3 years, PJC has received research support and consulting fees from GSK. Other authors have no competing interests to declare.

Figures

Figure 1
Overview of Cluster Generation, Transfer, and Concordance Assessment. For each cohort, 23 “source” clustering solutions (S1 to S23) are generated (total of 230 solutions across the 10 cohorts). Each solution is transferred to the other cohorts via a predictive model (T1 to T23). Each solution is also labeled according to its parent cohort, thus source solution 1 from cohort 1 = S1C1. Each cohort ultimately produces 230 cluster solutions (23 source solutions, and 207 transferred solutions which are “predicted into” each cohort). The green, red, and dark blue colors correspond to cluster results generated by a specific cluster method and set of parameters (for example, “k-medoids with k=2”).
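The generate-then-transfer scheme in Figure 1 can be sketched for the k-medoids case. Two assumptions should be flagged: the clustering step uses a simple alternating k-medoids heuristic rather than the full PAM swap search, and transfer uses a nearest-medoid rule as a stand-in for the predictive model mentioned in the caption, whose exact form is not given here. The function names (assign, k_medoids, transfer) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def assign(points: Sequence[Point], medoids: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest medoid (Euclidean distance)."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, medoids[c])) for p in points]


def k_medoids(points: Sequence[Point], k: int, n_iter: int = 100,
              seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    rng = random.Random(seed)
    medoid_idx = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        labels = assign(points, [points[i] for i in medoid_idx])
        new_idx = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # only possible with duplicate points; keep the old medoid
                new_idx.append(medoid_idx[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_idx.append(min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            ))
        if new_idx == medoid_idx:  # medoids stopped moving
            break
        medoid_idx = new_idx
    medoids = [points[i] for i in medoid_idx]
    return medoids, assign(points, medoids)


def transfer(target: Sequence[Point], medoids: Sequence[Point]) -> List[int]:
    """Hypothetical transfer rule: predict target-cohort labels by nearest source medoid."""
    return assign(target, medoids)
```

Clustering a source cohort with k=2 and calling transfer on a second cohort yields one transferred solution for that cohort; repeating this for the 23 source solutions of each of the 9 other cohorts gives the 207 transferred solutions per cohort described in the caption, each of which can then be compared with the target cohort's own solutions using NMI.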
Figure 2
Loadings of input features (cluster variables) for the first four principal components in all cohorts.
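Loadings like those in Figure 2 come from an eigendecomposition of the features' correlation matrix. The pure-Python sketch below assumes the features are standardised before PCA and extracts components by power iteration with deflation; a real analysis would use a linear-algebra library. It returns unit-length eigenvectors as loadings. Some software instead scales them by the square root of the eigenvalue, and the sign of every component is arbitrary. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def standardise(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column (assumes every column has non-zero variance)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z: Sequence[Sequence[float]]) -> List[List[float]]:
    """Covariance of standardised data, i.e. the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca_loadings(rows: Sequence[Sequence[float]], n_components: int, n_iter: int = 1000,
                 tol: float = 1e-12, seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Leading eigenpairs of the correlation matrix via power iteration and deflation."""
    m = correlation_matrix(standardise(rows))
    p = len(m)
    rng = random.Random(seed)
    eigvals: List[float] = []
    loadings: List[List[float]] = []
    for _ in range(n_components):
        # Pseudo-random start; a start orthogonal to the target eigenvector would fail.
        v = [rng.uniform(0.5, 1.5) for _ in range(p)]
        for _ in range(n_iter):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            w = [x / norm for x in w]
            converged = max(abs(a - b) for a, b in zip(w, v)) < tol
            v = w
            if converged:
                break
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        eigvals.append(lam)
        loadings.append(v)
        # Deflate: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return eigvals, loadings
```

Each loading vector gives the weight of every input feature on that component, which is what a loadings plot displays per cohort.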
Figure 3
Heatmap of Relative Feature Importance for Clustering by Cohort. Colors represent importance values generated by unsupervised random forests clustering. Higher values indicate that a given feature had a larger impact on the clustering results than other features in that dataset. Results for primary analysis in all ten cohorts are shown in Panel A. Results for the COPDGene and ECLIPSE substudy with more clustering features are shown in Panel B.
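The unsupervised random forest importance scores in Figure 3 are not reproduced here. As a deliberately simpler, swapped-in proxy for "which features drive a partition", the sketch below computes, for each feature, the share of its variance explained by cluster membership (eta squared: between-cluster sum of squares divided by total sum of squares). Values near 1 mean the clusters differ sharply on that feature; values near 0 mean they barely differ. The function names and the feature names in the usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def eta_squared(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Share of a feature's variance explained by cluster membership, in [0, 1]."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0.0:
        return 0.0  # a constant feature cannot separate clusters
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    between_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return between_ss / total_ss


def feature_importance(rows: Sequence[Sequence[float]], labels: Sequence[Hashable],
                       names: Sequence[str]) -> Dict[str, float]:
    """Map each feature name to its eta squared for one clustering solution."""
    columns = list(zip(*rows))
    return {name: eta_squared(col, labels) for name, col in zip(names, columns)}
```

Unlike a random forest, this proxy looks at one feature at a time and ignores interactions, so it can only hint at the kind of pattern the heatmap summarises.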
Figure 4
Reproducibility of Different Clustering Methods Across Ten Cohorts. Distribution of normalized mutual information (NMI*) is shown for clustering with partitioning around medoids (PAM, in blue), hierarchical clustering including unclassified subjects (HC + U, in green), and hierarchical clustering excluding unclassified subjects (HC, in red). * NMI ranges from 0 (poor reproducibility) to 1 (excellent reproducibility).
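The caption distinguishes hierarchical clustering with and without unclassified subjects. How a subject becomes unclassified is not stated here, so the sketch below assumes one plausible rule: cut an average-linkage tree into k groups and mark members of groups smaller than a hypothetical min_size as unclassified (None). Comparing two solutions on the HC basis then means dropping anyone unclassified in either solution before computing NMI. The naive merge loop is for illustration only and scales poorly.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def average_linkage(points: Sequence[Point], k: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise Euclidean distance between the two clusters' members.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def label_with_unclassified(clusters: List[List[int]], n: int,
                            min_size: int) -> List[Optional[int]]:
    """Assumed rule: members of clusters smaller than min_size stay unclassified (None)."""
    labels: List[Optional[int]] = [None] * n
    next_label = 0
    for members in clusters:
        if len(members) < min_size:
            continue
        for i in members:
            labels[i] = next_label
        next_label += 1
    return labels


def drop_unclassified(a: Sequence[Optional[int]],
                      b: Sequence[Optional[int]]) -> Tuple[list, list]:
    """Keep only individuals classified in both solutions (the HC comparison)."""
    kept = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [x for x, _ in kept], [y for _, y in kept]
```

Under this rule, the HC + U comparison would instead keep the unclassified individuals, for example by treating None as one extra group, which is one reason its agreement can be lower.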
Figure 5
PCA Plot of Clustering Variables Used in COPDGene k-means Clustering. Visualization of data by the first three principal components in the COPDGene clustering analysis with spirometric, chest CT imaging, and clinical data.
