Thorax. 2023 Nov;78(11):1067-1079.
doi: 10.1136/thorax-2022-219158. Epub 2023 Jun 2.

Pulmonary emphysema subtypes defined by unsupervised machine learning on CT scans

Elsa D Angelini et al. Thorax. 2023 Nov.

Abstract

Background: Treatment and preventative advances for chronic obstructive pulmonary disease (COPD) have been slow due, in part, to limited subphenotypes. We tested if unsupervised machine learning on CT images would discover CT emphysema subtypes with distinct characteristics, prognoses and genetic associations.

Methods: New CT emphysema subtypes were identified by unsupervised machine learning on only the texture and location of emphysematous regions on CT scans from 2853 participants in the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS), a COPD case-control study, followed by data reduction. Subtypes were compared with symptoms and physiology among 2949 participants in the population-based Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study and with prognosis among 6658 MESA participants. Associations with genome-wide single-nucleotide polymorphisms were examined.

Results: The algorithm discovered six reproducible (interlearner intraclass correlation coefficient, 0.91-1.00) CT emphysema subtypes. The most common subtype in SPIROMICS, the combined bronchitis-apical subtype, was associated with chronic bronchitis, accelerated lung function decline, hospitalisations, deaths, incident airflow limitation and a gene variant near DRD1, which is implicated in mucin hypersecretion (p=1.1×10⁻⁸). The second, the diffuse subtype, was associated with lower weight, respiratory hospitalisations and deaths, and incident airflow limitation. The third was associated with age only. The fourth and fifth visually resembled combined pulmonary fibrosis emphysema and had distinct symptoms, physiology, prognosis and genetic associations. The sixth visually resembled vanishing lung syndrome.

Conclusion: Large-scale unsupervised machine learning on CT scans defined six reproducible, familiar CT emphysema subtypes that suggest paths to specific diagnosis and personalised therapies in COPD and pre-COPD.

Keywords: COPD epidemiology; Emphysema; Imaging/CT MRI etc.

Conflict of interest statement

Competing interests: EDA, PPB, AM, YS, WS, JHMA, MHC, DC, EH, DRJ, SK, JDK, TL, JL, ECO, WP, MRP, SSR, EKS, KEW and AFL report receiving grants from the National Institutes of Health (NIH). JY performed the work at Columbia University but is now an employee of Google. EAH reports receiving grants from the NIH; being a founder and shareholder of VIDA Diagnostics; and holding patents for an apparatus for analysing CT images to determine the presence of pulmonary tissue pathology, an apparatus for image display and analysis, and a method for multiscale meshing of branching biological structures. EBA reports receiving grants from the American Heart Association and the NIH. CBC reports receiving personal fees from GlaxoSmithKline. MTD reports receiving a grant from the NHLBI and personal fees from AstraZeneca, GlaxoSmithKline, Pulmonx, PneumRx/BTG and Quark. MKH reports consulting for GlaxoSmithKline, AstraZeneca and Boehringer Ingelheim and receiving research support from Novartis and Sunovion. NNH reports receiving grants from the NIH, Boehringer Ingelheim, and the COPD Foundation. JDK reports receiving grants from the US Environmental Protection Agency and the NIH. FJM reports serving on COPD advisory boards for AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Sunovion and Teva; serving as a consultant for ProterixBio and Verona; serving on the steering committees of studies sponsored by the NHLBI, AstraZeneca, and GlaxoSmithKline; and having served on data safety and monitoring boards of COPD studies supported by Genentech and GlaxoSmithKline. BMS reports receiving grants from the NIH, Canadian Institutes of Health Research (CIHR), Fonds de la recherche en santé du Québec (FRQS), the Research Institute of the McGill University Health Centre, the Quebec Lung Association and AstraZeneca. PGW reports receiving personal fees for consultancy from Theravance, AstraZeneca, Regeneron, Sanofi, Genentech, Roche and Janssen.
RGB reports receiving grants from the COPD Foundation, the US Environmental Protection Agency (EPA), the American Lung Association and the NIH.

Figures

Figure 1.
Schema of unsupervised machine learning, data reduction, primary descriptive analyses, events analyses and GWAS. Unsupervised machine learning of possible emphysema subtypes was performed in two independent training sets in SPIROMICS. Both training sets yielded 10 possible emphysema subtypes, and training was repeated on all of SPIROMICS. The resultant 10 possible emphysema subtypes were labelled on MESA Lung CT scans. Data reduction was performed in SPIROMICS and MESA Lung and yielded six CT emphysema subtypes; data reduction was confirmed longitudinally on coregistered CT scans in a subset of the MESA Lung Study oversampled for COPD and smoking. Primary descriptive analyses of these subtypes were performed in the MESA Lung Study. Cardiac scans in MESA were labelled for the Event Analyses in MESA. GWAS Discovery was performed in SPIROMICS; replication of genetic results occurred on labelled cardiac scans in MESA and MESA SHARe and in COPDGene.
Figure 2.
Representative visual illustrations of the six CT emphysema subtypes. Coronal views of lungs on CT scans and the corresponding labelled masks with the discovered CT emphysema subtypes on predominantly affected sample cases (i.e. with proportion of a certain CT emphysema subtype being much larger than any other). Color coding of CT emphysema subtypes is the same across examples; grey labelling denotes non-emphysematous regions. Abbreviation: CPFE=combined pulmonary fibrosis/emphysema
Figure 3.
Distributions of the six discovered CT emphysema subtypes in SPIROMICS and the MESA Lung Study. Mean percentages of CT emphysema subtypes in SPIROMICS, a COPD case-control study of 2655 participants with 20 or more pack-years of smoking (median pack-years 43.0; 66.2% with COPD) and 198 non-smoking controls, and in the MESA Lung Study, a population-based study of 2949 participants, 54.2% of whom had ever smoked cigarettes (median pack-years 14.5) and 16.9% with COPD. Abbreviations: CPFE=combined pulmonary fibrosis/emphysema, COPD=chronic obstructive pulmonary disease
Figure 4.
Multivariable associations of CT emphysema subtypes with symptoms, physiology, lung structure, and lung function decline in the MESA Lung Study.* β estimates for continuous outcomes show the effect size per 10% increment in CT emphysema subtype, except for percent emphysema, which is per 1% increment in CT emphysema subtype. The β estimates for chronic bronchitis and interstitial lung abnormalities are the log(odds ratios). All results adjusted for age, sex, race/ethnicity, height, weight, smoking status, pack-years, scanner manufacturer and other CT emphysema subtypes. Abbreviations: CPFE=combined pulmonary fibrosis/emphysema; FEV1=forced expiratory volume in one second; FVC=forced vital capacity.
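The caption reports β for the binary outcomes as log(odds ratios), so a reader may want to convert them back to odds ratios. The following is a minimal sketch of that conversion; the function names and the example β value are hypothetical and are not taken from the paper. Rescaling to a different increment assumes the model is linear in the subtype percentage, which the caption does not state explicitly.

```python
import math


def odds_ratio_from_beta(beta_log_odds: float) -> float:
    """Convert a log(odds ratio) estimate into an odds ratio."""
    return math.exp(beta_log_odds)


def rescale_beta(beta_per_10_percent: float, increment_percent: float) -> float:
    """Rescale a beta reported per 10% increment to another increment.

    Assumes the effect is linear in the subtype percentage.
    """
    return beta_per_10_percent * increment_percent / 10.0


# Hypothetical beta, for illustration only (not a value from the paper).
example_beta = 0.25
print(odds_ratio_from_beta(example_beta))                      # OR per 10% increment
print(odds_ratio_from_beta(rescale_beta(example_beta, 5.0)))   # OR per 5% increment
```

A β of 0 corresponds to an odds ratio of 1 (no association), and positive β values correspond to odds ratios above 1.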
Figure 5.
Manhattan and local association plots for the three genome-wide significant, replicated gene variants for three CT emphysema subtypes in SPIROMICS. The red lines show the level of statistical significance (P=5×10⁻⁸). The genome-wide significant SNP for the Combined Bronchitis-Apical Emphysema subtype replicated among Whites (P=0.01) and the entire replication sample (P=0.04). The genome-wide significant SNP for the restrictive CPFE subtype replicated among Whites (P=0.01) and the entire replication sample (P=0.04). The first genome-wide significant SNP for the obstructive CPFE subtype on chromosome 19 had variance only among Black participants and replicated in this sample (P=0.046). The second genome-wide significant SNP for the obstructive CPFE subtype on chromosome 16 did not replicate. There were no significant replicated genetic associations for the diffuse and senile CT emphysema subtypes (not shown).* Results are shown for the lowest attenuation (most severe) of the three preliminary subtypes that comprise the Combined Bronchitis-Apical Emphysema subtype. Abbreviation: CPFE=combined pulmonary fibrosis/emphysema
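The significance level marked by the red lines, P=5×10⁻⁸, is the conventional genome-wide threshold, commonly motivated as a Bonferroni correction of 0.05 for roughly one million independent tests. The sketch below illustrates that arithmetic and the comparison for the DRD1 variant P value quoted in the abstract. The function names and the one-million test count are assumptions for illustration, not details given in the paper.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # threshold stated in the caption


def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def is_genome_wide_significant(p_value: float,
                               threshold: float = GENOME_WIDE_THRESHOLD) -> bool:
    """True if the P value falls below the genome-wide threshold."""
    return p_value < threshold


# Assumed: about one million independent common-variant tests.
print(bonferroni_threshold(0.05, 1_000_000))

# P value reported in the abstract for the variant near DRD1.
print(is_genome_wide_significant(1.1e-8))
```

The replication P values in the caption (0.01, 0.04, 0.046) would not pass this threshold; they are evidently judged against a less stringent replication criterion, which the caption does not specify.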
