PLoS Med. 2024 Aug 13;21(8):e1004444.
doi: 10.1371/journal.pmed.1004444. eCollection 2024 Aug.

Identification of factors directly linked to incident chronic obstructive pulmonary disease: A causal graph modeling study

Robert W Gregg et al. PLoS Med.

Abstract

Background: Beyond exposure to cigarette smoking and aging, the factors that drive lung function decline toward incident chronic obstructive pulmonary disease (COPD) remain unclear. Advancements have been made in categorizing COPD into emphysema and airway predominant disease subtypes; however, predicting which healthy individuals will progress to COPD is difficult because they can exhibit profoundly different disease trajectories despite similar initial risk factors. This study aimed to identify clinical, genetic, and radiological features that are directly linked to, and subsequently predict, abnormal lung function.

Methods and findings: We employed graph modeling on 2,643 COPDGene participants (aged 45 to 80 years, 51.25% female, 35.1% African American; enrollment 11/2007-4/2011) with a smoking history but normal spirometry at study enrollment to identify variables that are directly linked to future lung function abnormalities. We developed logistic regression and random forest predictive models for distinguishing individuals who maintain lung function from those who decline. Of the 131 variables analyzed, 6 were identified as informative of future lung function abnormalities, namely forced expiratory flow in the middle range (FEF25-75%), average lung wall thickness in a 10 mm radius (Pi10), severe emphysema, age, sex, and height. We investigated whether these features predict individuals leaving GOLD 0 status (normal spirometry according to Global Initiative for Obstructive Lung Disease (GOLD) criteria). Linear models trained with these features reached an area under the receiver operator characteristic curve (AUROC) of 0.75. Random forest predictors performed similarly to logistic regression (AUROC = 0.7), indicating that no significant nonlinear effects were present. The results were externally validated on 150 participants from the Specialized Center for Clinically Oriented Research (SCCOR) cohort (aged 45 to 80 years, 52.7% female, 4.7% African American; enrollment 7/2007-12/2012) (AUROC = 0.89). The main limitation of longitudinal studies with 5- and 10-year follow-up is the introduction of mortality bias that disproportionately affects the more severe cases. However, our study focused on spirometrically normal individuals, who have a lower mortality rate. Another limitation is the use of strict criteria to define spirometrically normal individuals, which was unavoidable when studying factors associated with changes in normalized forced expiratory volume in 1 s (FEV1%predicted) or the ratio of FEV1/FVC (forced vital capacity).
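The AUROC figures quoted in the Methods and findings can be read as the probability that a classifier scores a randomly chosen individual who leaves GOLD 0 above a randomly chosen individual who stays. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auroc` and the labels and scores are made up for illustration; they are not study data and this is not the authors' code.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC: fraction of positive/negative pairs
    in which the positive case gets the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = left GOLD 0 at follow-up, 0 = stayed GOLD 0.
labels = [0, 0, 0, 1, 0, 1, 1, 0]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.30]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.7 to 0.89 should be read.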

Conclusions: This study took an agnostic approach to identify which baseline measurements differentiate and predict the early stages of lung function decline in individuals with previous smoking history. Our analysis suggests that emphysema affects obstruction onset, while airway predominant pathology may play a more important role in future FEV1 (%predicted) decline without obstruction, and FEF25-75% may affect both.

Conflict of interest statement

RWG, CMK, and PVB have no competing interests. FCS has received grant support and consulting fees from Sanofi/Regeneron, AstraZeneca, Verona Pharma, Nuvaira, Gala Therapeutics, GlaxoSmithKline, and Boehringer Ingelheim. EKS has received grant support from Bayer and Northpond Laboratories. DLD has received grant support from Bayer and the Alpha-1 Foundation.

Figures

Fig 1. Overview of the methods used to predict GOLD 0 stage progression.
(A) Shows the COPD disease axes used to categorize individuals into different GOLD stages. (B) Depicts the transition in GOLD stage between the first and second COPDGene visits (a 5-year difference on average); 79.6% of individuals maintain GOLD 0 status and the remainder progress to more severe disease. (C) Details the workflow for data processing. Combined COPDGene data sets include demographic, spirometric, radiological, genetic, and survey information. Baseline variables only include measurements from the first visit. Imputation was performed using K-nearest neighbors. Categorical variables with more than 3 levels were merged. Correlation was determined using the coefficient of determination for continuous variables and Cramér’s V for categorical variables. (D) Lays out the process for parameter tuning. Visit 1 data are split into a testing set and a training set; the latter is further divided for k-fold cross-validation. For each fold and choice of α (graph sparsity), FCImax is used to determine variables causally linked to COPD progression (i.e., the Markov blanket). These variables are then used as predictors in a classification model where α is chosen by maximizing AUROC. AUROC, area under receiver operator characteristic curve; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; GOLD, Global Initiative for Obstructive Lung Disease.
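Panel (C) mentions Cramér's V as the correlation measure for categorical variables. As a rough illustration, the sketch below computes the standard (uncorrected) Cramér's V from two categorical columns in plain Python. Whether the study applied a bias correction is not stated here, so treat this variant as an assumption; the function name `cramers_v` and the toy columns are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramer's V between two equal-length categorical columns."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("columns must be non-empty and of equal length")
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# Hypothetical toy columns (not study variables).
group = ["a", "a", "b", "b"]
same = ["x", "x", "y", "y"]   # perfectly associated with group
mixed = ["x", "y", "x", "y"]  # independent of group
print(cramers_v(group, same), cramers_v(group, mixed))
```

Values range from 0 (no association) to 1 (one variable fully determines the other), which makes the measure comparable to a coefficient of determination when screening for redundant variables.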
Fig 2. Discovered Markov blanket that maximizes AUROC.
The Markov blanket encompasses every variable within the data set that can be used to infer information about our target variable: change in GOLD0 status between the first and second visit (ΔGOLD0). Each node in the graph corresponds to a measured variable in COPDGene and each edge represents a possible causal relationship that satisfies every conditional independence test performed. Arrows between variables show a direct causal link and/or an unmeasured latent confounder that causes both variables. (A) Depicts the optimal Markov blanket for the data set with variables related to FEV1 and FVC removed. (B) Shows the same model, but all spirometric variables are removed. CT, computed tomography; FEF25-75%, forced expiratory flow in the middle range; GOLD, Global Initiative for Obstructive Lung Disease; Perc15, 15th percentile cut-off for CT lung density in Hounsfield units; Pi10, average lung wall thickness in 10 mm radius.
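To make the notion of a Markov blanket concrete, here is a minimal sketch for the simpler case of a fully directed acyclic graph, where the blanket of a target is its parents, its children, and the other parents of its children. This is a simplification: the graphs in this study come from FCImax and can contain edges that stand for latent confounders, which the sketch does not handle. The node names and edges are hypothetical.

```python
def markov_blanket(edges, target):
    """Markov blanket of `target` in a DAG given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Spouses: other parents of the target's children.
    spouses = {u for u, v in edges if v in children and u != target}
    return parents | children | spouses


# Hypothetical toy graph: G -> A -> T -> C <- S, and C -> D.
edges = [("G", "A"), ("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
print(sorted(markov_blanket(edges, "T")))
```

In this toy graph, G and D fall outside the blanket: once A, C, and S are known, they add no further information about T, which is why only blanket variables are passed on to the classifier.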
Fig 3. Markov blanket composition across different graph sparsities.
The horizontal axis shows the log-transformed α value that controls graph sparsity (larger values lead to denser graphs), and the vertical axis shows variables that appeared in a Markov blanket. The color gradient shows the number of cross-validation folds in which a variable appeared in the Markov blanket at a given graph sparsity. A lighter color indicates that a variable appeared more frequently in the Markov blanket. (A) Shows the Markov blanket composition for the model with some spirometry measurements included. (B) Contains variables from the Markov blanket with no spirometry measurements included (showing the top 20, including ties). BMI, body mass index; CT, computed tomography; FEF25-75%, forced expiratory flow in the middle range; Perc15, 15th percentile cut-off for CT lung density in Hounsfield units; Pi10, average lung wall thickness in 10 mm radius; SGRQ, St George Respiratory Questionnaire.
Fig 4. Classifier performance for predicting change in GOLD 0 status.
Receiver operator characteristic curves (ROC) illustrate model performance for FCImax + Logistic Regression and random forest. The shaded areas in the training data set designate ±1 standard deviation based on 10× cross-validations. Area under the curve measurements are displayed in the legend (no significant difference). (A–C) Include “limited spirometry” models. (D–F) Include “no spirometry” models. AUROC, area under the receiver operator characteristic curve; GOLD, Global Initiative for Obstructive Lung Disease.
Fig 5. Classifier performance for predicting change in GOLD 0 status in the SCCOR cohort.
Plotted are the ROC curves for COPDGene test data set and the external validation SCCOR cohort. The final model predicts change in GOLD 0 status without Pi10 as a predictor in the limited spirometry model. Each legend entry provides the area under the receiver operator curve (AUROC). AUROC, area under the receiver operator characteristic curve; SCCOR, Specialized Center for Clinically Oriented Research.
Fig 6. Exploring variable importance to the random forest model (“limited spirometry” model).
(A) Variables are ordered by importance using mean absolute Shapley values. (B–D) The distribution of Shapley values across measured demographics (age, biological sex, and race). Positive Shapley values on vertical axes indicate the random forest model was more likely to predict that individual to leave the GOLD 0 status (and vice versa). In (B) we do not display 8 (out of 2,114) individuals that were <45 years old at baseline, since they did not match the inclusion criteria of COPDGene. (E) SNP contributions to the random forest model prediction. Colors differentiate individuals with and without a given SNP. CT, computed tomography; FEF25-75%, forced expiratory flow in the middle range; GOLD, Global Initiative for Obstructive Lung Disease; Pi10, average lung wall thickness in 10 mm radius; SGRQ, St George Respiratory Questionnaire; SNP, single-nucleotide polymorphism.
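The ordering in panel (A) relies on mean absolute Shapley values. The sketch below shows the idea on a tiny hand-written model: it computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings, then ranks features by mean absolute value across individuals. It assumes a single all-zero baseline and a made-up `toy_model`; tree-specific algorithms with a background data set are the usual choice for a real random forest, so this is an illustration of the concept rather than the authors' procedure.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to x
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orders) for t in totals]


def mean_abs_importance(predict, rows, baseline):
    """Mean absolute Shapley value per feature across all rows."""
    per_row = [shapley_values(predict, row, baseline) for row in rows]
    return [sum(abs(v[i]) for v in per_row) / len(rows)
            for i in range(len(baseline))]


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2 * z[0] + z[1] * z[2]


rows = [[1, 1, 1], [1, 0, 1]]   # two made-up individuals
baseline = [0, 0, 0]
importance = mean_abs_importance(toy_model, rows, baseline)
ranking = sorted(range(len(importance)), key=lambda i: -importance[i])
print(importance, ranking)
```

For each individual the Shapley values sum to the difference between that individual's prediction and the baseline prediction, so a positive value pushes the prediction up and a negative one pushes it down, matching the reading of the vertical axes in panels (B–D).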
