Machine Learning of Plasma Proteomics Classifies Diagnosis of Interstitial Lung Disease
- PMID: 38422478
- PMCID: PMC11351805
- DOI: 10.1164/rccm.202309-1692OC
Abstract
Rationale: Distinguishing connective tissue disease-associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging.
Objectives: To identify proteins that separate and classify patients with CTD-ILD and those with IPF.
Methods: Four registries with 1,247 patients with IPF and 352 patients with CTD-ILD were included in analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using recursive feature elimination to construct a proteomic classifier. Multiple machine learning models, including support vector machine, LASSO (least absolute shrinkage and selection operator) regression, random forest, and imbalanced random forest, were trained and tested in independent cohorts. The validated models were used to classify each case iteratively in external datasets.
Measurements and Main Results: A classifier with 37 proteins (proteomic classifier 37 [PC37]) was enriched in the biological processes of bronchiole development, smooth muscle proliferation, and immune responses. Four machine learning models used PC37 with sex and age score to generate continuous classification values. Receiver operating characteristic curve analyses of these scores demonstrated consistent areas under the curve of 0.85-0.90 in the test cohort and 0.94-0.96 in the single-sample dataset. Binary classification demonstrated 78.6-80.4% sensitivity and 76-84.4% specificity in the test cohort and 93.5-96.1% sensitivity and 69.5-77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194 of 248) accuracy in the test cohort and 82.9% (208 of 251) in the single-sample classification dataset.
Conclusions: Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. Many of the identified proteins are involved in immune pathways. We further developed a novel approach for single-sample classification, which could help refine the differential diagnosis of ILD in challenging cases and improve clinical decision making.
Keywords: connective tissue disease with ILD; differential diagnosis; idiopathic pulmonary fibrosis; machine learning model; plasma proteomics.
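The composite accuracy percentages reported in the abstract follow directly from the stated counts (correctly classified cases divided by total cases). A minimal sketch of that arithmetic is shown here; the function name `accuracy_percent` is a hypothetical helper introduced for illustration and is not part of the authors' pipeline.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to one decimal place.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


# Counts as reported in the abstract.
test_cohort = accuracy_percent(194, 248)    # test cohort
single_sample = accuracy_percent(208, 251)  # single-sample classification dataset
print(test_cohort, single_sample)
```

Both values agree with the percentages quoted in the abstract (78.2% and 82.9%).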
Comment in
- The Analysis of Proteomics by Machine Learning in Separating Idiopathic Pulmonary Fibrosis from Connective Tissue Disease-Interstitial Lung Disease. Am J Respir Crit Care Med. 2024 Aug 15;210(4):378-380. doi: 10.1164/rccm.202403-0603ED. PMID: 38593003.
Grants and funding
- T32 HL007605/HL/NHLBI NIH HHS/United States
- UG3 HL145266/HL/NHLBI NIH HHS/United States
- K23 HL150301/HL/NHLBI NIH HHS/United States
- R01 HL166290/HL/NHLBI NIH HHS/United States
- R01 HL130796/HL/NHLBI NIH HHS/United States
- R01 HL169166/HL/NHLBI NIH HHS/United States
- K23 HL146942/HL/NHLBI NIH HHS/United States
- K23 HL138190/HL/NHLBI NIH HHS/United States