This is a preprint.
Machine Learning Analysis of Electronic Health Records Identifies Interstitial Lung Disease and Predicts Mortality in Patients with Systemic Sclerosis
- PMID: 40502596
- PMCID: PMC12155007
- DOI: 10.1101/2025.06.02.25328786
Abstract
Background: Interstitial lung disease (ILD) is the leading cause of death in patients with systemic sclerosis (SSc), affecting more than 40% of this population. Despite the availability of effective treatments to stabilize or improve lung function, survival for patients with SSc-ILD remains poor. Poor outcomes have been attributed to delayed diagnosis and delayed initiation of treatment for SSc-ILD. Although recent guidelines have provided conditional recommendations for early screening, pulmonary function tests (PFTs) are insensitive for early diagnosis, and computed tomography (CT), the current gold standard, often detects disease after irreversible lung injury has already occurred. A single sensitive biomarker that can accurately predict the risk of SSc-ILD development and mortality is lacking. We hypothesized that applying machine learning (ML) methods to multiple features from readily available electronic health record (EHR) data could yield a model to detect ILD and predict mortality in patients with SSc.
Methods: We retrospectively analyzed EHR data from participants enrolled in a single-center registry of patients with SSc over a period of twenty-eight years (1995-2024). We applied a combination of ML models to seventy-four clinical features encompassing demographics, clinical history, PFTs, and laboratory results. The resultant models were tasked with detecting ILD and predicting mortality in participants with SSc.
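Framing mortality prediction as a classification task requires a binary outcome at a fixed horizon, and registry follow-up inevitably includes patients lost to follow-up. The abstract does not say how the study handled this, so the following is only a minimal sketch of one common convention, not the authors' method. The `Patient` record, its fields (an assumed prediction "index date", last known alive date, optional death date), and the `horizon_mortality_label` helper are all hypothetical: deaths within the horizon are positive, patients known to be alive at the end of the horizon are negative, and patients censored earlier are left unlabeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; the registry's real schema is not described.
    patient_id: str
    index_date: date              # assumed date at which a prediction is made
    last_follow_up: date          # last date the patient was known to be alive
    death_date: Optional[date] = None


def horizon_mortality_label(p: Patient, horizon_days: int) -> Optional[int]:
    """Return 1 if death occurs within the horizon, 0 if the patient is known
    to be alive at the end of the horizon, or None if censored before it."""
    if p.death_date is not None:
        days_to_death = (p.death_date - p.index_date).days
        # A death after the horizon means the patient survived the window.
        return 1 if days_to_death <= horizon_days else 0
    days_followed = (p.last_follow_up - p.index_date).days
    if days_followed >= horizon_days:
        return 0
    # Lost to follow-up before the horizon with no recorded death: unknown.
    return None


if __name__ == "__main__":
    cohort = [
        Patient("a", date(2020, 1, 1), date(2020, 6, 1), date(2020, 6, 1)),
        Patient("b", date(2020, 1, 1), date(2022, 1, 1)),
        Patient("c", date(2020, 1, 1), date(2020, 3, 1)),
    ]
    print({p.patient_id: horizon_mortality_label(p, 365) for p in cohort})
```

Running the example prints `{'a': 1, 'b': 0, 'c': None}`. Dropping the `None` cases is the simplest choice; survival models that use censored follow-up directly are an alternative.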
Results: 1,169 participants with SSc were included in this study, spanning 15,494 person-years of observation. Models detecting ILD achieved an AUC of 0.818 and confirmed the importance of known biomarkers, such as autoantibodies and PFTs, as risk factors for SSc-ILD. Unexpected clinical values, including white blood cell count and mean corpuscular volume, were also important for model prediction of SSc-ILD. For prediction of one-year all-cause mortality, models reached an AUC of 0.903. In a subgroup analysis of participants with prevalent radiographic SSc-ILD, three-year all-cause mortality prediction reached an AUC of 0.831. These models identified features strongly associated with mortality that are routinely collected during clinical assessment of patients with SSc, including unexpected associations with values such as red cell distribution width and serum chloride concentration.
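The AUC values reported above (0.818, 0.903, 0.831) have a direct interpretation: the probability that a randomly chosen positive case (for example, a participant who died within the horizon) receives a higher model score than a randomly chosen negative case, with ties counted as one half. A value of 0.5 is chance and 1.0 is perfect ranking. The dependency-free sketch below computes AUC from that pairwise definition; `roc_auc` is a hypothetical helper for illustration, not code from the study, and its quadratic pair loop is meant for clarity rather than large datasets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two positives and two negatives; 3 of the 4 positive/negative pairs
    # are ranked correctly, so the AUC is 0.75.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Library implementations such as scikit-learn's `roc_auc_score` give the same value via a rank-based formula that scales to large cohorts.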
Conclusions: ML-based analysis of clinical features and laboratory tests collected as part of routine clinical care detects ILD and predicts mortality in patients with SSc.
Grants and funding
- U19 AI181102/AI/NIAID NIH HHS/United States
- R21 AG075423/AG/NIA NIH HHS/United States
- K23 HL169815/HL/NHLBI NIH HHS/United States
- P01 AG049665/AG/NIA NIH HHS/United States
- U19 AI135964/AI/NIAID NIH HHS/United States
- R01 HL158139/HL/NHLBI NIH HHS/United States
- P01 HL154998/HL/NHLBI NIH HHS/United States
- U01 TR003528/TR/NCATS NIH HHS/United States
- R01 HL147575/HL/NHLBI NIH HHS/United States
- R01 HL149883/HL/NHLBI NIH HHS/United States
- I01 CX001777/CX/CSRD VA/United States
- R01 ES034350/ES/NIEHS NIH HHS/United States
- R01 HL153312/HL/NHLBI NIH HHS/United States
- U54 AG079754/AG/NIA NIH HHS/United States
- R01 HL147290/HL/NHLBI NIH HHS/United States
- R01 AI158530/AI/NIAID NIH HHS/United States
- L30 HL149048/HL/NHLBI NIH HHS/United States