Sci Rep. 2025 Sep 24;15(1):32744.
doi: 10.1038/s41598-025-14893-1.

A machine learning approach to identify patients at risk for long-term consequences after pulmonary embolism

Stephan Nopp et al. Sci Rep. 2025.

Abstract

Pulmonary embolism (PE) can result in long-term sequelae, such as post-PE syndrome, including persistent dyspnea and chronic thromboembolic pulmonary hypertension (CTEPH). Existing prediction tools for severe post-PE complications lack sensitivity and specificity. This study aimed to develop a machine learning model to identify patients at risk for long-term consequences after PE. Using data from the RIETE registry, the largest prospective international PE registry, we developed supervised machine learning models to identify patients at increased risk of CTEPH and post-PE syndrome. Our approach involved data preprocessing, model training with a random forest algorithm, and validation through Monte-Carlo cross-validation. The performance of the CTEPH prediction model was benchmarked against an existing score. Of the 57,981 PE patients in the RIETE registry, 5,217 were eligible for inclusion. Median age was 68 years, and 50.6% were men. The models were trained on 111 predictor variables; 171 patients (3.3%) developed CTEPH. The CTEPH model demonstrated good performance with an AUC of 0.74 (95% CI: 0.73-0.75), significantly outperforming the existing CTEPH prediction score (AUC 0.57; 95% CI: 0.54-0.61). Additionally, 1,310 (25.1%) patients were defined as having post-PE syndrome six months after index PE. The post-PE syndrome model showed poorer performance with an AUC of 0.62 (95% CI: 0.61-0.62). Key predictor variables across both models included chest pain at presentation, PE location, troponin, side of clot, and dyspnea at presentation. Machine learning models show promise in predicting CTEPH but are less effective for post-PE syndrome. Future refinement, including integrating imaging data, is necessary to improve predictive performance and clinical utility.
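
The validation scheme named in the abstract, Monte-Carlo cross-validation with preprocessing fitted inside each split, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline. It uses a single numeric predictor, median imputation as the only preprocessing step, and a caller-supplied `fit`/`predict` pair in place of the random forest; the helper `fit_direction` is a deliberately trivial stand-in model. All function names, the 20% test fraction, the number of repeats, and the normal-approximation confidence interval are assumptions made for illustration.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_fraction, rng):
    """Randomly split indices into train/test, keeping the class ratio."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def monte_carlo_cv(features, labels, fit, predict, n_repeats=100,
                   test_fraction=0.2, seed=0):
    """Repeat random train/test splits and average the test AUC."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        train, test = stratified_split(labels, test_fraction, rng)

        # Split-wise preprocessing: the imputation value is learned from
        # the training part only, so nothing leaks from the test part.
        observed = [features[i] for i in train if features[i] is not None]
        fill = statistics.median(observed)

        def impute(x):
            return fill if x is None else x

        model = fit([impute(features[i]) for i in train],
                    [labels[i] for i in train])
        scores = [predict(model, impute(features[i])) for i in test]
        aucs.append(auc([labels[i] for i in test], scores))

    mean = statistics.mean(aucs)
    # Assumed interval: normal approximation for the mean across repeats.
    half = 1.96 * statistics.stdev(aucs) / len(aucs) ** 0.5
    return mean, (mean - half, mean + half)


def fit_direction(xs, ys):
    """Stand-in model (not a random forest): learn the sign of the effect."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0


def predict_direction(direction, x):
    """Score a case so that higher values mean higher predicted risk."""
    return direction * x
```

Calling `monte_carlo_cv(xs, ys, fit_direction, predict_direction)` on a list of predictor values `xs` (with `None` for missing entries) and binary outcomes `ys` returns the mean AUC and an approximate 95% interval. Unlike k-fold cross-validation, the test sets of different repeats may overlap, which is why many repeats are usually drawn. Each class needs at least two cases so that both the training and the test part contain it.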

Keywords: Dyspnea; Machine learning; Prediction; Pulmonary arterial hypertension; Pulmonary embolism; Venous thromboembolism.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Study cohort flow chart.
Fig. 2
Machine learning workflow. The workflow consists of three major steps: preprocessing, model training, and model validation. Data were split into cross-validation folds, followed by fold-wise preprocessing, training of a random forest classifier, performance assessment, and subsequent accumulation and averaging of the performance results.
Fig. 3
Machine learning performance for CTEPH and relative importance of model parameters. (A) Performance metrics for the prediction of CTEPH. Error bars indicate the 95% confidence intervals. (B) Relative feature importance of the ten most predictive parameters in the CTEPH prediction model. Abbreviations: ACC, accuracy; AUC, area under the curve; CTEPH, chronic thromboembolic pulmonary hypertension; NPV, negative predictive value; PPV, positive predictive value; SNS, sensitivity; SPC, specificity.
Fig. 4
Machine learning performance for post-PE syndrome and relative importance of model parameters. (A) Performance metrics for the prediction of post-PE syndrome. Error bars indicate the 95% confidence intervals. (B) Relative feature importance of the ten most predictive parameters in the post-PE prediction model. Abbreviations: ACC, accuracy; AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value; SNS, sensitivity; SPC, specificity.
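
The threshold-based metrics abbreviated in the captions above (ACC, SNS, SPC, PPV, NPV) all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions. The function names are hypothetical, and any counts or sensitivity/specificity values passed in are the caller's own; none are taken from the paper. The second helper applies Bayes' rule to show how PPV depends on outcome prevalence, which matters for a rare outcome such as CTEPH (3.3% in this cohort).

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts.

    Assumes every denominator is non-zero, i.e. each class occurs and
    the model predicts each class at least once.
    """
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,  # accuracy
        "SNS": tp / (tp + fn),     # sensitivity (recall)
        "SPC": tn / (tn + fp),     # specificity
        "PPV": tp / (tp + fp),     # positive predictive value
        "NPV": tn / (tn + fn),     # negative predictive value
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

For example, `ppv_at_prevalence(0.7, 0.7, 0.033)` uses illustrative sensitivity and specificity values of 0.7 with the cohort's CTEPH prevalence. The resulting PPV is far below the sensitivity, because at low prevalence false positives outnumber true positives even when specificity is moderate.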
