Observational Study
Mol Psychiatry. 2025 Aug;30(8):3632-3639.
doi: 10.1038/s41380-025-02950-0. Epub 2025 Mar 19.

Generalizability of clinical prediction models in mental health

Maike Richter et al. Mol Psychiatry. 2025 Aug.

Abstract

Concerns about the generalizability of machine learning models in mental health arise partly from sampling effects and data disparities between research cohorts and real-world populations. We aimed to investigate whether a machine learning model trained solely on easily accessible, low-cost clinical data can predict depressive symptom severity in unseen, independent datasets from various research and real-world clinical contexts. This observational multi-cohort study included 3021 participants (62.03% female; mean age = 36.27 years, range 15-81) from ten European research and clinical settings, all diagnosed with an affective disorder. We first compared research and real-world inpatients from the same treatment center using 76 clinical and sociodemographic variables. An elastic net algorithm with ten-fold cross-validation was then applied to develop a sparse machine learning model for predicting depression severity based on the top five features (global functioning, extraversion, neuroticism, emotional abuse in childhood, and somatization). Model generalizability was tested across nine external samples. The model reliably predicted depression severity across all samples (r = 0.60, SD = 0.089, p < 0.0001) and in each individual external sample, with performance ranging from r = 0.48 in a real-world general population sample to r = 0.73 in real-world inpatients. These results suggest that machine learning models trained on sparse clinical data have the potential to predict illness severity across diverse settings, offering insights that could inform the development of more generalizable tools for use in routine psychiatric data analysis.
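
The external validation summarized above reports a Pearson correlation between true and predicted severity for each external sample, together with a mean and SD across samples. The following is a minimal sketch of that kind of summary, not the authors' code. The site names and scores are synthetic toy values, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say how the SD was computed.

```python
import math
import statistics


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def summarize_sites(sites):
    """Return per-site r, plus the mean and sample SD of r across sites."""
    per_site = {name: pearson_r(t, p) for name, (t, p) in sites.items()}
    values = list(per_site.values())
    return per_site, statistics.mean(values), statistics.stdev(values)


# Synthetic toy data (NOT from the study): (true scores, predicted scores).
toy_sites = {
    "site_a": ([10, 20, 30, 40], [12, 18, 33, 37]),
    "site_b": ([5, 15, 25, 35], [20, 10, 30, 25]),
}

per_site, mean_r, sd_r = summarize_sites(toy_sites)
for name, r in per_site.items():
    print(f"{name}: r = {r:.2f}")
print(f"mean r = {mean_r:.2f}, SD = {sd_r:.3f}")
```

Reporting the per-site values alongside the mean makes it visible when a model performs well on average but poorly in one setting, which is the concern that external validation is meant to address.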

Conflict of interest statement

Competing interests: FP is a member of the European Scientific Advisory Board of Brainsway Inc., Jerusalem, Israel, and the International Scientific Advisory Board of Sooma, Helsinki, Finland. He has received speaker’s honoraria from Mag&More GmbH and the neuroCare Group. His lab has received support with equipment from neuroConn GmbH, Ilmenau, Germany, and Mag&More GmbH and Brainsway Inc., Jerusalem, Israel. MAR has received financial research support from the EU (H2020 No. 754740) and served as PI in clinical trials from Abide Therapeutics, Böhringer-Ingelheim, Emalex Biosciences, Lundbeck GmbH, Nuvelution TS Pharma Inc., Oryzon, Otsuka Pharmaceuticals and Therapix Biosciences. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Consent to participate: Written informed consent was obtained from all participants of this study.

Study approval: The research in all samples was conducted ethically in accordance with the World Medical Association Declaration of Helsinki and approved by all samples’ local institutional review boards and ethics committees named hereafter:
• Ethikkommission beider Basel (EKBB)
• Klinisches Ethikkomitee der Universitären Psychiatrischen Kliniken Basel (KLINEK UPK)
• Ethics Commission of the Westphalia-Lippe Medical Association and the University of Münster
• Ethics Committee of the Faculty of Psychology and Sports Science at the University of Münster
• Ethics Commission of the Faculty of Medicine at the University of Münster
• Ethics Commission of the Faculty of Medicine at the Philipps-University Marburg
• Ethics Commission of the Faculty of Medicine of Friedrich-Schiller-University Jena
• Ethics Commission of the Faculty of Medicine at Ludwig-Maximilians-University Munich
• Ethics Commission of the Faculty of Medicine at Martin-Luther-University Halle-Wittenberg
• Kommission für ethische Fragen der Wissenschaft der Martin-Luther-Universität Halle-Wittenberg (KeFW)
• Ethics Commission of the Faculty of Medicine of Cologne University
• Ethics Committee of the Medical Faculty at Heinrich Heine University of Düsseldorf
• Ethics Commission of the University of Turku
• Independent Ethical Committee of the University of Bari Aldo Moro (Il Comitato Etico)
• Departmental Commission for the Experimentation and Protection of the Person of the Department of Medicine of the University of Udine (IRB-DMED)
• Ethics Committee of the University of Milan
• Ethics Committee of the University of Birmingham

Figures

Fig. 1. Analytic workflow, model evaluation, and results of multisite model validation.
A Analytic workflow from systematic differences analysis to multisite model evaluation. B Scatter plot depicting p-values for group differences between study population and real-world inpatients from site #1 across clinical and demographic variables. C1 Line plot of ranked feature importances with specified cutoff. C2 Bar plot highlighting the top 5 features selected through permutation importance analysis. D External validation results of the base model showing Pearson correlation of true and predicted depressive symptoms, contrasted across nine external sites. E Follow-up validation scatter plot showing Pearson correlation of true and predicted depressive symptoms following therapeutic intervention, including the presentation of average follow-up durations by site.
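
Panels C1 and C2 refer to ranking features by permutation importance and keeping the top ones. As a rough illustration of the general technique only, the sketch below shuffles one feature column at a time and records how much the prediction error grows. Everything here is an assumption for illustration: the fixed linear `toy_predict` model, the synthetic data, the mean squared error metric, and the cutoff of two features are hypothetical, and the record does not state which metric or software the authors used.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link with the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical fixed model: feature 0 matters most, feature 2 not at all."""
    return 3.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]


# Synthetic data (NOT from the study): 200 rows, 3 standard-normal features.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [toy_predict(row) for row in X]

scores = permutation_importance(toy_predict, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
top_k = ranking[:2]  # keep the two highest-ranked features (arbitrary cutoff)
print("importances:", [round(s, 3) for s in scores])
print("top features:", top_k)
```

A feature the model ignores gets an importance of zero, because shuffling it leaves the predictions unchanged. That property is what makes a ranked importance curve with a cutoff usable for building a sparse model.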

Publication types