Commun Med (Lond). 2024 Dec 16;4(1):265.
doi: 10.1038/s43856-024-00637-1.

Interpretable multimodal machine learning (IMML) framework reveals pathological signatures of distal sensorimotor polyneuropathy

Phong B H Nguyen et al. Commun Med (Lond).

Abstract

Background: Distal sensorimotor polyneuropathy (DSPN) is a common neurological disorder in elderly adults and people with obesity, prediabetes and diabetes and is associated with high morbidity and premature mortality. DSPN is a multifactorial disease that is not yet fully understood.

Methods: Here, we developed the Interpretable Multimodal Machine Learning (IMML) framework for predicting DSPN prevalence and incidence based on sparse multimodal data. Exploiting IMML's interpretability further empowered biomarker identification. We leveraged the population-based KORA F4/FF4 cohort including 1091 participants and their deep multimodal characterisation, i.e., clinical data, genomics, methylomics, transcriptomics, proteomics, inflammatory proteins and metabolomics.

Results: Clinical data alone is sufficient to stratify individuals with and without DSPN (AUROC = 0.752), whilst predicting DSPN incidence 6.5 ± 0.2 years later strongly benefits from clinical data complemented with two or more molecular modalities (ΔAUROC > 0.1, reaching an AUROC of 0.714). Important and interpretable features of incident DSPN prediction include up-regulation of proinflammatory cytokines and down-regulation of the SUMOylation pathway and essential fatty acids, thus yielding novel insights into disease pathophysiology.
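For readers unfamiliar with the metric, the following minimal sketch shows how an AUROC and a ΔAUROC between two models can be computed from predicted probabilities. It is an illustration only, not the authors' code: the function name `auroc` and the toy labels and scores are invented for this example and are not taken from the KORA cohort.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random case scores higher than
    a random control (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (1 = case, 0 = control).
labels = [1, 1, 1, 0, 0, 0, 0]
clinical_only = [0.9, 0.4, 0.35, 0.5, 0.3, 0.2, 0.1]
clinical_plus_molecular = [0.9, 0.6, 0.55, 0.5, 0.3, 0.2, 0.1]

# Difference in AUROC between the richer and the simpler toy model.
delta_auroc = auroc(labels, clinical_plus_molecular) - auroc(labels, clinical_only)
print(f"delta AUROC = {delta_auroc:.3f}")
```

The pairwise formulation is quadratic in sample size, which is acceptable for a sketch; a rank-based implementation would be preferable for large cohorts.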

Conclusions: These features may become biomarkers for incident DSPN, guide prevention strategies and serve as proof of concept for the utility of IMML in studying complex diseases.

Plain language summary

Distal sensorimotor polyneuropathy (DSPN) is a common neurological disorder in elderly adults and people with obesity, prediabetes, and diabetes in which there is tingling or numbness with or without pain. It is not fully understood why it develops. We developed a computational method that uses various sources of information to enable people with DSPN to be identified and also to predict which people might develop DSPN in the future. Further development of our method might provide additional information that can be used to prevent development of DSPN in people with obesity, prediabetes, and diabetes. Also, our method could potentially be adapted to enable other complex diseases to be better understood.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Workflow of interpretable multimodal framework for feature prioritisation, DSPN classification and disease incidence prediction.
a Distribution of samples across time points (KORA F4 and FF4), disease status (case or control) at baseline (KORA F4) and follow-up (KORA FF4) and prediction tasks. Both models were trained on the same set of F4 features but with different labels and a subset of samples. b Number of features stratified according to data modalities. Features removed during pre-processing are shown in grey. c Number of samples characterised within each data modality and their overlaps in KORA F4. d Fully characterised samples in KORA F4 were exclusively leveraged for g the second and final training step, whilst the remaining sparse samples were used for e prior feature prioritisation: All molecular features were shortlisted based on differential expression analysis (DEA), gene set enrichment analysis (GSEA) and their leading-edge genes (“Methods”), whilst clinical features were ranked according to feature importance of elastic net models. f Features for the final training step were selected based on rank aggregation (“Methods”). g The final training set contained 54 DSPN cases and 188 controls in KORA F4. In the second step, elastic net models determined the optimal number of modalities, features and combination of modalities. These models implemented forward feature selection in a nested cross-validation, using weighted log loss to account for class imbalance, and finally 100 stratified resamplings during training and rank aggregation (“Methods”), thus returning h the refined and final model, which was further subjected to functional analysis to gain insights into DSPN pathophysiology.
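The caption mentions a weighted log loss to account for the imbalance between 54 cases and 188 controls. The sketch below shows one common way to do this, assuming inverse-frequency ("balanced") class weights; the weighting scheme and the function names `balanced_class_weights` and `weighted_log_loss` are assumptions made for illustration, since the exact scheme is described only in the paper's Methods.

```python
import math


def balanced_class_weights(n_cases, n_controls):
    """Inverse-frequency weights (an assumed scheme): each class
    contributes the same total weight to the loss."""
    total = n_cases + n_controls
    return {1: total / (2 * n_cases), 0: total / (2 * n_controls)}


def weighted_log_loss(labels, probs, weights, eps=1e-15):
    """Weighted average of the per-sample binary cross-entropy."""
    num = 0.0
    den = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        num += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        den += w
    return num / den


# Class counts from the figure caption: 54 DSPN cases, 188 controls.
weights = balanced_class_weights(54, 188)

# Under these weights, a confidently missed case costs more than a
# confidently missed control (toy probabilities, invented).
loss_missed_case = weighted_log_loss([1, 0], [0.1, 0.1], weights)
loss_missed_control = weighted_log_loss([1, 0], [0.9, 0.9], weights)
print(weights, loss_missed_case, loss_missed_control)
```

With these weights the minority class (cases) receives a per-sample weight of about 2.24 and the majority class about 0.64, so both classes carry equal total weight.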
Fig. 2. The clinical model can sufficiently stratify DSPN prevalence.
a Classification of DSPN first leverages clinical attributes, and cumulatively adds molecular modalities with forward feature selection (“Methods”). Here shown for 100 cross-validated models. b Test set performance of DSPN classification leveraging between one and seven data modalities. Error bars of the boxplot indicate 95% CI. c Prediction probabilities of samples in the 100 left-out test sets leveraging clinical features only, stratified into true labels (case and control). d Feature importance of the final model based on clinical attributes alone applied to training and feature selection set (“Methods”). e PCA leveraging the four most important clinical features shown in panel (d) to stratify cases from controls. f Distribution of the test prediction probability of all samples of 100 resampled and cross-validated models. g Normalised values of the four most important clinical features. The order of samples corresponds to panel (f).
Fig. 3. Predicting DSPN incidence benefits from molecular data.
a Each model starts with clinical attributes at baseline, and consecutively increases the number of modalities by adding the next molecular modality with forward feature selection for 100 cross-validated models (“Methods”). b Performance of all model complexities to predict patient trajectories. Error bars of the boxplot indicate 95% CI. c Prediction probabilities of samples in the 100 left-out testing sets using the optimal model of the corresponding iterations, stratified into true labels (case and control). d Important features of the final model. The x-axis represents the signed model importance scores (t-statistics) of the features in the training set; the y-axis represents their t-statistics in the feature selection set. e PCA leveraging the most important features of the final model in panel (d). f Waterfall plot of prediction probability of all samples across 100 resampling steps. g Normalised values of the important features in panel (d) stratified by individual samples and ordered according to panel (f). Features belonging to the same data modality are grouped together.
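The greedy procedure in panel (a), which starts from clinical attributes and adds one molecular modality at a time, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: `score_fn` is a hypothetical stand-in for a cross-validated performance estimate such as AUROC, and the gains in `TOY_GAIN` are invented numbers.

```python
def forward_modality_selection(base, candidates, score_fn):
    """Greedily add the modality that yields the best score at each step.

    Returns the full path of (modalities, score) pairs, one per model
    complexity, so the best complexity can be chosen afterwards.
    """
    selected = [base]
    remaining = list(candidates)
    path = [(tuple(selected), score_fn(selected))]
    while remaining:
        best = max(remaining, key=lambda m: score_fn(selected + [m]))
        selected.append(best)
        remaining.remove(best)
        path.append((tuple(selected), score_fn(selected)))
    return path


# Invented per-modality contributions; a real score_fn would train and
# evaluate a model on held-out data instead.
TOY_GAIN = {
    "clinical": 0.60,
    "inflammatory_proteins": 0.07,
    "metabolomics": 0.05,
    "transcriptomics": 0.02,
    "genomics": -0.01,
}


def toy_score(modalities):
    return sum(TOY_GAIN[m] for m in modalities)


path = forward_modality_selection(
    "clinical",
    ["genomics", "transcriptomics", "metabolomics", "inflammatory_proteins"],
    toy_score,
)
best_modalities, best_score = max(path, key=lambda step: step[1])
print(best_modalities, round(best_score, 2))
```

Returning the whole path rather than only the final set mirrors the idea of comparing model complexities, as in panel (b), before fixing the number of modalities.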
Fig. 4. Enrichment of inflammatory cytokines- and essential fatty acids-related pathways as important signatures of DSPN progression.
a Sub-network of important features to predict development of DSPN. Each node is a feature coloured according to its data modality. Edges represent the number of shared molecule sets between two nodes. The important features in the final model are highlighted and labelled in black. Below are examples of enriched molecule sets associated with b inflammation-related proteins, c transcripts and d metabolites: b The upregulation of the “Chemokine receptors bind chemokine” gene set. c SUMOylation of DNA replication proteins. d G alpha (q) signalling events. Molecules are ranked in decreasing order of t-statistics, with ticks representing molecules that belong to the examined molecule set.
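Panels (b) to (d) show molecules ranked by t-statistic with ticks marking set members. A minimal sketch of how such a ranking can be turned into an enrichment score is given below, assuming the unweighted running-sum (Kolmogorov-Smirnov-style) statistic; the weighting actually used in the paper is not stated in this caption, and the molecule names and t-statistics are invented.

```python
def enrichment_score(t_stats, molecule_set):
    """Unweighted running-sum enrichment score over a ranked list.

    Walks down the molecules in decreasing order of t-statistic, stepping
    up for set members ("ticks") and down for non-members, and returns the
    largest deviation from zero together with the ranked order.
    """
    ranked = sorted(t_stats, key=t_stats.get, reverse=True)
    hits = [m in molecule_set for m in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("set must contain some, but not all, ranked molecules")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best, ranked


# Hypothetical molecules and t-statistics, for illustration only.
toy_t_stats = {
    "mol_a": 3.1, "mol_b": 2.4, "mol_c": 1.0,
    "mol_d": -0.2, "mol_e": -1.5, "mol_f": -2.8,
}
toy_set = {"mol_a", "mol_b", "mol_d"}

es, ranked = enrichment_score(toy_t_stats, toy_set)
print(ranked, round(es, 3))
```

A positive score indicates that set members cluster near the top of the ranking (up-regulation), and a negative score that they cluster near the bottom.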
