JCO Clin Cancer Inform. 2024 Jun:8:e2400008.

doi: 10.1200/CCI.24.00008.

MOSAIC: An Artificial Intelligence-Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers

Saverio D'Amico¹,², Lorenzo Dall'Olio³, Cesare Rollo⁴, Patricia Alonso⁵, Iñigo Prada-Luengo⁶, Daniele Dall'Olio³, Claudia Sala⁷, Elisabetta Sauta¹, Gianluca Asti¹, Luca Lanino¹, Giulia Maggioni¹, Alessia Campagna¹, Elena Zazzetti¹, Mattia Delleani¹, Maria Elena Bicchieri¹, Pierandrea Morandini¹, Victor Savevski¹, Borja Arroyo⁵, Juan Parras⁵, Lin Pierre Zhao⁸, Uwe Platzbecker⁹, Maria Diez-Campelo¹⁰, Valeria Santini¹¹, Pierre Fenaux⁸, Torsten Haferlach¹², Anders Krogh⁶, Santiago Zazo⁵, Piero Fariselli⁴, Tiziana Sanavia⁴, Matteo Giovanni Della Porta¹,¹³, Gastone Castellani³,⁷

Affiliations

¹ Humanitas Clinical and Research Center-IRCCS, Milan, Italy.
² Train s.r.l., Milan, Italy.
³ Department of Physics and Astronomy (DIFA), Bologna, Italy.
⁴ Computational Biomedicine Unit, Department of Medical Sciences, University of Turin, Turin, Italy.
⁵ Department of Signals, Systems and Radiocommunications, Polytechnic University of Madrid, Madrid, Spain.
⁶ University of Copenhagen, Copenhagen, Denmark.
⁷ Experimental, Diagnostic and Specialty Medicine-DIMES, Bologna, Italy.
⁸ Hematology and Bone Marrow Transplantation, Hôpital Saint-Louis/University Paris 7, Paris, France.
⁹ Medical Clinic and Policlinic 1, Hematology and Cellular Therapy, University Hospital Leipzig, Leipzig, Germany.
¹⁰ Hematology Department, Hospital Universitario de Salamanca, Salamanca, Spain.
¹¹ Hematology, Azienda Ospedaliero-Universitaria Careggi & University of Florence, Florence, Italy.
¹² MLL Munich Leukemia Laboratory, Munich, Germany.
¹³ Department of Biomedical Sciences, Humanitas University, Milan, Italy.

PMID: 38875514
PMCID: PMC11371092
DOI: 10.1200/CCI.24.00008

Saverio D'Amico et al. JCO Clin Cancer Inform. 2024 Jun.

Abstract

Purpose: Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)-based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with marked clinical and genomic heterogeneity.

Methods: We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and the results were compared with those of the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (the Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and performance of the clinical models and to integrate them into a distributed infrastructure.
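
SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. As a minimal illustration of the underlying quantity (not the authors' pipeline), the sketch below computes exact Shapley values by enumerating every feature coalition, assuming a single reference input (`baseline`) that supplies the values of "absent" features. The function name and the toy model are hypothetical. Enumeration is exponential in the number of features; the SHAP library instead relies on model-specific algorithms (for example, `TreeExplainer` for tree ensembles such as Random Forest or Gradient Boosting).

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values explaining model(x) - model(baseline).

    Features outside a coalition are set to their baseline value, so the
    cost is O(n * 2**n) model calls: only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Input in which only coalition members take their observed value.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a hypothetical risk model `f(z) = 2*z[0] + z[1]*z[2]` explained at `x = [1, 1, 1]` against an all-zero baseline, feature 0 receives 2, the interaction term's contribution is split equally between features 1 and 2 (about 0.5 each), and the values sum to f(x) − f(baseline) = 3.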

Results: UMAP + HDBSCAN clustering yielded a more granular patient stratification, achieving a higher average silhouette coefficient than HDP (0.16 vs 0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% vs 85.8% ± 0.8%). AI methods for survival prediction outperformed conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival performed best in both internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02). SHAP analysis revealed that similar features drove patient subgroups and outcomes in both the training and validation cohorts. Federated implementation improved the accuracy of the developed models.
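
The silhouette coefficient compares, for each patient, the mean distance to the other members of its own cluster (cohesion, a) with the mean distance to the nearest other cluster (separation, b): s = (b − a) / max(a, b), averaged over patients. A minimal sketch follows, assuming Euclidean distance and assuming that HDBSCAN noise points (label −1) are excluded; the source does not state which distance or noise handling was used, and the function name is illustrative.

```python
import math
from typing import Sequence


def silhouette_score(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    noise_label: int = -1,
) -> float:
    """Mean silhouette coefficient (Euclidean), ignoring noise-labelled points."""
    idx = [i for i, lab in enumerate(labels) if lab != noise_label]
    clusters = sorted({labels[i] for i in idx})
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two non-noise clusters")

    def mean_dist(i: int, cluster: int) -> float:
        # Mean distance from point i to the other members of `cluster`.
        members = [j for j in idx if labels[j] == cluster and j != i]
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i in idx:
        own = labels[i]
        if sum(1 for j in idx if labels[j] == own) == 1:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean_dist(i, own)  # cohesion
        b = min(mean_dist(i, c) for c in clusters if c != own)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, values near 0 overlapping clusters, and negative values points closer to another cluster than to their own.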

Conclusion: MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. Compared with conventional statistical methods, AI-based approaches were more accurate in capturing genomic similarities and providing individual prognostic information. MOSAIC's federated implementation supports broad clinical application while maintaining high performance and data protection.


Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Iñigo Prada-Luengo

Stock and Other Ownership Interests: BioNano Genomics

Travel, Accommodations, Expenses: Euroimmun

Uwe Platzbecker

Honoraria: Celgene/Jazz, AbbVie, Curis, Geron, Janssen

Consulting or Advisory Role: Celgene/Jazz, Novartis, BMS GmbH & Co KG

Research Funding: Amgen (Inst), Janssen (Inst), Novartis (Inst), BerGenBio (Inst), Celgene (Inst), Curis (Inst)

Patents, Royalties, Other Intellectual Property: Part of a patent for a TFR-2 antibody (Rauner et al. Nature Metabolism 2019)

Travel, Accommodations, Expenses: Celgene

Maria Diez-Campelo

Honoraria: Celgene, Novartis, Keros Therapeutics

Consulting or Advisory Role: Celgene, Novartis, GlaxoSmithKline, Blueprint Medicines, Agios, Hemavant, Syros, Keros Therapeutics, Curis

Travel, Accommodations, Expenses: Gilead Sciences

Valeria Santini

Honoraria: Celgene/Bristol Myers Squibb, Novartis

Consulting or Advisory Role: Celgene/Bristol Myers Squibb, Novartis, Gilead Sciences, AbbVie, Syros Pharmaceuticals, Servier, Geron, CTI, Otsuka, Curis

Research Funding: Celgene (Inst)

Travel, Accommodations, Expenses: Janssen-Cilag, Celgene

Pierre Fenaux

Honoraria: Bristol Myers Squibb

Consulting or Advisory Role: Bristol Myers Squibb

Research Funding: Bristol Myers Squibb

Torsten Haferlach

Employment: MLL Munich Leukemia Laboratory

Leadership: MLL Munich Leukemia Laboratory

Anders Krogh

Company: AJ Vaccines (I)

No other potential conflicts of interest were reported.

Figures

**FIG 1.**
Overview of the MOSAIC framework architecture applied to the training and validation cohorts. The figure shows the AI-based framework for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Once the framework has been applied to the training cohort, the validated models can be used on new patients (green block), including in a federated environment. The scheme outlines the analysis pathways and methods and how to use them, including implementation of the models in a federated environment to improve performance while preserving a high degree of privacy. AI, artificial intelligence.

**FIG 2.**
Patient clustering based on genomic features, performed with AI-based clustering and HDP methods in the MDS cohort (N = 2,043). (A) UMAP two-dimensional embedding. Each dot represents a patient, positioned according to their cytogenetic and genomic features (gene mutations). Numbers indicate the assigned clusters, and labels describe the genomic characterization of selected clusters. The model identified 18 clusters; 56 patients who could not be clearly assigned were labeled cluster –1. (B) Alluvial plot comparing the more granular HDBSCAN classification with the clinical groups identified by Bersanelli et al using the HDP clustering approach. AI, artificial intelligence; HDBSCAN, Hierarchical Density-Based Spatial Clustering of Applications with Noise; HDP, Hierarchical Dirichlet Process; MDS, myelodysplastic syndrome; UMAP, Uniform Manifold Approximation and Projection for Dimension Reduction.
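
Common HDBSCAN implementations (the `hdbscan` package and scikit-learn's `HDBSCAN`) label points that cannot be assigned to any dense cluster as −1, which corresponds to "cluster –1" in panel A. The sketch below summarizes such a label vector; the function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def summarize_clusters(labels: Iterable[int], noise_label: int = -1) -> dict:
    """Count clusters and noise points in an HDBSCAN-style label vector."""
    counts = Counter(labels)
    n_noise = counts.pop(noise_label, 0)  # noise is not a cluster
    n_total = n_noise + sum(counts.values())
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "noise_fraction": n_noise / n_total if n_total else 0.0,
        "cluster_sizes": dict(sorted(counts.items())),
    }
```

With the counts reported in panel A (18 clusters, 56 of 2,043 patients labeled –1), the noise fraction would be 56/2,043 ≈ 2.7%.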

**FIG 3.**
Validation of the identified clusters in the MDS cohorts (N = 4,427) using XAI frameworks. (A) Relative cluster frequency in the MDS training (N = 2,043) and validation (N = 2,384) cohorts; clusters were assigned by training an RF classifier (100 trees, maximum depth = 35, minimum samples per leaf = 1) on the whole training cohort. (B) Average impact of every feature on assignment to every cluster, obtained using SHAP on the best trained RF classifier, for the training (left) and validation (right) data sets. (C) Feature impact on cluster assignment, obtained using SHAP on the best trained RF classifier, for both training and validation cohorts. MDS, myelodysplastic syndrome; RF, Random Forest; SHAP, Shapley Additive Explanations; XAI, explainable artificial intelligence.
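
The balanced accuracy reported for the Random Forest cluster classifier is, under the usual definition, the mean of per-class recall, which prevents large clusters from dominating the score. The sketch below computes it, together with a mean ± SD over repeated evaluations; that the reported SD is a sample SD across cross-validation folds is an assumption, as are the function names. If the classifier was built with scikit-learn, the hyperparameters in panel A would map to `RandomForestClassifier(n_estimators=100, max_depth=35, min_samples_leaf=1)`, but the source does not name the library.

```python
from __future__ import annotations

import statistics
from collections import Counter
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = Counter(y_true)  # patients per true cluster
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per cluster
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def mean_sd(scores: Sequence[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across folds or repeats."""
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, predicting the majority cluster for five of six patients gives a plain accuracy of 5/6, but if the one error is in a two-patient cluster the balanced accuracy drops to (1.0 + 0.5) / 2 = 0.75.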

**FIG 4.**
Prognostic assessment of patients with MDS (N = 2,043) based on clinical and genomic features, comparing different survival prediction methods. (A) Comparison of overall survival prediction methods in MDS: Cox proportional hazards (CoxPH) model (and its penalized version), Random Survival Forests, DeepCox, Gradient Boosting, and XGBoost survival methods. Model performance was evaluated with the C-Index; the C-Index of the conventional IPSS-R scoring system is reported as a baseline. *P < .01, **P < .001, ***P < .0001. (B) Validation of the best-performing survival model (Gradient Boosting) using XAI frameworks, showing each feature's impact on overall survival prediction in the training (right) and validation (left) cohorts. C-Index, Concordance-Index; CoxPH, Cox proportional hazards; IPSS-R, Revised International Prognostic Scoring System; MDS, myelodysplastic syndrome; XAI, explainable artificial intelligence.
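
Harrell's C-Index, used here to compare survival models against the IPSS-R baseline, estimates how often a model ranks two patients' risk in the same order as their observed survival. A minimal sketch, assuming right-censored data, higher scores meaning higher risk, and pairs with tied survival times simply skipped; it is not necessarily the estimator used by the authors, and libraries such as lifelines or scikit-survival implement variants that handle ties differently.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j. It is concordant when i also has
    the higher predicted risk; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-Index of 1.0 means perfect risk ranking, 0.5 is no better than chance (for example, a constant score), so values such as 0.77 and 0.74 indicate substantially better-than-random ranking.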

**FIG 5.**
Federated learning implementation. (A) Overview of the experimental settings used to test the benefits of a federated learning architecture. Setting C shows the federated architecture, which lets individual models share information without transferring data. In this setting, we simulated three centers (ie, hospitals providing data) holding 60%, 30%, and 10% of the total MDS training population (N = 2,043). (B) Evolution of the C-Index for overall survival, calculated at each epoch of model training. In the nodes trained in a federated way, the metric clearly rises after every five epochs; after each increase, the nodes continue training on their own data for another five epochs. This explains the peaks during training for each node, especially nodes 2 and 3. (C) Mean ± SD of the C-Index for overall survival in each experiment. Bold entries indicate the best result for each node. C-Index, Concordance-Index; MDS, myelodysplastic syndrome; OS, overall survival; SD, standard deviation.
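
The caption describes nodes that train locally for five epochs and then exchange model information without sharing patient data, but it does not name the aggregation rule or the model. The sketch below assumes federated averaging (FedAvg), in which a server averages node parameters weighted by local sample size, and uses a hypothetical toy linear model in place of the survival model; the 60/30/10 split mirrors setting C. All function names and the learning rate are illustrative assumptions.

```python
from __future__ import annotations

from typing import Sequence


def split_sizes(n_total: int, percents: Sequence[int]) -> list[int]:
    """Integer node sizes for a percentage split; the remainder goes to node 1."""
    sizes = [n_total * p // 100 for p in percents]
    sizes[0] += n_total - sum(sizes)
    return sizes


def fedavg(node_weights: Sequence[Sequence[float]], node_sizes: Sequence[int]) -> list[float]:
    """FedAvg aggregation: parameters averaged, weighted by local sample count."""
    total = sum(node_sizes)
    return [
        sum(w[k] * n for w, n in zip(node_weights, node_sizes)) / total
        for k in range(len(node_weights[0]))
    ]


def local_train(w, data, epochs, lr=0.3):
    """Full-batch gradient descent on mean squared error for y ≈ b + m * x."""
    b, m = w
    for _ in range(epochs):
        gb = sum((b + m * x) - y for x, y in data) * 2 / len(data)
        gm = sum(((b + m * x) - y) * x for x, y in data) * 2 / len(data)
        b, m = b - lr * gb, m - lr * gm
    return [b, m]


def federated_training(node_data, rounds=40, local_epochs=5):
    """Each round: every node trains locally on its own data, then the server applies FedAvg."""
    global_w = [0.0, 0.0]
    sizes = [len(d) for d in node_data]
    for _ in range(rounds):
        local = [local_train(global_w, d, local_epochs) for d in node_data]
        global_w = fedavg(local, sizes)  # only parameters leave the nodes
    return global_w
```

With N = 2,043, `split_sizes(2043, [60, 30, 10])` gives node sizes of 1,227, 612, and 204. Weighting by sample size lets the largest center dominate the average, while the smaller centers still benefit from parameters learned on data they never see.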


