Brief Bioinform. 2024 Nov 22;26(1):bbae665.

doi: 10.1093/bib/bbae665.

Bayesian unsupervised clustering identifies clinically relevant osteosarcoma subtypes

Sergio Llaneza-Lago¹, William D Fraser², Darrell Green¹

Affiliations

¹ Biomedical Research Centre, Norwich Medical School, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, United Kingdom.
² Bioanalytical Facility, Norwich Medical School, University of East Anglia, Norwich Research Park, Norwich NR4 7UQ, United Kingdom.

PMID: 39701601
PMCID: PMC11658815
DOI: 10.1093/bib/bbae665

Sergio Llaneza-Lago et al. Brief Bioinform. 2024.

Abstract

Identification of cancer subtypes is a critical step for developing precision medicine. Most cancer subtyping is based on the analysis of RNA sequencing (RNA-seq) data from patient cohorts using unsupervised machine learning methods such as hierarchical cluster analysis, but these computational approaches disregard the heterogeneous composition of individual cancer samples. Here, we used a more sophisticated unsupervised Bayesian model termed latent process decomposition (LPD), which handles individual cancer sample heterogeneity and deconvolutes the structure of transcriptome data to provide clinically relevant information. The work was performed on the pediatric tumor osteosarcoma, which is a prototypical model for a rare and heterogeneous cancer. The LPD model detected three osteosarcoma subtypes. The subtype with the poorest prognosis was validated using independent patient datasets. This new stratification framework will be important for more accurate diagnostic labeling, expediting precision medicine, and improving clinical trial success. Our results emphasize the importance of using more sophisticated machine learning approaches (and for teaching deep learning and artificial intelligence) for RNA-seq data analysis, which may assist drug targeting and clinical management.

Keywords: RNA-seq; heterogeneity; latent process decomposition; osteosarcoma; precision medicine.

Figures

**Figure 1**
Latent process decomposition model optimization, subtype assignment and clinical outcome. (a) Hyperparameter optimization for the TARGET dataset. LPD assesses the explanatory power of different combinations of sigma values (process spread) and the number of processes. The optimal combination is determined as the point of maximum log-likelihood before the onset of overfitting, visually identified as a plateau in the curves. For the TARGET dataset, the optimal parameters were three processes and a sigma value of −0.0001. (b) Sample assignment to subtypes. The bar plot illustrates sample assignment to the three identified subtypes based on their degree of membership (gamma value). Higher gamma values indicate stronger membership in a specific subtype, reflecting the extent to which each subtype captures sample-specific transcriptomic variability. (c) Kaplan–Meier curves illustrate the survival probability over time for each subtype. Pairwise comparisons between subtypes are shown with log-rank p-values and sample sizes provided for each comparison.
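
As an illustration of how the survival curves in panel (c) are typically estimated, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. It is not the authors' code; the function name `kaplan_meier` and the toy follow-up data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up duration for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    # Count observed events at each distinct time point.
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Censored patients contribute to the at-risk count up to their last follow-up but never lower the curve themselves, which is why the estimate only steps down at observed event times.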

**Figure 2**
Correlation of gene expression profiles between poor prognosis TARGET LPD-1 and corresponding subtypes. Scatter plots comparing the expression levels of the top 500 most variable transcripts across the entire TARGET dataset between TARGET LPD-1 and the corresponding most similar subtypes from the GREEN (GREEN LPD-1), PERRY (PERRY LPD-2), and SCOTT (SCOTT LPD-1) datasets. Trend lines and Pearson correlation coefficients (r) with corresponding P-values are displayed for each comparison.
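
The comparison described in this caption can be sketched in two steps: pick the most variable transcripts, then compute a Pearson correlation between two subtype profiles over those transcripts. The sketch below assumes expression is held in plain dictionaries; the helper names `top_variable` and `pearson_r` are hypothetical and do not come from the paper.

```python
import math
from statistics import mean, pvariance


def top_variable(expression, n):
    """Return the n transcripts with the highest variance across samples.

    expression : dict mapping transcript ID -> list of expression values
    """
    return sorted(expression, key=lambda t: pvariance(expression[t]), reverse=True)[:n]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlate_subtypes(expression, profile_a, profile_b, n=500):
    """Correlate two subtype profiles over the top-n variable transcripts.

    profile_a, profile_b : dict mapping transcript ID -> mean expression in a subtype
    """
    selected = top_variable(expression, n)
    return pearson_r([profile_a[t] for t in selected],
                     [profile_b[t] for t in selected])
```

Selecting transcripts by variance in one reference dataset and reusing that selection for every comparison keeps the correlations comparable across cohorts.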

**Figure 3**
Overlap of differentially expressed (DE) transcripts. Venn diagram illustrating the overlap of DE transcripts between TARGET LPD-1 and the most closely correlated subtypes from the GREEN, PERRY, and SCOTT datasets. The diagram quantifies the number of DE transcripts in each dataset and identifies eight transcripts shared across the poor-prognosis subtypes of all four datasets.

**Figure 4**
Comparative evaluation of traditional clustering methods. (a) Silhouette analysis to determine the optimal number of clusters for hierarchical and k-means clustering in the TARGET dataset. Three clusters were identified as optimal, with six clusters showing similar performance. (b) Kaplan–Meier survival curves comparing patient survival based on hierarchical and k-means clustering groups using both three and six clusters as suggested by the silhouette analysis. Log-rank test was used to assess statistical significance.
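
To make the silhouette criterion in panel (a) concrete, here is a minimal sketch of the mean silhouette score for a given cluster assignment, using Euclidean distance. It is an illustration only, assuming points are numeric tuples; the function name `silhouette_score` is chosen for this example and the paper's actual implementation is not shown in this record.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance).

    points : list of equal-length numeric tuples
    labels : cluster label for each point
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # By convention a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

In practice the score is computed for each candidate number of clusters and the value with the highest mean silhouette is preferred, which matches the selection of three clusters described in the caption.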

Cited by

Post-treatment late and long-term effects in bone sarcoma: A scoping review.
Khan K, Kane K, Davison Z, Green D. J Bone Oncol. 2025 Mar 21;52:100671. doi: 10.1016/j.jbo.2025.100671. eCollection 2025 Jun. PMID: 40206491. Free PMC article. Review.
Correlation of multiple peripheral blood parameters with metastasis and invasion of papillary thyroid cancer: a retrospective cohort study.
Chen X, Wang HY, Yu L, Liu JQ, Sun H. Endocrine. 2025 Jun;88(3):757-765. doi: 10.1007/s12020-025-04194-y. Epub 2025 Mar 1. PMID: 40025307. Free PMC article.
Exploratory Analysis of Molecular Subtypes in Early-Stage Osteosarcoma: Identifying Resistance and Optimizing Therapy.
Bojic L, Peric M, Karanovic J, Milosevic E, Kovacevic Grujicic N, Milivojevic M. Cancers (Basel). 2025 May 16;17(10):1677. doi: 10.3390/cancers17101677. PMID: 40427174. Free PMC article.
Targeting metastasis in paediatric bone sarcomas.
Bull EC, Singh A, Harden AM, Soanes K, Habash H, Toracchio L, Carrabotta M, Schreck C, Shah KM, Riestra PV, Chantoiseau M, Da Costa MEM, Moquin-Beaudry G, Pantziarka P, Essiet EA, Gerrand C, Gartland A, Bojmar L, Fahlgren A, Marchais A, Papakonstantinou E, Tomazou EM, Surdez D, Heymann D, Cidre-Aranaz F, Fromigue O, Sexton DW, Herold N, Grünewald TGP, Scotlandi K, Nathrath M, Green D. Mol Cancer. 2025 May 29;24(1):153. doi: 10.1186/s12943-025-02365-z. PMID: 40442778. Free PMC article. Review.

Grants and funding

21-343/CHILDREN with CANCER UK
