ESMO Open. 2024 Dec;9(12):103998.
doi: 10.1016/j.esmoop.2024.103998. Epub 2024 Nov 25.

A machine learning-based analysis of nationwide cancer comprehensive genomic profiling data across cancer types to identify features associated with recommendation of genome-matched therapy

H Ikushima et al. ESMO Open. 2024 Dec.

Abstract

Background: The low probability of identifying druggable mutations through comprehensive genomic profiling (CGP), together with its financial and time costs, hinders its widespread adoption. To enhance the effectiveness and efficiency of cancer precision medicine, it is critical to identify the characteristics of patients who are most likely to benefit from CGP.

Patients and methods: This nationwide retrospective study employed machine learning models to predict the identification of genome-matched therapies by CGP, utilizing a national database covering 99.7% of the patients who underwent CGP in Japan from June 2019 to November 2023. Prediction models were constructed for the overall cancer population, specific cancer types, and the adolescent and young adult (AYA) group. The SHapley Additive exPlanations (SHAP) algorithm was applied to elucidate the clinical features contributing to model predictions.
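As an illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. It is not the study's pipeline: the function `toy_score`, its coefficients, and the feature names `lung` and `liver_metastasis` are hypothetical, and production SHAP implementations use faster tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features that have not yet joined the coalition keep their baseline value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical score with made-up coefficients (NOT the study's model)."""
    score = 0.10
    score += 0.20 * x["lung"]
    score -= 0.05 * x["liver_metastasis"]
    score += 0.02 * x["lung"] * x["liver_metastasis"]  # interaction term
    return score


baseline = {"lung": 0, "liver_metastasis": 0}
patient = {"lung": 1, "liver_metastasis": 1}
phi = shapley_values(toy_score, patient, baseline)
print(phi)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that gives SHAP its name.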

Results: This study included 60 655 patients [mean age (standard deviation), 60.8 years (14.5 years); 50.1% males]. CGP identified at least one genome-matched therapy in 11 227 cases (18.5%). The best prediction model was eXtreme Gradient Boosting (XGBoost) with an area under the receiver operating characteristic curve of 0.819. Cancer type was the most important predictor (negative for pancreas and positive for breast and lung), followed by age, presence of liver metastasis, and number of metastatic sites. Analysis of cancer type-specific models identified several organ-specific features, including sex, interval between cancer diagnosis and CGP, sampling site, and CGP panel. Among 3455 AYA patients, genome-matched therapies were identified in 459 patients (13.3%). The AYA-specific model achieved an area under the receiver operating characteristic curve of 0.768, with bone tumor identified as a negative predictor in addition to those identified in the overall cancer population model.
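The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below implements that pairwise definition on made-up labels and scores (the toy data are an assumption, not study data) and also recomputes the two identification rates from the counts quoted in the abstract.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up toy data for illustration only.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.90, 0.40, 0.35, 0.20, 0.10]
auc = auroc(toy_labels, toy_scores)

# Counts quoted in the abstract.
overall_rate = round(100 * 11227 / 60655, 1)
aya_rate = round(100 * 459 / 3455, 1)
print(auc, overall_rate, aya_rate)
```

The pairwise loop is quadratic in the number of cases, so for a cohort of this size a rank-based or library implementation would be the practical choice.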

Conclusion: Several factors predicting the identification of genome-matched therapies through CGP were identified for the overall cancer population and cancer type-specific subpopulations. Expedited CGP is recommended for patients who match the identified profile to facilitate early targeted therapy.

Keywords: adolescent and young adult; comprehensive genomic profiling; explainable artificial intelligence; genome-matched therapy; machine learning.

Figures

Figure 1
Prediction models for the whole C-CAT cohort. (A) Distribution of cancer types among 60 655 patients registered in the C-CAT database. The 10 most frequent cancer types are shown in the pie chart. (B) Distribution of the number of identified drugs with evidence level A or B in the C-CAT Findings Report. (C) The receiver operating characteristic curves of the constructed models for the holdout test set. (D) The calibration plot visualizes how accurately the probability predicted by the XGBoost model estimates the observed discovery rate of genome-matched therapies in the test dataset. The x-coordinate of each point represents the mean predicted probability in each bin, whereas the y-coordinate represents the observed discovery rate of genome-matched therapies in that bin. The dashed line represents a perfectly calibrated model. (E) SHAP summary plot of the most contributing features ranked by descending contribution to the prediction of the whole cohort model. The left bar chart represents the mean absolute SHAP values across the cases in the holdout test set. In the right chart, the colors represent the feature values, with red indicating higher or positive values and blue indicating lower or negative values. C-CAT, Center for Cancer Genomics and Advanced Therapeutics; SHAP, SHapley Additive exPlanations.
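The points of a calibration plot such as panel (D) can be derived as sketched below: predictions are grouped into bins, and each bin yields one point whose x-coordinate is the mean predicted probability and whose y-coordinate is the observed event rate. The caption does not state the binning scheme, so equal-width bins are an assumption here, and the labels and probabilities are made up for illustration.

```python
def calibration_points(labels, probs, n_bins=10):
    """Return (mean predicted probability, observed rate, count) per non-empty bin.

    Uses equal-width bins on [0, 1]; this binning scheme is an assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


# Made-up toy data for illustration only.
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
points = calibration_points(toy_labels, toy_probs, n_bins=2)
print(points)
```

A point lying on the diagonal, where the mean predicted probability equals the observed rate, corresponds to the dashed line of perfect calibration.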
Figure 2
Cancer type-specific trends in the identification of genome-matched therapies. (A) Distribution of the number of identified drugs with evidence level A or B in each cancer type-specific cohort. The proportion of cases in which at least one drug with evidence level A or B was identified is shown on each graph. (B) AUROC of the trained cancer type-specific random forest-, XGBoost-, and CatBoost-based prediction models for the holdout test dataset. AUROC, area under the receiver operating characteristic curve.
Figure 3
Explainability analysis of the cancer type-specific prediction models. SHAP summary plot of the most contributing features in the (A) bowel-, (B) breast-, and (C) lung-specific prediction models. BRAF, B-Raf proto-oncogene; EGFR, epidermal growth factor receptor; ER, estrogen receptor; HER2, human epidermal growth factor receptor type 2; KRAS, Kirsten rat sarcoma viral oncogene homolog; NCC, National Cancer Center (Japan) OncoPanel; PD-L1, programmed death-ligand 1; PgR, progesterone receptor; SHAP, SHapley Additive exPlanations.
Figure 4
Construction and analyses of the AYA-specific prediction model. (A) Distribution of the number of identified drugs with evidence level A or B in the AYA cohort. (B) The receiver operating characteristic curve of the constructed model for the holdout test set. (C) SHAP summary plot of the most contributing features in the AYA-specific prediction model. AUROC, area under the receiver operating characteristic curve; AYA, adolescent and young adult.
