Cancer Control. 2021 Jan-Dec:28:10732748211044678.
doi: 10.1177/10732748211044678.

Feature Selection is Critical for 2-Year Prognosis in Advanced Stage High Grade Serous Ovarian Cancer by Using Machine Learning

Alexandros Laios et al. Cancer Control. 2021 Jan-Dec.

Abstract

Introduction: Accurate prediction of patient prognosis can be especially useful for the selection of best treatment protocols. Machine Learning can serve this purpose by making predictions based upon generalizable clinical patterns embedded within learning datasets. We designed a study to support the feature selection for the 2-year prognostic period and compared the performance of several Machine Learning prediction algorithms for accurate 2-year prognosis estimation in advanced-stage high grade serous ovarian cancer (HGSOC) patients.

Methods: Prognosis estimation was formulated as a binary classification problem. The dataset was split into training and test cohorts by repeated random sampling until there was no significant difference (p = 0.20) between the two cohorts. Ten-fold cross-validation was applied. Various state-of-the-art supervised classifiers were used. For feature selection, in addition to an exhaustive search for the best combination of features, we used the chi-square test of independence and the minimum redundancy maximum relevance (MRMR) method.
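The splitting and univariate ranking steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 30% test fraction, the choice to compare outcome rates between cohorts, and the reading of "p = 0.20" as an acceptance threshold (accept a split when p > 0.20) are all assumptions made for illustration, and the data are toy values. MRMR and the exhaustive search are not shown.

```python
import math
import random


def chi2_2x2(a, b, c, d):
    """Chi-square test of independence on the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def table(x, y):
    """Count the four cells of a 2x2 table for two binary (0/1) lists."""
    pairs = list(zip(x, y))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def balanced_split(labels, test_fraction=0.3, p_threshold=0.20,
                   seed=0, max_tries=1000):
    """Reshuffle until outcome rates in train and test do not differ.

    Assumption: 'no significant difference' is checked on the binary
    outcome only, and a split is accepted when p > p_threshold.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    n_test = int(round(test_fraction * len(labels)))
    for _ in range(max_tries):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        cohort = [1] * len(train) + [0] * len(test)
        outcome = [labels[i] for i in train] + [labels[i] for i in test]
        _, p_value = chi2_2x2(*table(cohort, outcome))
        if p_value > p_threshold:
            return sorted(train), sorted(test), p_value
    raise RuntimeError("no balanced split found within max_tries")


def rank_features(features, labels):
    """Rank binary features by chi-square statistic, strongest first."""
    scores = {name: chi2_2x2(*table(col, labels))[0]
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical, not from the study): 40 patients, binary outcome.
labels = [1, 0] * 20
features = {
    "informative": list(labels),      # perfectly associated with outcome
    "noise": [1, 1, 0, 0] * 10,       # independent of outcome
}
train_idx, test_idx, split_p = balanced_split(labels)
ranking = rank_features(features, labels)
```

In this toy case the "informative" feature ranks above "noise" because its chi-square statistic is larger. A real analysis would also need to handle non-binary features and small expected cell counts.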

Results: Two hundred nine patients were identified. The model's mean prediction accuracy reached 73%. We demonstrated that Support-Vector-Machine and Ensemble Subspace Discriminant algorithms outperformed Logistic Regression in accuracy indices. The probability of achieving a cancer-free state was maximised with a combination of primary cytoreduction, good performance status and maximal surgical effort (AUC 0.63). Standard chemotherapy, performance status, tumour load and residual disease were consistently predictive of the mid-term overall survival (AUC 0.63-0.66). The model recall and precision were greater than 80%.
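The Results quote several different performance measures: accuracy, precision, recall and AUC. As a reminder of how these are defined, the sketch below computes them from a confusion matrix and from classifier scores. The counts and scores are made-up toy values, not study data, and the function names are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auc(scores, labels):
    """Rank-based AUC.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy confusion matrix (hypothetical counts).
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Toy scores: one positive case is ranked below one negative case.
toy_auc = auc(scores=[0.9, 0.8, 0.4, 0.3], labels=[1, 0, 1, 0])
```

Precision and recall depend on a single decision threshold, whereas AUC summarises ranking quality across all thresholds, so the two kinds of measure need not agree in magnitude.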

Conclusion: Machine Learning appears to be promising for accurate prognosis estimation. Appropriate feature selection is required when building an HGSOC model for 2-year prognosis prediction. We provide evidence as to what combination of prognosticators leads to the largest impact on the HGSOC 2-year prognosis.

Keywords: Machine Learning; clinical factor analysis; cytoreduction; ovarian cancer; predictive factors; prognosis estimation.

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Workflow showing the integration of ML algorithms to analyse a comprehensive resource of clinical, radiological and surgical data for the development of prognostic ovarian cancer models. The framework for building the predictive ML model comprised 5 steps.
Figure 2.
Cohort survival outcomes. Kaplan–Meier curves demonstrating (A) progression-free survival (PFS) and (B) overall survival (OS), analysed by complete and incomplete cytoreductive outcomes. (C) Stratification of residual disease according to intraperitoneal dissemination pattern (IDP). (D) Kaplan–Meier curves demonstrating OS according to IDP. Haematogenous metastases negatively affect OS, potentially highlighting the difficulty of achieving complete cytoreduction (p < 0.001).
Figure 3.
Feature ranking graphs for 2-year PFS: (A) Univariate feature ranking for classification using chi-square tests. (B) Multivariate feature ranking using MRMR algorithm; feature ranking graphs for 2-year OS: (C) Univariate feature ranking for classification using chi-square tests. (D) Multivariate feature ranking using MRMR algorithm.
Figure 4.
Example confusion matrices showing prediction accuracy for 2-year OS using (A) the SVM classifier with quadratic kernel (AUC: 0.66) and (B) the k-NN classifier (AUC: 0.63). The example shows that prediction is more accurate for the negative class than for the positive class.
Figure 5.
Correlation heatmap of the features included in the ML models demonstrating the correlation amongst the features using a variation of Pearson’s R correlation coefficient. The colours in the heatmap represent the correlation coefficients. A weak correlation amongst features was demonstrated.
Figure 6.
(A) Feature ranking for PFS based on the Lasso method. (B) Feature ranking for OS based on the Lasso method. (C) Feature ranking for PFS based on the Elastic Nets method. (D) Feature ranking for OS based on the Elastic Nets method.
