Sci Rep. 2025 Jan 9;15(1):1532.
doi: 10.1038/s41598-025-85963-7.

Using XBGoost, an interpretable machine learning model, for diagnosing prostate cancer in patients with PSA < 20 ng/ml based on the PSAMR indicator

Dengke Li et al. Sci Rep.

Abstract

This study aimed to create a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, in order to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical College from July 2021 to January 2023 were collected and analyzed. After class balancing of the training set with the Synthetic Minority Over-sampling Technique (SMOTE), Least Absolute Shrinkage and Selection Operator (LASSO) feature selection was used to identify the significant variables, and multiple machine learning models were constructed. The best-performing model was selected and evaluated through tenfold cross-validation to ensure interpretability. Finally, its performance was assessed on the test set for validation. Age, prostate-specific antigen mass ratio (PSAMR), Prostate Imaging-Reporting and Data System score, and prostate volume were selected as the variables for model construction based on the LASSO regression. The areas under the receiver operating characteristic (ROC) curve, with 95% confidence intervals, for the models in the validation set were as follows: XGBoost: 0.93 (0.88-0.97); logistic: 0.89 (0.83-0.95); LightGBM: 0.87 (0.80-0.93); AdaBoost: 0.90 (0.85-0.96); GNB: 0.88 (0.82-0.95); CNB: 0.79 (0.71-0.87); MLP: 0.78 (0.69-0.86); and Support Vector Machine: 0.81 (0.73-0.89). XGBoost was selected as the best model and reconstructed with tenfold cross-validation on the training data, resulting in the following ROC scores: training set 0.995 (0.991-0.999), validation set 0.945 (0.885-0.997), and test set 0.920 (0.868-0.972). The Kolmogorov-Smirnov curve, calibration curve, and learning curve yielded favorable results, and the decision curve demonstrated that patients with threshold probabilities ranging from 10 to 95% can benefit from this model. We developed an XGBoost machine learning model based on the PSAMR indicator and interpreted it using the SHapley Additive exPlanations (SHAP) method. The model offers a high-performance, non-invasive technique for diagnosing prostate cancer in patients with PSA levels < 20 ng/ml.
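
The class-balancing step named in the abstract can be illustrated with a minimal, standard-library sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' code. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration, and a study of this kind would more plausibly rely on an established library implementation. As in the abstract, the oversampling is applied to training data only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row
        # and keep the k closest ones.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy training data (made-up feature vectors, not patient data).
train_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
train_majority_count = 10

# Oversample the minority class until both classes have the same size.
new_rows = smote(train_minority, train_majority_count - len(train_minority), k=3)
balanced_minority = train_minority + new_rows
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the range of the observed minority values. Keeping the validation and test sets untouched avoids evaluating the model on synthetic cases.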

Keywords: PSAMR; Prostate cancer; SHAP; SMOTE; XGBoost machine learning model.

Conflict of interest statement

Declarations. Competing interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Ethics approval: This study adhered to the Helsinki Declaration. Approval for this retrospective study was granted by the Ethics Review Committee at the First Affiliated Hospital of Wannan Medical College, with an exemption from the requirement for informed consent (Project Title: Construction and Promotion Research of Innovative System for Radical Prostatectomy in Prostate Cancer, Ethical Approval No. 2022–04).

Figures

Fig. 1
Research process.
Fig. 2
LASSO regression screening. A is the coefficient profile plot; B is the cross-validation curve. LASSO: Least Absolute Shrinkage and Selection Operator.
Fig. 3
Multi-model ROC curve. A is the multi-model ROC curve for the training set, with XGBoost as the best model and an AUC value of 0.97 (95% CI 0.95–0.98); B is the multi-model ROC curve for the validation set, with XGBoost as the best model and an AUC value of 0.93 (95% CI 0.88–0.97). ROC: receiver operating characteristic; AUC: area under the ROC curve.
Fig. 4
Tenfold cross-validation ROC curves for the XGBoost machine learning model. A is the ROC curve for the training set, with an average AUC of 0.995 (0.991–0.999); B is the ROC curve for the validation set, with an average AUC of 0.945 (0.885–0.997); and C is the ROC curve for the test set, with an AUC of 0.920 (0.868–0.972). ROC: receiver operating characteristic; AUC: area under the ROC curve.
Fig. 5
A: KS (Kolmogorov–Smirnov) statistic plot; statistic value: 0.698. B: Learning curve; the red line represents the learning process of the training set, whereas the blue line represents the learning process of the validation set. No obvious underfitting or overfitting is observed. C: Calibration curve. D: Decision curve; the vertical red lines mark the threshold probability range.
Fig. 6
SHAP (SHapley Additive exPlanations) method. A: SHAP feature attribution; each line represents a feature, and the x-axis represents the SHAP value. B: Variable importance ranking; the longer the blue bar, the greater the variable's contribution. C and D: Model prediction results; a random sample is drawn from the test set and input into the prediction model. Based on the corresponding feature values, the output probabilities for the two sampled patients are approximately 26% and 89%, respectively.
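
The evaluation quantities named in the captions above (AUC, the KS statistic, and the net benefit plotted in a decision curve) can be made concrete with a small standard-library sketch. It uses the usual textbook definitions: AUC as the trapezoidal area under the ROC curve, KS as the largest gap between the true-positive and false-positive rates, and net benefit as TP/n − (FP/n) · pt/(1 − pt) at threshold probability pt. The function names and the toy labels and scores are assumptions for illustration only; they are not the study's data or code.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, one per distinct score, from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def ks_statistic(points):
    """Largest vertical gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


def net_benefit(y_true, scores, threshold):
    """Net benefit of treating everyone with score >= threshold (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy labels (1 = cancer) and predicted probabilities; made-up values.
y = [1, 1, 0, 1, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.35, 0.1]

pts = roc_points(y, p)
toy_auc = auc(pts)
toy_ks = ks_statistic(pts)
toy_nb = net_benefit(y, p, 0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and comparing the result against the "treat all" and "treat none" strategies.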
