Transl Cancer Res. 2025 Feb 28;14(2):706-716.
doi: 10.21037/tcr-24-1672. Epub 2025 Feb 18.

Construction and validation of machine learning models for predicting lymph node metastasis in cutaneous malignant melanoma: a large population-based study

Ling-Feng Lan et al. Transl Cancer Res. 2025.

Abstract

Background: Lymph node status is essential for determining the prognosis of cutaneous malignant melanoma (CMM). This study aimed to develop a machine learning (ML) model for predicting lymph node metastases (LNM) in CMM.

Methods: We gathered data on 6,196 patients, including known clinicopathologic variables, from the Surveillance, Epidemiology, and End Results (SEER) database. Six ML algorithms were used to build models predicting the presence of LNM in CMM: logistic regression (LR), support vector machine (SVM), Complement Naive Bayes (CNB), Extreme Gradient Boosting (XGBoost), RandomForest (RF), and k-nearest neighbor (kNN). The adaptive synthetic (ADASYN) sampling method was used to address class imbalance. We assessed model performance in terms of average precision (AP), sensitivity, specificity, accuracy, F1 score, precision-recall curves, calibration plots, and decision curve analysis (DCA). SHapley Additive exPlanation (SHAP) analysis was then used to generate visualized explanations for individual patients.
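The threshold-based metrics and the average precision named in the Methods can be illustrated with a minimal sketch. It uses only the Python standard library; the labels, scores, the 0.5 cut-off, and the function names `threshold_metrics` and `average_precision` are hypothetical and are not taken from the study. Ties between scores are broken by input order here, which may differ from how a given library handles them.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and F1 from binary labels (1 = LNM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive case."""
    # Rank cases from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    return total / hits if hits else 0.0


# Toy data (made up for illustration, not from the SEER cohort).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(threshold_metrics(y_true, y_pred))
print(round(average_precision(y_true, scores), 4))
```

Unlike the threshold-based metrics, AP depends only on the ranking of the scores, which is one reason it is often preferred when the positive class is rare.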

Results: Among the 6,196 CMM cases, 19.9% (n=1,234) presented with LNM. The XGBoost model showed the best predictive performance among the six algorithms (AP of 0.805). In the XGBoost model, age and Breslow thickness were the two most important factors related to LNM.
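As a quick check on the reported figures, the sketch below recomputes the LNM prevalence from the two counts in the abstract and compares the reported AP with the chance-level baseline. For a ranking that carries no information, the expected precision is roughly the prevalence of the positive class. Treating the whole-cohort prevalence as the baseline is an assumption: the abstract does not state the prevalence in the set on which the AP of 0.805 was measured.

```python
n_total = 6196      # CMM cases reported in the abstract
n_lnm = 1234        # cases with lymph node metastasis reported in the abstract
reported_ap = 0.805  # AP reported for the XGBoost model

prevalence = n_lnm / n_total
negatives_per_positive = (n_total - n_lnm) / n_lnm

# Assumption: the evaluation set has the same prevalence as the whole cohort.
baseline_ap = prevalence
ratio_to_baseline = reported_ap / baseline_ap

print(f"LNM prevalence: {prevalence:.1%}")
print(f"Negatives per positive: {negatives_per_positive:.2f}")
print(f"Reported AP relative to chance-level baseline: {ratio_to_baseline:.1f}x")
```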

Conclusions: The XGBoost model predicted LNM in CMM with a high level of precision. We hope that this model can assist surgeons in accurately evaluating surgical approaches and determining the extent of surgery, while also guiding subsequent adjuvant therapies, thereby improving the prognosis of patients.

Keywords: Cutaneous malignant melanoma (CMM); Surveillance, Epidemiology, and End Results (SEER); lymph node metastasis (LNM); machine learning (ML); SHapley Additive exPlanation (SHAP).

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-24-1672/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Correlation between factors. The depth of color indicates the magnitude of correlation.
Figure 2
Evaluation of the prediction models for lymph node metastases in cutaneous malignant melanoma on the training set (A) and validation set (B). The average precision-recall curves indicate the trade-off between precision and recall. PR, precision-recall; AP, average precision; XGBoost, Extreme Gradient Boosting; LR, logistic regression; RF, RandomForest; CNB, Complement Naive Bayes; SVM, support vector machine; kNN, the k-nearest neighbor algorithm; CI, confidence interval.
Figure 3
Examples of calibration plots (Brier score) for predicting lymph node metastases with various models: XGBoost, LR, RF, CNB, SVM, and kNN. The 45° straight line on each graph represents a perfect match between the observed (y-axis) and predicted (x-axis) probabilities of lymph node metastasis. The closer a model's curve lies to this line, the more accurate its predicted probabilities. XGBoost, Extreme Gradient Boosting; LR, logistic regression; RF, RandomForest; CNB, Complement Naive Bayes; SVM, support vector machine; kNN, the k-nearest neighbor algorithm; CI, confidence interval.
Figure 4
Decision curves of various models: XGBoost, LR, RF, CNB, SVM, and kNN. XGBoost, Extreme Gradient Boosting; LR, logistic regression; SVM, support vector machine; CNB, Complement Naive Bayes; RF, RandomForest; kNN, the k-nearest neighbor algorithm.
Figure 5
ROC curves of Extreme Gradient Boosting for the training (A), validation (B), and test (C) sets. ROC, receiver operating characteristic; AUC, area under the curve; CI, confidence interval.
Figure 6
Summary plots for SHAP values. For each feature, one point corresponds to a single patient. A point's position along the x-axis represents the impact that feature had on the model's output for that specific patient. Redder points indicate higher feature values, and bluer points indicate lower feature values. Features are arranged along the y-axis by importance, which is given by the mean of their absolute Shapley values. The higher a feature is positioned in the plot, the more important it is for the model. SHAP, SHapley Additive exPlanation.
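The quantities behind the calibration, decision-curve, ROC, and SHAP summary figures can be sketched in a few lines. This is an illustration in plain Python under stated assumptions, not the authors' code: the function names, toy labels, probabilities, and SHAP values are all made up, and `other_feature` is a hypothetical placeholder. The net benefit formula used is the standard one from decision curve analysis, TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability.

```python
from typing import Dict, List, Sequence, Tuple


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_abs_shap(shap_rows: Sequence[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank features by the mean of their absolute SHAP values across patients."""
    features = list(shap_rows[0].keys())
    means = {f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows) for f in features}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (made up for illustration, not from the study).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

# One dict of per-feature SHAP values per patient (hypothetical numbers).
shap_rows = [
    {"age": -0.30, "breslow_thickness": 0.50, "other_feature": 0.10},
    {"age": 0.40, "breslow_thickness": -0.20, "other_feature": 0.05},
    {"age": -0.50, "breslow_thickness": 0.30, "other_feature": -0.10},
]

print("Brier score:", round(brier_score(y_true, probs), 4))
print("Net benefit at 0.3:", round(net_benefit(y_true, probs, 0.3), 4))
print("Treat-all net benefit at 0.3:", round(net_benefit(y_true, [1.0] * len(y_true), 0.3), 4))
print("ROC AUC:", round(roc_auc(y_true, probs), 4))
print("Feature ranking:", [name for name, _ in mean_abs_shap(shap_rows)])
```

A decision curve repeats the net benefit calculation over a range of thresholds and plots it against the "treat all" and "treat none" (net benefit of zero) strategies. The SHAP ranking here mirrors the ordering rule in the Figure 6 caption, although the toy values themselves carry no clinical meaning.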
