BMC Endocr Disord. 2022 Nov 4;22(1):269.
doi: 10.1186/s12902-022-01186-1.

Exploring risk factors for cervical lymph node metastasis in papillary thyroid microcarcinoma: construction of a novel population-based predictive model


Yanling Huang et al. BMC Endocr Disord.

Abstract

Background: Machine learning is a highly effective tool for model construction. We aimed to establish a machine learning-based model for predicting cervical lymph node metastasis (LNM) in papillary thyroid microcarcinoma (PTMC).

Methods: We obtained data on PTMC from the SEER database, including 10 demographic and clinicopathological characteristics. Univariate and multivariate logistic regression (LR) analyses were applied to screen the risk factors for cervical LNM in PTMC. Risk factors with P < 0.05 in the multivariate LR analysis were used as modeling variables. Five machine learning (ML) algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), adaptive boosting (AdaBoost), Gaussian naive Bayes (GNB), and multi-layer perceptron (MLP), together with traditional regression analysis, were used to construct the prediction models. Finally, the area under the receiver operating characteristic curve (AUROC) was used to compare model performance.
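The AUROC comparison described in the Methods can be illustrated with a minimal sketch. It assumes the standard rank-based definition of AUROC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name, model names, labels, and scores below are hypothetical toy values, not data or code from the study.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    # Mann-Whitney U statistic for the positive class, normalized to [0, 1].
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical toy data: 1 = LNM present, 0 = LNM absent.
toy_labels = [0, 0, 1, 1]
toy_scores: Dict[str, Sequence[float]] = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
toy_auroc = {name: auroc(toy_labels, s) for name, s in toy_scores.items()}
best_model = max(toy_auroc, key=toy_auroc.get)
```

On this toy input, `model_a` scores 0.75 and `model_b` scores 1.0, so `best_model` is `"model_b"`. In practice a library routine would likely be used instead; the sketch only makes the comparison criterion explicit.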

Results: Univariate and multivariate LR analyses identified 9 independent risk factors for cervical LNM in PTMC: age, sex, race, marital status, region, histology, tumor size, extrathyroidal extension (ETE), and multifocality. We used these risk factors to build the ML prediction models; XGBoost had a higher AUROC than the other 4 ML algorithms and was the best ML model. We optimized the XGBoost model through 10-fold cross-validation, and its best performance on the training set (AUROC: 0.809, 95% CI 0.800-0.818) was better than that of traditional LR analysis (AUROC: 0.780, 95% CI 0.772-0.787).
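As a rough illustration of the 10-fold cross-validation mentioned in the Results, the sketch below partitions sample indices into 10 folds so that each sample is held out exactly once. The abstract does not state how the folds were formed (for example, whether they were stratified by outcome), so plain shuffled k-fold is an assumption here, and the function name and seed are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, held_out_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    base, extra = divmod(n_samples, k)  # fold sizes differ by at most one
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


# Toy usage with 25 samples: every sample lands in exactly one held-out fold.
folds = list(k_fold_indices(25, k=10))
fold_sizes = [len(held_out) for _, held_out in folds]
```

A model would be fitted on each `train` split and scored on the matching `held_out` split, with the per-fold scores then summarized.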

Conclusions: ML algorithms, especially XGBoost, showed good predictive performance for cervical LNM in PTMC. With the continuous development of artificial intelligence, ML algorithms have broad prospects in clinical prognosis prediction.

Keywords: Conventional regression model; Machine learning; Papillary thyroid microcarcinoma cervical lymph node metastasis; Prediction model; Risk factors.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Flow chart of patient selection and study design
Fig. 2. Pearson correlation test for variables
Fig. 3. ROC curves, forest plot, and nomogram of the LR model for cervical LNM in PTMC. Note: A shows the ROC curves of the multivariate LR model; B shows the forest plot of the multivariate LR model; C shows the risk nomogram of the multivariate LR model
Fig. 4. Model performance evaluation of different ML methods. Note: A shows the ROC curves of the 5 ML models in the training set; B shows the ROC curves of the 5 ML models in the validation set; C shows the AUC score forest plot of each model; D shows the reliability curve of each model
Fig. 5. Optimization and visualization of the XGBoost model. Note: A, B, and D show the ROC curves of the XGBoost model on the training, validation, and test sets under 10-fold cross-validation; C shows the learning curve of the XGBoost classifier; E shows the reliability curve of the XGBoost model; F shows the summary plot of SHAP values for the XGBoost model. For each feature, one point corresponds to a single patient. A point's position along the x axis represents the impact that feature had on the model's output for that specific patient. Features are arranged along the y axis by importance, given by the mean of their absolute Shapley values: the higher a feature is positioned in the plot, the more important it is for the model
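The feature ordering described for the SHAP summary plot (importance as the mean of absolute Shapley values) can be sketched as follows. The function name, the feature names, and the numeric values are hypothetical placeholders chosen for illustration; they are not SHAP values from the study, and computing the Shapley values themselves is outside the scope of this sketch.

```python
from typing import List, Tuple


def rank_by_mean_abs_shap(
    shap_values: List[List[float]], feature_names: List[str]
) -> List[Tuple[str, float]]:
    """Order features by mean |SHAP value| across patients, most important first."""
    if not shap_values:
        raise ValueError("need at least one row of SHAP values")
    n_patients = len(shap_values)
    importance = []
    for j, name in enumerate(feature_names):
        # Sign is discarded: a large negative impact counts as much as a large positive one.
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_patients
        importance.append((name, mean_abs))
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Hypothetical values: one row per patient, one column per feature.
toy_shap = [
    [0.5, -0.25, 0.0],
    [-0.5, 0.25, 0.125],
]
toy_ranking = rank_by_mean_abs_shap(toy_shap, ["feature_a", "feature_b", "feature_c"])
```

Here `feature_a` ranks first with a mean absolute value of 0.5, followed by `feature_b` (0.25) and `feature_c` (0.0625), which mirrors the top-to-bottom ordering on the y axis of a summary plot.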
