2020 Oct 21;11:577537.
doi: 10.3389/fendo.2020.577537. eCollection 2020.

Machine Learning Algorithms for the Prediction of Central Lymph Node Metastasis in Patients With Papillary Thyroid Cancer

Yijun Wu et al. Front Endocrinol (Lausanne).

Abstract

Background: Central lymph node metastasis (CLNM) occurs frequently in patients with papillary thyroid cancer (PTC), but performing prophylactic central lymph node dissection is still controversial. There are no reliable models for predicting CLNM. This study aimed to develop predictive models for CLNM by machine learning (ML) algorithms.

Methods: Patients with PTC who underwent initial thyroid resection at our hospital between January 2018 and December 2019 were enrolled. A total of 22 variables, including clinical characteristics and ultrasonography (US) features, were used for conventional univariate and multivariate analysis and to construct ML-based models. The models were validated with 5-fold cross-validation, and a feature selection approach was applied to identify risk factors.
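The 5-fold cross-validation named in the Methods can be sketched as follows. This is a minimal illustration only: the abstract does not say which software was used, whether the folds were stratified, or how the data were shuffled, so the function name `k_fold_indices`, the `seed` parameter, and the shuffling scheme are all assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper; not the authors' implementation.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is the union of all folds except fold i.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

Each patient lands in exactly one test fold, so every model is scored only on patients it was not fitted to, and the per-fold scores can be averaged.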

Results: The areas under the receiver operating characteristic curve (AUC) of the 7 models ranged from 0.680 to 0.731. All models performed significantly better than US (AUC=0.623) in predicting CLNM (P<0.05). In decision curve analysis, most of the models also performed better than US. The gradient boosting decision tree model with 7 variables was identified as the best model because it performed best in both ROC analysis (AUC=0.731) and decision curve analysis. Based on multivariate analysis and feature selection, young age, male sex, low serum thyroid peroxidase antibody levels, and US features such as suspected lymph nodes, microcalcification, and tumor size > 1.1 cm were the strongest predictors of CLNM.
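For readers unfamiliar with the AUC values reported above, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy inputs in the usage note are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 outcomes (1 = metastasis present, by assumption).
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive/negative pair (O(n^2), fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ordered correctly. An AUC of 0.5 corresponds to chance-level ranking.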

Conclusions: It is feasible to develop predictive models of CLNM in PTC patients by incorporating clinical characteristics and US features. ML algorithms may be a useful tool for the prediction of lymph node metastasis in thyroid cancer.

Keywords: central lymph node metastasis; cross-validation; feature selection; machine learning; papillary thyroid cancer.

Figures

Figure 1
Receiver operating characteristic (ROC) curves of predictive models based on machine learning algorithms. US, ultrasonography; GBDT, gradient boosting decision tree; RFC, random forest classifier; AdaBoost, adaptive boosting; ANN, artificial neural network; MNB, multinomial naïve Bayes; XGBoost, extreme gradient boosting; DT, decision tree.
Figure 2
Decision curve for predictive models based on machine learning algorithms. US, ultrasonography; GBDT, gradient boosting decision tree; RFC, random forest classifier; AdaBoost, adaptive boosting; ANN, artificial neural network; MNB, multinomial naïve Bayes; XGBoost, extreme gradient boosting; DT, decision tree.
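A decision curve plots net benefit against the threshold probability at which a clinician would act. The sketch below uses the standard net benefit definition, true positives per patient minus false positives per patient weighted by the odds of the threshold. The source does not give the authors' computation, so the function names and the toy numbers in the usage note are assumptions.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on every case with predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: act on every patient regardless of prediction."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

With the toy inputs `labels = [1, 1, 0, 0]` and `probs = [0.9, 0.6, 0.7, 0.2]` at a threshold of 0.5, the model's net benefit is 0.25 and the treat-all strategy's is 0.0. A model is useful at a given threshold when its curve lies above both treat-all and treat-none (net benefit 0).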
Figure 3
Ranks of the top 10 variables for the prediction of central lymph node metastasis. Variables were ranked using a classifier-specific importance evaluator based on machine learning algorithms. The variables are ordered according to their mean rank across three candidate models: GBDT, RFC, and ANN. A lower rank indicates greater predictive importance. For example, age was ranked 1st, 3rd, and 10th in GBDT, ANN, and RFC, respectively. LNs, lymph nodes; Micro-Cal, microcalcification; TPO-Ab, thyroid peroxidase antibody; TSH, thyroid stimulating hormone; Ir. shape, irregular shape; Cap. invasion, capsular invasion; Hypo-echo, hypoechogenicity; GBDT, gradient boosting decision tree; RFC, random forest classifier; ANN, artificial neural network.
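The mean-rank ordering described in this caption can be sketched as below. Only the three ranks for age (1st in GBDT, 3rd in ANN, 10th in RFC) come from the caption; the function name `order_by_mean_rank` is hypothetical, and any other variables passed in are the caller's own values.

```python
def order_by_mean_rank(ranks_by_model):
    """Order variables by their mean rank across models (lower = more important).

    ranks_by_model: {model_name: {variable_name: rank}}.
    Only variables ranked by every model are kept.
    Returns a list of (variable, mean_rank) pairs, best first.
    """
    per_model = list(ranks_by_model.values())
    shared = set.intersection(*(set(r) for r in per_model))
    means = {v: sum(r[v] for r in per_model) / len(per_model) for v in shared}
    # Sort by mean rank, breaking ties alphabetically for a stable order.
    return sorted(means.items(), key=lambda item: (item[1], item[0]))


# Ranks for age as reported in the caption; mean rank = (1 + 3 + 10) / 3.
age_only = order_by_mean_rank({
    "GBDT": {"age": 1},
    "ANN": {"age": 3},
    "RFC": {"age": 10},
})
```

Here `age_only` holds a single pair whose mean rank is 14/3, about 4.67. Averaging ranks rather than raw importance scores keeps the three models comparable even though their importance measures are on different scales.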
Figure 4
Predictive performance of the gradient boosting decision tree (GBDT) model with different numbers of variables. The AUC was the highest (0.731) with seven variables.
