Front Endocrinol (Lausanne). 2022 Feb 16:13:810219.
doi: 10.3389/fendo.2022.810219. eCollection 2022.

Machine Learning for Outcome Prediction in First-Line Surgery of Prolactinomas

Markus Huber et al. Front Endocrinol (Lausanne). 2022.

Abstract

Background: First-line surgery for prolactinomas has gained increasing acceptance, but the indication still remains controversial. Thus, accurate prediction of unfavorable outcomes after upfront surgery in prolactinoma patients is critical for the triage of therapy and for interdisciplinary decision-making.

Objective: To evaluate whether contemporary machine learning (ML) methods can facilitate this crucial prediction task in a large cohort of prolactinoma patients with first-line surgery, we investigated the performance of various classes of supervised classification algorithms. The primary endpoint was ML-applied risk prediction of long-term dopamine agonist (DA) dependency. The secondary outcome was the prediction of the early and long-term control of hyperprolactinemia.

Methods: By jointly examining two independent performance metrics, the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC), in combination with a stacked super learner, we present a novel perspective on how to assess and compare the discrimination capacity of a set of binary classifiers.
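As a minimal illustration of the two metrics named above, the following sketch computes AUROC and MCC from scratch. The toy labels, scores, and the 0.5 decision threshold are assumptions made for this example only and are not taken from the study.

```python
from math import sqrt

def auroc(y_true, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the 2x2 confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative toy data (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(auroc(y_true, scores), mcc(y_true, y_pred))
```

AUROC is computed from the continuous scores and is threshold-free, whereas MCC depends on the chosen decision threshold, which is one reason the two metrics can rank classifiers differently.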

Results: We demonstrate that for upfront surgery in prolactinoma patients there is no one-algorithm-fits-all solution in outcome prediction: different algorithms perform best for different time points and different outcome parameters. In addition, ML classifiers outperform logistic regression in both performance metrics in our cohort when predicting the primary outcome at long-term follow-up and the secondary outcome at early follow-up, and thus provide an added benefit in risk prediction modeling. In such a setting, the stacking framework of combining the predictions of individual base learners in a so-called super learner offers great potential: the super learner exhibits very good prediction skill for the primary outcome (AUROC: mean 0.9, 95% CI: 0.92 - 1.00; MCC: 0.85, 95% CI: 0.60 - 1.00). In contrast, predicting control of hyperprolactinemia is challenging, in particular at early follow-up (AUROC: 0.69, 95% CI: 0.50 - 0.83) compared with long-term follow-up (AUROC: 0.80, 95% CI: 0.58 - 0.97). It is of clinical importance that baseline prolactin levels are by far the most important outcome predictor at early follow-up, whereas remission at 30 days dominates the ML prediction skill for DA dependency over the long term.
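The stacking idea mentioned above can be sketched as follows: base learners produce out-of-fold predictions, and a meta-level step combines them. This is a simplified sketch, not the study's implementation. The two single-feature base learners, the synthetic data, and the grid search for a convex weight that maximizes AUROC are all assumptions chosen to keep the example short.

```python
import random

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_learner(col):
    """Hypothetical base learner: scores a row by its signed distance from
    the midpoint between the two class means of feature `col`."""
    def fit(X, y):
        pos = [row[col] for row, label in zip(X, y) if label == 1]
        neg = [row[col] for row, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        midpoint = (mean_pos + mean_neg) / 2
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        return lambda row: sign * (row[col] - midpoint)
    return fit

def out_of_fold(fit, X, y, n_folds=3):
    """Predict every sample with a model that never saw it during fitting."""
    preds = [0.0] * len(y)
    for k in range(n_folds):
        train = [i for i in range(len(y)) if i % n_folds != k]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(k, len(y), n_folds):
            preds[i] = model(X[i])
    return preds

def fit_super_learner(base_fits, X, y, grid=11):
    """Pick the convex weight for two base learners that maximizes the AUROC
    of the blended out-of-fold predictions, then refit both on all data."""
    oof = [out_of_fold(f, X, y) for f in base_fits]
    best_w, best_auc = 0.0, -1.0
    for step in range(grid):
        w = step / (grid - 1)
        blended = [w * a + (1 - w) * b for a, b in zip(oof[0], oof[1])]
        auc = auroc(y, blended)
        if auc > best_auc:
            best_w, best_auc = w, auc
    models = [f(X, y) for f in base_fits]
    predict = lambda row: best_w * models[0](row) + (1 - best_w) * models[1](row)
    return best_w, best_auc, predict

# Synthetic data: two weakly informative features (illustrative only).
rng = random.Random(0)
y = [i % 2 for i in range(90)]
X = [[rng.gauss(1.0 * label, 1.0), rng.gauss(0.8 * label, 1.0)] for label in y]

base_fits = [centroid_learner(0), centroid_learner(1)]
single_aucs = [auroc(y, out_of_fold(f, X, y)) for f in base_fits]
weight, stacked_auc, super_learner = fit_super_learner(base_fits, X, y)
print(single_aucs, weight, stacked_auc)
```

Because the weight is tuned on the same out-of-fold predictions used to report `stacked_auc`, that value is optimistic; an unbiased estimate would need an additional outer resampling loop.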

Conclusions: This study highlights the performance benefits of combining a diverse set of classification algorithms to predict the outcome of first-line surgery in prolactinoma patients. We demonstrate the added benefit of considering two performance metrics jointly to assess the discrimination capacity of a diverse set of classifiers.

Keywords: dopamine agonists; long-term outcome; machine learning; prediction modeling; primary surgical therapy; prolactinoma.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Hyperparameter tuning in our set of machine learning classifiers. The impact of varying a single hyperparameter away from its default value on the area under the receiver operating characteristic curve (AUROC) is illustrated for a selection of hyperparameters in each algorithm (shown on the ordinate). Each hyperparameter is sampled 50 times and its performance is assessed within a repeated cross-validation sampling (three-fold, four repeats), resulting in an AUROC distribution, which is illustrated with a box-and-whisker plot. The outcome was dependence on dopamine agonists at long-term follow-up. For comparison, the range of AUROC values derived using the default hyperparameter settings is shown as DEFAULT in each panel. Due to the repeated cross-validation sampling, the default hyperparameter settings also feature AUROC distributions, despite using only a fixed set of hyperparameters.
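The procedure described in this caption can be sketched roughly as below: a hyperparameter is sampled at random, and each sampled value is scored over a three-fold, four-repeat cross-validation, which yields 12 AUROC values per setting. The toy 1-D nearest-neighbour classifier, its hyperparameter `k`, the synthetic data, the stratified splitting, and the use of 5 rather than 50 samples are assumptions for illustration only.

```python
import random

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_stratified_kfold(y, n_folds=3, n_repeats=4, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles the samples
    and deals each class evenly into the folds (stratification is assumed)."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for label in sorted(set(y)):
            members = [i for i, v in enumerate(y) if v == label]
            rng.shuffle(members)
            for pos, i in enumerate(members):
                folds[pos % n_folds].append(i)
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test

def knn_scores(x_train, y_train, x_test, k):
    """Toy classifier: fraction of positives among the k nearest neighbours."""
    scores = []
    for x in x_test:
        nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
        scores.append(sum(y_train[i] for i in nearest) / k)
    return scores

# Synthetic one-feature data (illustrative only).
rng = random.Random(42)
y = [0] * 30 + [1] * 30
x = [rng.gauss(1.5 * label, 1.0) for label in y]

# Randomly sample the hyperparameter; the set removes duplicate draws.
results = {}
for k in sorted({rng.randint(1, 15) for _ in range(5)}):
    aucs = []
    for train, test in repeated_stratified_kfold(y, seed=1):
        s = knn_scores([x[i] for i in train], [y[i] for i in train],
                       [x[i] for i in test], k)
        aucs.append(auroc([y[i] for i in test], s))
    results[k] = aucs  # 3 folds x 4 repeats = 12 AUROC values per setting

for k, aucs in results.items():
    print(k, round(min(aucs), 3), round(max(aucs), 3))
```

Each list in `results` is the kind of distribution that a box-and-whisker plot would summarize for one hyperparameter value.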
Figure 2
Relationship between two performance metrics in a set of supervised classification algorithms resulting from randomly sampling two hyperparameters in each algorithm (N=500 samples). The area under the receiver operating characteristic curve (AUROC) is shown on the abscissa, whereas the corresponding value for the Matthews correlation coefficient (MCC) is shown on the ordinate. The outcomes are (A) dependency on DA at long-term follow-up and (B) successful control of hyperprolactinemia at early follow-up. For illustration purposes, locally weighted scatterplot smoothing (LOESS) curves with associated 95% confidence intervals are shown for each classification algorithm.
Figure 3
Area under the receiver operating characteristic curve (AUROC) and Matthews correlation coefficient (MCC) values for the outcomes at early and long-term follow-up. Median and 95% confidence intervals are shown, where the latter were derived in a repeated cross-validation sampling (three-fold, 100 repeats). For each machine learning algorithm, two influential hyperparameters (refer to Figure 1) were sampled 100 times and the hyperparameter settings resulting in the best AUROC performance were selected.
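One way to summarize a set of cross-validated scores as a median with a 95% interval is sketched below. The caption does not state how the intervals were computed, so the use of empirical 2.5th and 97.5th percentiles is an assumption, and the score values are placeholders rather than study results.

```python
from statistics import median, quantiles

def summarize(scores):
    """Return (median, lower, upper), where lower and upper are the empirical
    2.5th and 97.5th percentiles of the resampled scores (assumed method)."""
    cuts = quantiles(scores, n=40, method="inclusive")  # cut points every 2.5%
    return median(scores), cuts[0], cuts[-1]

# Placeholder AUROC values standing in for repeated cross-validation results.
cv_aurocs = [0.71, 0.78, 0.80, 0.74, 0.83, 0.77, 0.69, 0.81, 0.79, 0.76, 0.82, 0.75]
mid, low, high = summarize(cv_aurocs)
print(f"median={mid:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```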
Figure 4
Importance of the available set of variables in predicting early and long-term outcome. The variable importance metric is based on a permutation approach, where the impact of perturbing the values of a given predictor on a particular performance metric [in this case: area under the curve (AUROC)] is assessed: the larger the decrease in the AUROC metric, the more important a predictor is considered. The variable importance is assessed for each classification algorithm with optimized hyperparameters, and the importance values for each predictor are simply stacked upon each other to illustrate the overall importance of a particular predictor and to visualize the inter-algorithm agreement in the assessment of the importance of a single predictor.
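The permutation approach described in this caption can be sketched as follows: one predictor column is shuffled, the AUROC is recomputed, and the average drop relative to the unshuffled baseline is taken as that predictor's importance. The fixed scoring function, the synthetic two-column data, and the number of shuffles are assumptions for illustration and do not reproduce the study's models.

```python
import random

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(score_fn, X, y, n_repeats=20, seed=0):
    """Mean decrease in AUROC when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auroc(y, [score_fn(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - auroc(y, [score_fn(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic data: column 0 is informative, column 1 is pure noise.
rng = random.Random(7)
y = [0] * 40 + [1] * 40
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# Hypothetical fitted model that relies on column 0 only.
score_fn = lambda row: row[0]

baseline, importances = permutation_importance(score_fn, X, y)
print(baseline, importances)
```

Shuffling the noise column leaves the scores unchanged, so its importance is zero, while shuffling the informative column pushes the AUROC toward 0.5. Stacking such values across several classifiers, as the caption describes, would simply sum the per-algorithm importances for each predictor.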
