Comparative Study
BMC Public Health. 2025 Jun 7;25(1):2137.
doi: 10.1186/s12889-025-23119-y.

Comparison of spatial prediction models from Machine Learning of cholangiocarcinoma incidence in Thailand

Oraya Sahat et al. BMC Public Health.

Abstract

Background: Cholangiocarcinoma (CCA) poses a significant public health challenge in Thailand, with notably high incidence rates. This study aimed to compare the performance of spatial prediction models using Machine Learning techniques to analyze the occurrence of CCA across Thailand.

Methods: This retrospective cohort study analyzed CCA cases from four population-based cancer registries in Thailand, diagnosed between January 1, 2012, and December 31, 2021. The study employed four Machine Learning models (Linear Regression, Random Forest, Neural Network, and Extreme Gradient Boosting (XGBoost)) to predict Age-Standardized Rates (ASR) of CCA based on spatial variables. Model performance was evaluated using Root Mean Square Error (RMSE) and R² with a 70:30 train-test split.
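
The evaluation protocol described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the single predictor (`elevation`) and its relationship to the outcome are invented for illustration, and a one-variable least-squares line stands in for the four models named above. Only the 70:30 split and the two metrics (RMSE and R²) come from the abstract.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic areas: (elevation in metres, ASR per 100,000). Invented numbers.
rng = random.Random(42)
data = []
for _ in range(200):
    elevation = rng.uniform(0.0, 1000.0)
    asr = 15.0 - 0.01 * elevation + rng.gauss(0.0, 1.0)
    data.append((elevation, asr))

train, test = train_test_split(data, train_fraction=0.7)
intercept, slope = fit_line(train)  # the model is fitted on the training set only


def evaluate(rows):
    """Return (RMSE, R²) of the fitted line on the given rows."""
    observed = [y for _, y in rows]
    predicted = [intercept + slope * x for x, _ in rows]
    return rmse(observed, predicted), r_squared(observed, predicted)


train_rmse, train_r2 = evaluate(train)
test_rmse, test_r2 = evaluate(test)
print(f"train: RMSE={train_rmse:.2f}, R2={train_r2:.2%}")
print(f"test:  RMSE={test_rmse:.2f}, R2={test_r2:.2%}")
```

Reporting both training and testing metrics, as the abstract does, makes it possible to see whether a model fits the held-out 30% about as well as the data it was trained on.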

Results: The study included 6,379 CCA cases, with a male predominance (4,075 cases; 63.9%) and a mean age of 66.2 years (standard deviation = 11.1 years). The northeastern region accounted for most of the cases (3,898 cases; 61.1%). The overall ASR of CCA was 8.9 per 100,000 person-years (95% CI: 8.7 to 9.2), with the northeastern region showing the highest incidence (ASR = 13.4 per 100,000 person-years; 95% CI: 12.9 to 13.8). In the overall dataset, the Random Forest model demonstrated the best prediction performance in both the training (R² = 72.07%) and testing (R² = 71.66%) datasets. Model performance varied by region: Random Forest performed best in the northern and northeastern regions, while XGBoost excelled in the central and southern regions. The most important spatial predictors for CCA were elevation and distance from water sources.
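
For readers unfamiliar with how an ASR per 100,000 person-years is obtained, the following is a minimal sketch of direct age standardization. The abstract does not state which standardization method or standard population the authors used, so both are assumptions here; the function name, age groups, counts, and weights are hypothetical and are not taken from the study.

```python
def age_standardized_rate(cases, person_years, standard_weights, per=100_000):
    """Directly standardized rate: weighted mean of age-specific rates.

    cases[i] / person_years[i] is the rate in age group i, and
    standard_weights[i] is that group's share of a standard population.
    """
    if not (len(cases) == len(person_years) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    weighted_sum = sum(
        weight * count / years
        for count, years, weight in zip(cases, person_years, standard_weights)
    )
    return per * weighted_sum / total_weight


# Hypothetical three-group example (not study data).
cases = [2, 10, 40]                          # new cases per age group
person_years = [100_000, 100_000, 100_000]   # person-years at risk per group
standard_weights = [0.5, 0.3, 0.2]           # assumed standard population shares

asr = age_standardized_rate(cases, person_years, standard_weights)
print(f"ASR = {asr:.1f} per 100,000 person-years")
```

Weighting each age-specific rate by a common standard population is what allows rates from regions with different age structures to be compared on the same scale.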

Conclusion: The Random Forest model demonstrated the highest efficiency in predicting CCA incidence rates in Thailand, though predictive performance varied across regions. Spatial factors effectively predicted ASR of CCA, providing valuable insights for national-level disease surveillance and targeted public health interventions. These findings support the development of region-specific approaches for CCA control using spatial epidemiology and machine learning techniques.

Keywords: Cholangiocarcinoma; Machine Learning; Population-based cancer registries; Prediction Models; Spatial Predictions; Thailand.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study utilized secondary data from four PBCRs, which did not involve the collection of individuals’ identifying information. Therefore, individual informed consent was not required. This study received ethical approval from the Human Research Ethics Committees of all four data sources: Lampang Cancer Hospital (No. 10/2567), Lop Buri Cancer Hospital (No. LEC 6647), Khon Kaen University, where the consideration of human research ethics is in accordance with the Helsinki Declaration (No. HE671027), and Surat Thani Cancer Hospital (No. SCH_EC_01/2567). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Scatter plots comparing predicted versus observed CCA rates by Machine Learning models: a across Thailand; b in Northern Thailand; c in Central Thailand; d in Northeastern Thailand; e in Southern Thailand. LR, Linear Regression; RF, Random Forest; NN, Neural Network; XGBoost, Extreme Gradient Boosting
Fig. 2
Variable importance analysis showing relative impact of spatial predictors on CCA incidence
