Multicenter Study
Cancer Med. 2022 Dec;11(23):4469-4478.
doi: 10.1002/cam4.4800. Epub 2022 May 2.

Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting

Yutao Li et al. Cancer Med. 2022 Dec.

Abstract

Background: Detecting early-stage lung cancer is critical to reduce the lung cancer mortality rate; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case-control study.

Materials and methods: A total of 974 cases and 1005 controls were recruited in Shanghai and Taizhou, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariate logistic regression was used to assess the association between signal SNPs and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost), a machine learning algorithm designed for large-scale data, were used to build the lung cancer risk models. Both models were assessed with 10-fold cross-validation, and predictive performance was measured by the area under the receiver operating characteristic curve (AUC).
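The evaluation scheme described above (10-fold cross-validation scored by AUC) can be sketched as follows. This is a minimal, standard-library illustration, not the authors' pipeline: the function names `auc`, `stratified_folds`, and `cross_validated_auc` are hypothetical, the stratified split and the averaging of per-fold AUCs are assumptions the abstract does not confirm, and `fit` stands in for whichever learner (LR or XGBoost) is being evaluated.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 0/1 ints (1 = case, 0 = control); scores: predicted risks.
    A tie between a case and a control counts as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with a similar case/control ratio.

    Stratification is an assumption; the abstract only says "10-fold".
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def cross_validated_auc(labels, features, fit, k=10, seed=0):
    """Mean held-out AUC over k folds.

    fit(train_X, train_y) is a hypothetical callback that must return a
    function mapping one feature row to a risk score.
    """
    aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [model(features[i]) for i in held_out]
        aucs.append(auc([labels[i] for i in held_out], scores))
    return sum(aucs) / k
```

Passing the same folds (same `seed`) to two different `fit` callbacks gives a like-for-like comparison of two learners on identical held-out samples.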

Results: After false discovery rate (FDR) adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For lung cancer risk prediction, the model using epidemiological factors alone attained an AUC of 0.703 with LR and 0.744 with XGBoost. Compared with the epidemiology-only LR model, adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001). BAG6 rs1077393 was the most important predictor among all SNPs in the XGBoost model, followed by TERT rs2735845 and CAMKK1 rs7214723. In stratified analysis, the AUC for lung adenocarcinoma (ADC) rose significantly from 0.639 to 0.699 (p = 0.009) when SNPs were added and XGBoost was applied, whereas the best model for lung squamous cell carcinoma (SCC) was the LR model with epidemiological factors and SNPs (AUC = 0.833), ahead of the XGBoost model (AUC = 0.816).
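For readers unfamiliar with FDR adjustment, the sketch below shows one common procedure, Benjamini-Hochberg. The abstract does not name the FDR method used, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here for illustration only; the source does
    not state which FDR procedure was applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a setting like the one described, the input would be one raw p-value per genotyped SNP, and a SNP would be reported as significant when its adjusted value falls below the chosen threshold.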

Conclusion: Our lung cancer risk prediction models for the Chinese population show strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiology-based logistic regression risk prediction model significantly improves model performance.

Keywords: Chinese population; extreme gradient boosting; lung cancer; risk model; single nucleotide polymorphisms.

Conflict of interest statement

No conflict of interest exists in the submission of this manuscript. All the authors have contributed to, read and approved the final manuscript for submission. This manuscript has not been submitted elsewhere.

Figures

FIGURE 1
Selection criteria of single nucleotide polymorphisms. CNKI, China National Knowledge Infrastructure; GWAS, Genome Wide Association Study; SNP, single nucleotide polymorphisms; MAF, minor allele frequency; OR, odds ratio
