Front Mol Biosci. 2024 Jun 3;11:1389325.
doi: 10.3389/fmolb.2024.1389325. eCollection 2024.

Integrating proteomics and explainable artificial intelligence: a comprehensive analysis of protein biomarkers for endometrial cancer diagnosis and prognosis

Seyma Yasar et al. Front Mol Biosci.

Abstract

Endometrial cancer (EC), the most common gynaecological cancer and the most common cancer in women after breast, colorectal and lung cancer, can be diagnosed at an early stage. The first aim of this study is to classify age, tumor grade, myometrial invasion and tumor size, which play an important role in the diagnosis and prognosis of EC, using machine learning methods combined with explainable artificial intelligence. Proteomic data from 20 EC patients, obtained from tumor biopsies taken from different regions of EC tissue, were used. The data were grouped according to age, tumor size, tumor grade and myometrial invasion. Three different machine learning methods were then trained, explainable artificial intelligence was applied to the model that best classified each grouping, and possible protein biomarkers for EC prognosis were evaluated. The optimal model was XGBoost for age classification (AUC 98.8%), XGBoost for tumor grade classification (AUC 98.6%), LightGBM for myometrial invasion classification (AUC 95.1%), and XGBoost for tumor size classification (AUC 94.8%). Combining the optimal models with the SHAP approach yielded possible protein biomarkers and their expression patterns for each classification. Finally, EWRS1 protein was found to be common to three groups (age, myometrial invasion, tumor size). These findings indicate that the developed models can accurately classify factors that are critical for determining the prognosis of EC, including age, tumor grade and myometrial invasion, and can identify potential protein biomarkers associated with these factors. Furthermore, combining SHAP values with the optimal models allowed us to analyze how the quantities of the proposed biomarker proteins varied across the classes.
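The abstract ranks candidate models by AUC. As a minimal sketch of what that comparison involves, the Python below computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The labels, scores and the names model_a and model_b are hypothetical illustrations, not data or models from the study, and the study's own evaluation procedure may differ.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical class labels and predicted scores from two candidate models.
labels = [1, 1, 1, 0, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.3, 0.4, 0.5, 0.7, 0.1],
}

aucs = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best_name = max(aucs, key=aucs.get)  # the model with the highest AUC
print(aucs, best_name)
```

With only 20 patients, an AUC estimated this way can vary considerably between data splits, so in practice it would normally be averaged over cross-validation folds.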

Keywords: biomarker; endometrium cancer; explainable artificial intelligence; machine learning; proteomic.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
(A) Global SHAP annotations for tumor size prediction of the XGBoost model. Horizontal positions reflect the effect of proteins on model output. Colors indicate whether protein is high (red) or low (blue) for a particular patient. A positive SHAP value indicates a positive contribution to the output, and a negative SHAP value indicates a negative contribution to the output. (B) Graphs of protein importance based on the mean SHAP values of the XGBoost model in predicting tumor size. The graph shows an order of importance for proteins according to their collective absolute SHAP values.
FIGURE 2
(A) Global SHAP annotations for myometrial invasion prediction of the LightGBM model. Horizontal positions reflect the effect of proteins on model output. Colors indicate whether protein is high (red) or low (blue) for a particular patient. A positive SHAP value indicates a positive contribution to the output, and a negative SHAP value indicates a negative contribution to the output. (B) Graphs of protein importance based on the mean SHAP values of the LightGBM model in predicting myometrial invasion. The graph shows an order of importance for proteins according to their collective absolute SHAP values.
FIGURE 3
(A) Global SHAP annotations for Grade 1 vs. High Grade prediction of the XGBoost model. Horizontal positions reflect the effect of proteins on model output. Colors indicate whether protein is high (red) or low (blue) for a particular patient. A positive SHAP value indicates a positive contribution to the output, and a negative SHAP value indicates a negative contribution to the output. (B) Graphs of protein importance based on the mean SHAP values of the XGBoost model in predicting Tumor Grade 1 vs. High Grade. The graph shows an order of importance for proteins according to their collective absolute SHAP values.
FIGURE 4
(A) Global SHAP annotations for postmenopausal and premenopausal prediction of the XGBoost model. Horizontal positions reflect the effect of proteins on model output. Colors indicate whether protein is high (red) or low (blue) for a particular patient. A positive SHAP value indicates a positive contribution to the output, and a negative SHAP value indicates a negative contribution to the output. (B) Graphs of protein importance based on the mean SHAP values of the XGBoost model in predicting postmenopausal vs. premenopausal. The graph shows an order of importance for proteins according to their collective absolute SHAP values.
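The captions above describe two views of the same quantities: signed per-patient SHAP values (panel A) and proteins ranked by mean absolute SHAP value (panel B). The sketch below illustrates both ideas with a brute-force Shapley computation on a toy model. It is an assumption-laden illustration only: toy_model, the protein names and the patient values are hypothetical, features absent from a coalition are replaced by a single baseline (the column means), and the SHAP library's tree explainers use a different, more efficient algorithm than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                # Features outside the coalition take their baseline value.
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(v):
    # Hypothetical linear score standing in for a trained classifier's output.
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


# Hypothetical protein names and expression values for three patients.
names = ["protein_A", "protein_B", "protein_C"]
patients = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
baseline = [sum(col) / len(patients) for col in zip(*patients)]

# Panel A analogue: one signed value per protein per patient.
shap_rows = [shapley_values(toy_model, p, baseline) for p in patients]

# Panel B analogue: rank proteins by mean absolute value across patients.
importance = {
    name: sum(abs(row[k]) for row in shap_rows) / len(shap_rows)
    for k, name in enumerate(names)
}
ranking = sorted(names, key=lambda name: importance[name], reverse=True)
print(shap_rows, ranking)
```

For each patient the values sum to the difference between the model output for that patient and the output at the baseline, which is why a positive value can be read as pushing the prediction up and a negative value as pushing it down.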
