Clin Exp Med. 2025 May 9;25(1):143.
doi: 10.1007/s10238-025-01684-1.

Binary classification of gynecological cancers based on ATR-FTIR spectroscopy and machine learning using urine samples


Francesco Vigo et al. Clin Exp Med.

Abstract

Diagnosing cancer at an early, still asymptomatic stage has been a challenge for modern medicine for several decades. In gynecology, no effective screening method has yet been found and approved for endometrial or ovarian cancer. Mammography is an effective screening method for breast cancer, as is the Pap test for cervical cancer, but both are underused in third-world countries because they require expensive, specialized instrumentation. Previous studies showed that machine learning analysis of the spectral information obtained from dried urine samples could differentiate healthy patients from those with ovarian or endometrial cancer with good accuracy. In this study, we apply ATR-FTIR spectrometry, a practical, fast, and relatively inexpensive technique, to liquid urine samples from 309 patients undergoing surgical treatment for benign or malignant diseases (endometrial, breast, cervical, vulvar, and ovarian cancer). The data obtained from these liquid samples were then used to train a machine learning model to classify healthy versus cancer patients. We obtained an accuracy of > 91% and identified discriminant wavenumbers (2093 and 1774 cm⁻¹). These frequencies are close to those already reported in other studies, indicating a possible association with tumor presence and/or progression.
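The discriminant bands are reported in wavenumber units (cm⁻¹), while the text also refers to them as wavelengths and frequencies. As a minimal sketch of how the three quantities relate, the snippet below converts a wavenumber to a wavelength in micrometres (λ = 10⁴ / ν̃) and to a frequency in terahertz (f = c · ν̃). The function names are illustrative and are not taken from the study.

```python
# Speed of light in cm/s (the exact SI value, expressed in centimetres).
C_CM_PER_S = 2.99792458e10


def wavenumber_to_wavelength_um(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    # 1 cm = 1e4 micrometres, so wavelength_um = 1e4 / wavenumber.
    return 1e4 / wavenumber_cm


def wavenumber_to_frequency_thz(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a frequency in terahertz."""
    # f = c * wavenumber gives Hz; divide by 1e12 for THz.
    return C_CM_PER_S * wavenumber_cm / 1e12


# The two discriminant bands reported in the abstract.
for nu in (2093.0, 1774.0):
    print(
        f"{nu:.0f} cm^-1 -> "
        f"{wavenumber_to_wavelength_um(nu):.3f} um, "
        f"{wavenumber_to_frequency_thz(nu):.1f} THz"
    )
```

Under these conversions, 2093 cm⁻¹ corresponds to roughly 4.78 µm (about 62.7 THz) and 1774 cm⁻¹ to roughly 5.64 µm (about 53.2 THz), both in the mid-infrared range.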

Keywords: ATR-FTIR spectroscopy; Gynecological cancers; Machine learning; Urine biomarkers.


Conflict of interest statement

Declarations. Conflict of interest: The authors declare no competing interests. Ethical approval: This study was approved by the Ethical Committee of the University Hospital of Basel (Approval Number: ID 2022-00109). All procedures performed were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Consent to participate: Informed consent was obtained from all individual participants included in the study. Consent to publish: Not applicable.

Figures

Fig. 1
ROC curves showing the prediction accuracy of the PLS regression model with and without OPLS preprocessing. OPLS with 20 components appears to improve the metric scores of the predictions (cancer vs. benign gynecologic conditions)
Fig. 2
Confusion matrix showing the method's performance on the test dataset. Precision, recall, and F1-score are shown. The values in the table take into account the class imbalance between tumor and normal samples
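For readers less familiar with the metrics named in this caption, the following minimal sketch shows how precision, recall, F1-score, and accuracy follow from the four cells of a binary confusion matrix. The counts are hypothetical placeholders chosen only to illustrate the arithmetic; they are not the study's results, and the function name is an assumption.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics for the positive (cancer) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (NOT taken from the paper).
example = binary_metrics(tp=18, fp=2, fn=3, tn=39)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced classes, accuracy alone can look favorable even when the minority class is poorly detected, which is why per-class precision and recall are usually reported alongside it.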
Fig. 3
Visualization of PLS scores derived from OPLS-transformed spectral data for cancer diagnosis. The figure displays the first two PLS scores obtained by applying PLS regression to spectral data preprocessed using OPLS with 20 orthogonal components to enhance feature selection. The scores illustrate the effective separation between cancer samples (yellow) and benign gynecologic condition samples (green), highlighting the discriminative power of the selected features in distinguishing between the two classes
Fig. 4
Top 20 features by SHAP importance. Important features are marked in red and fall toward the right side of the plot
Fig. 5
Top 3 frequencies by SHAP feature importance in the developed model. The x-axis reports the average impact on the magnitude of the RF model output, indicating which frequencies most prominently guide the classification model
Fig. 6
Binarized patient disease status in the control group (N = 206) and the cancer group (N = 103). The scatter plots and violin plots show the overall spread of the absorbance detected in the samples. The plots indicate a highly significant difference (Kruskal–Wallis one-way analysis of variance, alpha = 0.05) between the cancer samples and the normal samples
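The Kruskal–Wallis test mentioned in this caption compares groups using ranks rather than raw absorbance values. The sketch below is a minimal standard-library implementation for the two-group case, with a tie correction and a p-value from the chi-square distribution with one degree of freedom. The absorbance values are made up for illustration and are not study data; the function names are assumptions, and an established statistics package would normally be preferred in practice.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two samples; assumes the values are not all equal."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[: len(a)])
    sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (sum_a**2 / len(a) + sum_b**2 / len(b)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = Counter(pooled).values()
    h /= 1 - sum(t**3 - t for t in ties) / (n**3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding errors
    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Made-up absorbance values for illustration only (NOT study data).
control = [0.10, 0.12, 0.11, 0.13, 0.09]
cancer = [0.15, 0.18, 0.16, 0.17, 0.19]
h_stat, p_value = kruskal_wallis_two_groups(control, cancer)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

In this toy example every cancer value exceeds every control value, so H is about 6.82 and the p-value falls below the 0.05 threshold named in the caption.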


