A Natural Language Processing Method Identifies an Association Between Bacterial Communities in the Upper Genital Tract and Ovarian Cancer
- PMID: 40806562
- PMCID: PMC12347966
- DOI: 10.3390/ijms26157432
Abstract
Bacterial communities within the female upper genital tract may influence the risk of ovarian cancer. In this retrospective cohort pilot study, we aimed to detect differences in bacterial communities between ovarian cancer samples and normal controls using topic modeling, a natural language processing tool. RNA was extracted and analyzed using the VITCOMIC2 pipeline. Topic modeling assessed differences in bacterial communities. ldatuning identified an optimal latent topic number, and Latent Dirichlet Allocation (LDA) assessed topic differences between high-grade serous ovarian cancer (HGSOC) and controls. Results were validated using The Cancer Genome Atlas (TCGA) HGSOC dataset. A total of 801 unique taxa were identified, with 13 bacteria significantly differing between HGSOC and normal controls. LDA modeling revealed a latent topic associated with HGSOC samples, containing the bacteria Escherichia/Shigella and Corynebacterineae. Pathway analysis using KEGG databases suggests differences in several biologic pathways, including oocyte meiosis, aldosterone-regulated sodium reabsorption, gastric acid secretion, and long-term potentiation. These findings support the hypothesis that bacterial communities in the upper female genital tract may influence the development of HGSOC by altering the local environment, with potential functional differences between HGSOC and normal controls. However, further validation is required to confirm these associations and determine mechanistic relevance.
Keywords: RNA sequencing; RNAseq; microbiome; natural language processing; ovarian cancer; prediction model.
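To illustrate how topic modeling can be applied to taxa counts, the following is a minimal sketch and not the pipeline used in the study. It assumes each sample is treated as a "document" and each taxon as a "word", and fits LDA with a small collapsed Gibbs sampler written in plain Python. The function name `fit_lda`, the hyperparameters, the fixed topic number of 2, the placeholder taxa `taxon_C` and `taxon_D`, and all counts are hypothetical toy values chosen for illustration; the study selected the topic number with ldatuning and reports no such counts.

```python
import random


def fit_lda(counts, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    counts: list of samples, each a list of per-taxon read counts.
    Returns (theta, phi): per-sample topic mixtures and
    per-topic taxon distributions, each row summing to 1.
    """
    rng = random.Random(seed)
    n_docs, n_taxa = len(counts), len(counts[0])
    # Expand every sample into a list of taxon tokens, one token per read.
    docs = [[t for t, c in enumerate(row) for _ in range(c)] for row in counts]

    doc_topic = [[0] * n_topics for _ in range(n_docs)]
    topic_taxon = [[0] * n_taxa for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for t in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_taxon[k][t] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_taxon[k][t] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_taxon[j][t] + beta)
                    / (topic_total[j] + n_taxa * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_taxon[k][t] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(docs[d]) + n_topics * alpha)
         for k in range(n_topics)]
        for d in range(n_docs)
    ]
    phi = [
        [(topic_taxon[k][t] + beta) / (topic_total[k] + n_taxa * beta)
         for t in range(n_taxa)]
        for k in range(n_topics)
    ]
    return theta, phi


# Hypothetical toy data: rows are samples, columns are taxa.
taxa = ["Escherichia/Shigella", "Corynebacterineae", "taxon_C", "taxon_D"]
counts = [
    [9, 7, 1, 0],  # made-up "HGSOC-like" sample
    [8, 6, 0, 1],  # made-up "HGSOC-like" sample
    [1, 0, 8, 9],  # made-up "control-like" sample
    [0, 1, 7, 8],  # made-up "control-like" sample
]
theta, phi = fit_lda(counts, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` gives one sample's mixture over latent topics, and each row of `phi` gives one topic's distribution over taxa. Comparing the `theta` rows between groups is one way a topic associated with a group of samples could be identified.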
Conflict of interest statement
All the authors have nothing to disclose. This does not alter our adherence to the journal policies on sharing data and materials.