Comput Struct Biotechnol J. 2025 Jun 14:27:2626-2637.

doi: 10.1016/j.csbj.2025.06.025. eCollection 2025.

Darling (v2.0): Mining disease-related databases for the detection of biomedical entity associations

Fotis A Baltoumas¹, Evangelos Karatzas¹, Nefeli K Venetsianou¹, Eleni Aplakidou¹ ², Konstantinos Giatras¹, Maria N Chasapi¹, Iro N Chasapi¹, Ioannis Iliopoulos², Vassiliki A Iconomidou³, Ioannis P Trougakos³, Fotis Psomopoulos⁴, Antonis Giannakakis⁵ ⁶, Ilias Georgakopoulos-Soares⁷, Panagiota Kontou⁸, Pantelis G Bagos⁹, Georgios A Pavlopoulos¹

Affiliations

¹ Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Athens, Greece.
² Department of Basic Sciences, School of Medicine, University of Crete, Heraklion 71003, Greece.
³ Section of Cell Biology and Biophysics, Department of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, Athens 15784, Greece.
⁴ Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece.
⁵ Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece.
⁶ University Research Institute of Maternal and Child Health and Precision Medicine, National and Kapodistrian University of Athens, Athens 11527, Greece.
⁷ Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA, USA.
⁸ Department of Mathematics, University of Thessaly, Lamia 35131, Greece.
⁹ Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia 35131, Greece.

PMID: 40599243
PMCID: PMC12212154
DOI: 10.1016/j.csbj.2025.06.025

Fotis A Baltoumas et al. Comput Struct Biotechnol J. 2025.

Abstract

Darling is a web application that employs literature mining to detect disease-related biomedical entity associations. Darling can detect sentence-based co-occurrences of biomedical entities such as genes, proteins, chemicals, functions, tissues, diseases, environments, and phenotypes from biomedical literature found in six disease-centric databases. In this version, we deploy additional query channels focusing on COVID-19, GWAS studies, cardiovascular, neurodegenerative, and cancer diseases. Compared to its predecessor, users now have extended query options including searches with PubMed identifiers, disease records, entity names, titles, single nucleotide polymorphisms, or the Entrez syntax. Furthermore, users can submit free input text; after named entity recognition is applied, the literature relevant to the recognized terms can be retrieved and mined. Term associations are captured in customizable networks which can be further filtered by either term or co-occurrence frequency and visualized in 2D as weighted graphs or in 3D as multi-layered networks. The fetched terms are organized in searchable tables and clustered annotated documents. The reported genes can be further analyzed for functional enrichment using external applications called from within Darling. The Darling databases, including terms and their associations, are updated annually. Darling is available at: https://www.darling-miner.org/.

Keywords: Co-occurrence analysis; Literature mining; Named entity recognition; Network analysis; Text mining.
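
The sentence-based co-occurrence counting and frequency filtering described in the abstract can be illustrated with a minimal sketch. This is not Darling's implementation: it assumes that named entity recognition has already reduced each sentence to a list of entity names, and the function names (`cooccurrence_counts`, `filter_edges`) and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences mention each unordered pair of entities.

    `sentences` is a list of entity-name lists, one list per sentence
    (assumed to be the output of a prior NER step).
    """
    counts = Counter()
    for entities in sentences:
        # Deduplicate and sort so (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def filter_edges(counts, min_frequency):
    """Keep only pairs seen in at least `min_frequency` sentences."""
    return {pair: n for pair, n in counts.items() if n >= min_frequency}


# Toy, hand-written input; not real Darling output.
sentences = [
    ["KRAS", "EGFR", "colorectal cancer"],
    ["KRAS", "EGFR"],
    ["BRAF", "colorectal cancer"],
    ["KRAS", "colorectal cancer"],
]

counts = cooccurrence_counts(sentences)
edges = filter_edges(counts, min_frequency=2)
```

The surviving pairs can be read as the weighted edges of a co-occurrence network, with the count as the edge weight; raising `min_frequency` prunes rarely co-mentioned pairs.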

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Fig. 1**
Darling’s general scheme. (A) Database content update workflow. Disease-related literature mining, named entity recognition (NER), and co-occurrence analysis. (B) Various disease-specific search interfaces offered by DARLING v2.0: COVID-19 (LitCovid), Disease Ontology groups (cancer, cardiovascular, and nervous system diseases), GWAS-based queries, and SNP search.

**Fig. 2**
Darling 2.0 UI components for the various channels and their corresponding search options. The central navigation panel (middle) leads to ten tailored query forms. The Disease Search (top left) lets users enter disease IDs or names and choose among OMIM, DisGeNET, HPO, MONDO, RNADisease, or GWAS Catalog. The COVID-19, Cancer, Cardiovascular, and Nervous System tabs (left and right) each support PubMed lookups by PMID list or keywords, with context-specific filters (e.g. transmission, cell type, organ, disease subtypes). The GWAS Search form (center left) accepts study IDs/names, trait IDs/names, genes, and SNPs, combined via AND/OR. Bioentity Search (bottom right) enables identifier or name queries across Chemicals, Genes/Proteins, and Tissues. SNP Search (right middle) ingests rs-ID lists. Literature Search (bottom right) handles PubMed-formatted or title-text queries, while Free Text Search (bottom left) applies named-entity recognition to arbitrary text, highlighting Genes/Proteins, Chemicals, and Tissues in the annotated output.

**Fig. 3**
(A) Example of document annotation and clustering. (B) 2D and 3D network visualizations of co-occurring terms, such as genes/proteins, chemical compounds, and diseases. In the 2D visualization, each bioentity type is assigned a distinct color, whereas in the 3D network, each type is separated into a different layer.

**Fig. 4**
Integrated entity frequency and co-occurrence analyses reveal key molecular, tissue, and phenotypic associations in Alzheimer’s–diabetes and colorectal cancer. (A) Bar chart of the top 30 entities uncovered by Darling’s “*Nervous System Diseases*” channel when querying “*Alzheimer Disease*” and filtering for diabetes-related terms. Entities comprise genes/proteins (orange), GO biological processes (gray), tissues (turquoise), and mammalian phenotypes (mauve), ranked by the number of supporting articles. (B) Co-occurrence network of the filtered AD–T2DM dataset, color-coded by entity class: tissues (turquoise), genes/proteins (orange), mammalian phenotypes (mauve), and GO biological processes (gray). A magnified inset highlights key subnetworks: intersections around CDK5 (green circle), the “*insulin receptor signaling pathway*” linking neurodegeneration to metabolic signaling (black circle), and a glutamate-related cluster involving CLU, PICALM, and BIN1 in the hippocampus (red circle). (C) Word cloud of the most frequently mentioned genes and proteins in the 5000 most recent abstracts retrieved from Darling’s “*Cancer*” channel using the term “*colorectal cancer*”. Word size reflects article frequency, with KRAS, HRAS, and EGFR most prominent. (D) Multi-layer co-occurrence network rendered in Arena3Dweb for colorectal cancer, showing GO molecular functions, genes/proteins, BTO tissues, and mammalian phenotypes on separate layers. Edges represent cross-layer associations above a frequency threshold of 50. A zoomed inset emphasizes the central oncogenic cluster (KRAS, HRAS, EGFR, BRAF, SNRPE, and ERBB2) linked to colorectal cancer cell line tissues.
