Comput Struct Biotechnol J. 2025 Jun 14;27:2626-2637.
doi: 10.1016/j.csbj.2025.06.025. eCollection 2025.

Darling (v2.0): Mining disease-related databases for the detection of biomedical entity associations

Fotis A Baltoumas et al. Comput Struct Biotechnol J. 2025.

Abstract

Darling is a web application that employs literature mining to detect disease-related biomedical entity associations. Darling can detect sentence-based co-occurrences of biomedical entities such as genes, proteins, chemicals, functions, tissues, diseases, environments, and phenotypes in biomedical literature drawn from six disease-centric databases. In this version, we deploy additional query channels focusing on COVID-19, genome-wide association studies (GWAS), and cardiovascular, neurodegenerative, and cancer diseases. Compared to its predecessor, users now have extended query options, including searches with PubMed identifiers, disease records, entity names, titles, single nucleotide polymorphisms, or the Entrez syntax. Furthermore, users can submit free input text, apply named entity recognition to it, and then retrieve and mine the literature relevant to the recognized terms. Term associations are captured in customizable networks, which can be further filtered by either term or co-occurrence frequency and visualized in 2D as weighted graphs or in 3D as multi-layered networks. The fetched terms are organized in searchable tables and clustered annotated documents. The reported genes can be further analyzed for functional enrichment using external applications called from within Darling. The Darling databases, including terms and their associations, are updated annually. Darling is available at: https://www.darling-miner.org/.
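
To make the idea of sentence-based co-occurrence concrete, the following Python sketch splits a text into sentences, tags entities with a toy dictionary lookup, and counts the entity pairs that share a sentence. The lexicon, the naive sentence splitter, and all function names are illustrative assumptions; they are not Darling's named entity recognition pipeline or its API.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon (assumption): lowercase token -> (entity type, name).
LEXICON = {
    "kras": ("gene", "KRAS"),
    "egfr": ("gene", "EGFR"),
    "colon": ("tissue", "colon"),
    "insulin": ("chemical", "insulin"),
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tag_entities(sentence):
    """Return the set of lexicon entities that occur as whole tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}

def cooccurrences(text):
    """Count unordered entity pairs that appear in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        entities = sorted(tag_entities(sentence))  # stable pair ordering
        for a, b in combinations(entities, 2):
            counts[(a, b)] += 1
    return counts

# Made-up example text, used only to exercise the functions.
text = (
    "KRAS and EGFR are mutated in colon tumours. "
    "EGFR signalling interacts with KRAS. "
    "Insulin was measured separately."
)
counts = cooccurrences(text)
```

Each key of `counts` is a pair of `(type, name)` tuples and each value is the number of sentences in which both entities occur, which is the kind of weight a co-occurrence network edge could carry.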

Keywords: Co-occurrence analysis; Literature mining; Named entity recognition; Network analysis; Text mining.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Darling’s general scheme. (A) Database content update workflow: disease-related literature mining, named entity recognition (NER), and co-occurrence analysis. (B) Disease-specific search interfaces offered by Darling v2.0: COVID-19 (LitCovid), Disease Ontology groups (cancer, cardiovascular, and nervous system diseases), GWAS-based queries, and SNP search.
Fig. 2
Darling 2.0 UI components for the various channels and their corresponding search options. The central navigation panel (middle) leads to ten tailored query forms. The Disease Search (top left) lets users enter disease IDs or names and choose among OMIM, DisGeNET, HPO, MONDO, RNADisease, or GWAS Catalog. The COVID-19, Cancer, Cardiovascular, and Nervous System tabs (left and right) each support PubMed lookups by PMID list or keywords, with context-specific filters (e.g. transmission, cell type, organ, disease subtypes). The GWAS Search form (center left) accepts study IDs/names, trait IDs/names, genes, and SNPs, combined via AND/OR. Bioentity Search (bottom right) enables identifier or name queries across Chemicals, Genes/Proteins, and Tissues. SNP Search (right middle) ingests rs-ID lists. Literature Search (bottom right) handles PubMed-formatted or title-text queries, while Free Text Search (bottom left) applies named-entity recognition to arbitrary text, highlighting Genes/Proteins, Chemicals, and Tissues in the annotated output.
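
One way to picture how several search fields can be combined via AND/OR is as set operations over matching record IDs. The sketch below assumes a hypothetical in-memory index with placeholder gene, SNP, and trait values; the index layout, the record IDs, and the function names are invented for illustration and do not reflect how Darling stores or queries its data.

```python
# Hypothetical index (assumption): field -> value -> set of record IDs.
INDEX = {
    "gene": {"GENE_A": {1, 2, 3}, "GENE_B": {2, 4}},
    "snp": {"rs111": {1, 2}, "rs222": {4}},
    "trait": {"trait_x": {1, 2, 4, 5}},
}

def lookup(field, value):
    """Return a copy of the record IDs matching one field/value criterion."""
    return set(INDEX.get(field, {}).get(value, set()))

def combine(criteria, mode="AND"):
    """Combine (field, value) criteria: AND = intersection, OR = union."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    hits = [lookup(field, value) for field, value in criteria]
    if not hits:
        return set()
    if mode == "AND":
        return set.intersection(*hits)
    return set.union(*hits)

query = [("gene", "GENE_A"), ("snp", "rs111")]
both = combine(query, mode="AND")    # records matching every criterion
either = combine(query, mode="OR")   # records matching at least one
```

With AND, a record must satisfy every criterion, so adding fields narrows the result; with OR, each extra field can only widen it.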
Fig. 3
(A) Example of document annotation and clustering. (B) 2D and 3D network visualizations of co-occurring terms, such as genes/proteins, chemical compounds, and diseases. In the 2D visualization, each bioentity type is assigned a distinct color, whereas in the 3D network, each type is separated into a different layer.
Fig. 4
Integrated entity frequency and co-occurrence analyses reveal key molecular, tissue, and phenotypic associations in Alzheimer’s–diabetes and colorectal cancer. (A) Bar chart of the top 30 entities uncovered by Darling’s “Nervous System Diseases” channel when querying “Alzheimer Disease” and filtering for diabetes-related terms. Entities comprise genes/proteins (orange), GO biological processes (gray), tissues (turquoise), and mammalian phenotypes (mauve), ranked by the number of supporting articles. (B) Co-occurrence network of the filtered Alzheimer’s disease–type 2 diabetes mellitus (AD–T2DM) dataset, color-coded by entity class: tissues (turquoise), genes/proteins (orange), mammalian phenotypes (mauve), and GO biological processes (gray). A magnified inset highlights key subnetworks: intersections around CDK5 (green circle), the “insulin receptor signaling pathway” linking neurodegeneration to metabolic signaling (black circle), and a glutamate-related cluster involving CLU, PICALM, and BIN1 in the hippocampus (red circle). (C) Word cloud of the most frequently mentioned genes and proteins in the 5000 most recent abstracts retrieved from Darling’s “Cancer” channel using the term “colorectal cancer”. Word size reflects article frequency, with KRAS, HRAS, and EGFR most prominent. (D) Multi-layer co-occurrence network rendered in Arena3Dweb for colorectal cancer, showing GO molecular functions, genes/proteins, BTO tissues, and mammalian phenotypes on separate layers. Edges represent cross-layer associations above a frequency threshold of 50. A zoomed inset emphasizes the central oncogenic cluster (KRAS, HRAS, EGFR, BRAF, SNRPE, and ERBB2) linked to colorectal cancer cell line tissues.
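
The multi-layer view in panel D can be sketched as two steps: assign each node to a layer by entity type, then keep only edges that connect different layers and exceed a frequency threshold. The Python below is a minimal illustration with placeholder node names and made-up weights; it assumes "above a threshold of 50" means strictly greater than 50, and it is not the Arena3Dweb or Darling implementation.

```python
from collections import defaultdict

# Hypothetical nodes and their layers (placeholder names, not real results).
NODE_LAYER = {
    "gene_1": "genes/proteins",
    "gene_2": "genes/proteins",
    "tissue_1": "tissues",
    "phenotype_1": "phenotypes",
}

# Hypothetical weighted edges: (node, node, co-occurrence frequency).
EDGES = [
    ("gene_1", "tissue_1", 120),
    ("gene_1", "gene_2", 300),      # same layer, dropped
    ("gene_2", "phenotype_1", 50),  # not above the threshold, dropped
    ("gene_2", "tissue_1", 51),
]

def group_by_layer(node_layer):
    """Map each layer name to the list of nodes placed on it."""
    layers = defaultdict(list)
    for node, layer in node_layer.items():
        layers[layer].append(node)
    return dict(layers)

def cross_layer_edges(edges, node_layer, threshold):
    """Keep edges between different layers with weight strictly above threshold."""
    return [
        (a, b, w)
        for a, b, w in edges
        if node_layer[a] != node_layer[b] and w > threshold
    ]

layers = group_by_layer(NODE_LAYER)
kept = cross_layer_edges(EDGES, NODE_LAYER, threshold=50)
```

Raising the threshold thins the network to its strongest cross-layer links, which is what makes a dense co-occurrence graph readable in a layered 3D layout.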
