BMC Bioinformatics. 2020 Feb 11;21(1):53.

doi: 10.1186/s12859-020-3393-1.

DTranNER: biomedical named entity recognition with deep learning-based label-label transition model

S K Hong¹, Jae-Gil Lee²,³

Affiliations

¹ Graduate School of Knowledge Service Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea.
² Graduate School of Knowledge Service Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea. jaegil@kaist.ac.kr.
³ Department of Industrial & Systems Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea. jaegil@kaist.ac.kr.

PMID: 32046638
PMCID: PMC7014657
DOI: 10.1186/s12859-020-3393-1

S K Hong et al. BMC Bioinformatics. 2020.

Abstract

Background: Biomedical named-entity recognition (BioNER) is widely modeled with conditional random fields (CRF) by treating it as a sequence labeling problem. CRF-based methods yield structured label outputs by imposing connectivity between the labels. Recent BioNER studies have reported state-of-the-art performance by combining deep learning-based models (e.g., bidirectional Long Short-Term Memory) with CRF. In these methods, the deep learning-based models are dedicated to estimating individual labels, whereas the relationships between connected labels are described as static numbers; consequently, the context of a given input sentence cannot be reflected when generating the most plausible label-label transitions. Yet correctly segmenting entity mentions in biomedical texts is challenging because biomedical terms are often descriptive and long compared with general terms. Therefore, restricting the label-label transitions to static numbers is a bottleneck for improving BioNER performance.
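The static-transition scoring described in the Background can be illustrated with a minimal sketch. It is not the authors' code: the label set, the toy scores, and the function name `static_crf_score` are assumptions made only for illustration. The point is that one fixed transition matrix is reused at every position of every sentence, regardless of context.

```python
# Hypothetical toy example: labels, scores, and names are illustrative only.
LABELS = ["O", "B", "I"]  # indices 0, 1, 2


def static_crf_score(unary, transition, labels):
    """Score a label sequence with per-token scores and a fixed transition matrix.

    unary[t][y]      : score of label y at position t (e.g. produced by a BiLSTM)
    transition[a][b] : a single static score for moving from label a to label b,
                       shared by all positions and all sentences
    """
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(transition[a][b] for a, b in zip(labels, labels[1:]))
    return score


unary = [[0.1, 2.0, 0.0], [0.3, 0.2, 1.5], [1.0, 0.1, 0.9]]
transition = [[0.5, 0.2, -5.0], [0.0, -1.0, 1.0], [0.3, -0.5, 0.8]]

# Sequence B, I, O: unary 2.0 + 1.5 + 1.0, transitions B->I (1.0) and I->O (0.3).
score_bio = static_crf_score(unary, transition, [1, 2, 0])
```

Because `transition` does not depend on the sentence, the B->I score is the same whether or not the surrounding words suggest that an entity continues.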

Results: We introduce DTranNER, a novel CRF-based framework incorporating a deep learning-based label-label transition model into BioNER. DTranNER uses two separate deep learning-based networks: Unary-Network and Pairwise-Network. The former models the input to determine individual labels, and the latter explores the context of the input to describe the label-label transitions. We performed experiments on five benchmark BioNER corpora. Compared with current state-of-the-art methods, DTranNER achieves the best F1-score on four tasks: 84.56% (versus the previous best of 84.40%) on the BioCreative II gene mention (BC2GM) corpus, 91.99% (versus 91.41%) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, and 94.16% (versus 93.44%) on chemical NER and 87.22% (versus 86.56%) on disease NER of the BioCreative V chemical disease relation (BC5CDR) corpus. It also achieves a near-best F1-score of 88.62% on the NCBI-Disease corpus.
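To make the idea of context-dependent transitions concrete, the following is a minimal sketch of Viterbi decoding in which the pairwise score is allowed to differ at every position. It is an illustration under assumptions, not the DTranNER implementation: the abstract does not specify how the outputs of the two networks are combined, so the sketch simply adds a per-position pairwise score to the unary scores. The function name `viterbi_dynamic` and all numbers are hypothetical.

```python
def viterbi_dynamic(unary, pairwise):
    """Find the best label sequence when transition scores vary by position.

    unary[t][y]       : score of label y at position t
    pairwise[t][a][b] : score of label a at position t followed by label b at t + 1
                        (assumed here to come from a context-aware model)
    Returns (best_path, best_score).
    """
    n, k = len(unary), len(unary[0])
    best = list(unary[0])   # best score of any path ending in each label
    back = []               # back-pointers for positions 1..n-1
    for t in range(1, n):
        new_best, pointers = [], []
        for b in range(k):
            cand = [best[a] + pairwise[t - 1][a][b] for a in range(k)]
            a_star = max(range(k), key=cand.__getitem__)
            new_best.append(cand[a_star] + unary[t][b])
            pointers.append(a_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    y = max(range(k), key=best.__getitem__)
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path, max(best)


# Toy input with two labels (0 and 1) and three tokens.
unary = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
pairwise = [
    [[0.0, 2.0], [0.0, 0.0]],  # between tokens 0 and 1: 0 -> 1 is favoured
    [[1.0, 0.0], [0.0, 0.5]],  # between tokens 1 and 2: a different matrix
]
path, score = viterbi_dynamic(unary, pairwise)
```

With a static transition model, both entries of `pairwise` would be the same matrix; letting them differ is what allows the context to influence each transition.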

Conclusions: Our results indicate that incorporating the deep learning-based label-label transition model provides distinctive contextual clues that enhance BioNER over the static transition model. We demonstrate that the proposed framework enables the dynamic transition model to adaptively explore the contextual relations between adjacent labels in a fine-grained way. We expect that our study can serve as a stepping stone for further progress in biomedical literature mining.

Keywords: Bioinformatics; Data mining; Named entity recognition; Neural network.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
The overall architectures of the proposed framework DTranNER. (a) As a CRF-based framework, DTranNER comprises two separate underlying deep learning-based networks, Unary-Network and Pairwise-Network, which are arranged to yield agreed label sequences in the prediction stage. The underlying deep learning-based networks of DTranNER are trained via two separate CRFs: Unary-CRF and Pairwise-CRF. (b) The architecture of Unary-CRF, which is dedicated to training Unary-Network. (c) The architecture of Pairwise-CRF, which is dedicated to training Pairwise-Network. A token embedding layer is shared by Unary-Network and Pairwise-Network. Each token embedding is built by concatenating the token's traditional word embedding (denoted as “W2V”) and its contextualized token embedding (denoted as “ELMo”).
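The token embedding described in the caption can be sketched as a plain concatenation. The vectors, their dimensions, and the function name `token_embedding` below are placeholders chosen for illustration; real W2V and ELMo vectors are much larger and come from pretrained models.

```python
def token_embedding(word_vec, contextual_vec):
    """Concatenate a static word vector with a contextualized vector for one token."""
    return list(word_vec) + list(contextual_vec)


# Placeholder vectors standing in for "W2V" and "ELMo" outputs (hypothetical values).
w2v = [0.1, -0.2, 0.3]
elmo = [0.5, 0.0, -0.4, 0.9]

# The concatenated vector would be the shared input to both networks.
shared = token_embedding(w2v, elmo)
```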

Cited by

Chinese Clinical Named Entity Recognition with ALBERT and MHA Mechanism.
Li D, Long J, Qu J, Zhang X. Evid Based Complement Alternat Med. 2022 May 23;2022:2056039. doi: 10.1155/2022/2056039. eCollection 2022. PMID: 35656458.
Do LLMs Surpass Encoders for Biomedical NER?
Obeidat MS, Al Nahian MS, Kavuluru R. Proc (IEEE Int Conf Healthc Inform). 2025 Jun;2025:352-358. doi: 10.1109/ICHI64645.2025.00048. Epub 2025 Jul 22. PMID: 40787150.
Parallel sequence tagging for concept recognition.
Furrer L, Cornelius J, Rinaldi F. BMC Bioinformatics. 2022 Mar 24;22(Suppl 1):623. doi: 10.1186/s12859-021-04511-y. PMID: 35331131.
A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.
Lin SJ, Yeh WC, Chiu YW, Chang YC, Hsu MH, Chen YS, Hsu WL. Database (Oxford). 2022 Jul 15;2022:baac056. doi: 10.1093/database/baac056. PMID: 35849027.
A pre-training and self-training approach for biomedical named entity recognition.
Gao S, Kotevska O, Sorokine A, Christian JB. PLoS One. 2021 Feb 9;16(2):e0246310. doi: 10.1371/journal.pone.0246310. eCollection 2021. PMID: 33561139.

Grants and funding

2017R1E1A1A01075927/National Research Foundation of Korea
