A Syntactic Information-Based Classification Model for Medical Literature: Algorithm Development and Validation Study

Wentai Tang¹, Jian Wang¹, Hongfei Lin¹, Di Zhao¹, Bo Xu¹, Yijia Zhang¹, Zhihao Yang¹

PMID: 35917162
PMCID: PMC9382554
DOI: 10.2196/37817

Wentai Tang et al. JMIR Med Inform. 2022 Aug 2;10(8):e37817. doi: 10.2196/37817.

Affiliation

¹ College of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Abstract

Background: The ever-increasing volume of medical literature makes automatic classification necessary. Medical relation extraction is a typical method of classifying a large volume of medical literature. As computing power has grown, medical relation extraction models have evolved from rule-based models to neural network models. However, a model built on a single neural network discards shallow syntactic information along with the traditional rules. Therefore, we propose a syntactic information-based classification model that complements and equalizes syntactic information to enhance the model.

Objective: We aimed to develop a syntactic information-based relation extraction model for more efficient medical literature classification.

Methods: We devised 2 methods for enhancing syntactic information in the model. First, we introduced shallow syntactic information into the convolutional neural network to enhance nonlocal syntactic interactions. Second, we devised a cross-domain pruning method to equalize local and nonlocal syntactic interactions.
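
The two steps in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the cross-domain pruning rule or the network layers, so the sketch assumes a plain threshold on arc weights (`prune_edges`, a hypothetical stand-in) and a generic single graph convolution step over a weighted dependency matrix. All function names and the example weights are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def prune_edges(adj: Matrix, threshold: float) -> Matrix:
    """Zero out arcs whose weight is below `threshold`.

    Assumption: a simple threshold stands in for the paper's
    cross-domain pruning, which the abstract does not describe.
    """
    return [[w if w >= threshold else 0.0 for w in row] for row in adj]


def normalize_rows(adj: Matrix) -> Matrix:
    """Add self-loops, then scale each row so that it sums to 1.

    Assumes non-negative arc weights, so every row sum is at least 1.
    """
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        out.append([w / total for w in row])
    return out


def graph_conv(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One generic graph convolution step: ReLU(A_norm @ H @ W)."""
    a = normalize_rows(adj)
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    # Aggregate token features along the arcs that survived pruning.
    agg = [
        [sum(a[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
        for i in range(n)
    ]
    # Project the aggregated features and apply ReLU.
    return [
        [max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
         for o in range(d_out)]
        for i in range(n)
    ]


# Hypothetical soft dependency weights for a 3-token sentence.
adj = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.6], [0.1, 0.6, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pruned = prune_edges(adj, threshold=0.5)
hidden = graph_conv(pruned, identity, identity)
```

With identity features and an identity projection, `hidden` is just the row-normalized pruned matrix, which makes it easy to see that the weak arc between tokens 0 and 2 no longer contributes after pruning.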

Results: We experimented with 3 data sets related to the classification of medical literature. The F1 values were 65.5% and 91.5% on the BioCreative ViCPR (CPR) and Phenotype-Gene Relationship data sets, respectively, and the accuracy was 88.7% on the PubMed data set. Our model outperformed the current state-of-the-art baseline model in the experiments.
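
For readers unfamiliar with the metrics reported in Results, the following sketch shows one common way to compute a micro-averaged F1 score and accuracy for relation labels. The abstract does not state the exact evaluation protocol, so this is an assumption: the negative label name `"none"` and the example labels are hypothetical, and pairs predicted as the negative class are not counted as positives.

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str],
             negative_label: str = "none") -> float:
    """Micro-averaged F1 over all non-negative relation labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != negative_label and p == g:
            tp += 1  # correct positive prediction
        else:
            if p != negative_label:
                fp += 1  # predicted a relation that is wrong
            if g != negative_label:
                fn += 1  # missed (or mislabeled) a gold relation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels for four candidate entity pairs.
gold = ["a", "b", "none", "a"]
pred = ["a", "none", "b", "b"]
f1 = micro_f1(gold, pred)
acc = accuracy(gold, pred)
```

In this toy example there is 1 true positive, 2 false positives, and 2 false negatives, so precision and recall are both 1/3 and the F1 score is 1/3, while accuracy is 1/4.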

Conclusions: Our syntactic information-based model effectively enhances medical relation extraction. Furthermore, the experimental results show that shallow syntactic information helps capture nonlocal interactions in sentences and effectively reinforces syntactic features. These findings also suggest new ideas for future research.

Keywords: classification; extraction; interaction; literature; medical literature; medical relation extraction; medical text; neural networks; pruning method; semantic; syntactic; syntactic features; text.

©Wentai Tang, Jian Wang, Hongfei Lin, Di Zhao, Bo Xu, Yijia Zhang, Zhihao Yang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 02.08.2022.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Interaction features obtained by introducing shallow syntactic information and equalization. (A) Dependency tree without processing; (B) dependency tree after syntactic structure fusion; and (C) dependency tree after the pruning process. The number on each arc in the forest indicates its weight. Some edges are omitted for clarity.

**Figure 2**
Diagrammatic representation of the syntactic enhancement graph convolutional network model showing an instance and its syntactic information processing flow. The syntactic structure tree can be obtained from the encoder, and a matrix-tree can transform the syntactic dependency tree in the feature processor.

**Figure 3**
Performance against sentence length and Bidirectional Encoder Representations from Transformers (BERT) pretraining. (A) F1 scores at different sentence lengths. Results of the ForestFT–Dilated and Depthwise separable convolutional neural network are based on Jin et al [10]. (B) F1 scores against sentence length after BERT pretraining. AGGCN: attention-guided graph convolutional network; LFGCN: Lévy Flights graph convolutional network.

**Figure 4**
The heat maps of an example sentence in the syntactic enhancement graph convolutional network model.
