Impact analysis of keyword extraction using contextual word embedding

Muhammad Qasim Khan¹, Abdul Shahid¹, M Irfan Uddin¹, Muhammad Roman¹, Abdullah Alharbi², Wael Alosaimi², Jameel Almalki³, Saeed M Alshahrani⁴

Affiliations

¹ Institute of Computing, Kohat University of Science & Technology, Kohat, Kohat, Pakistan.
² Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia.
³ Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia.
⁴ College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia.

PMID: 35721401
PMCID: PMC9202614
DOI: 10.7717/peerj-cs.967

Impact analysis of keyword extraction using contextual word embedding

Muhammad Qasim Khan et al. PeerJ Comput Sci. 2022.

. 2022 May 30:8:e967.

doi: 10.7717/peerj-cs.967. eCollection 2022.

Authors

Muhammad Qasim Khan¹, Abdul Shahid¹, M Irfan Uddin¹, Muhammad Roman¹, Abdullah Alharbi², Wael Alosaimi², Jameel Almalki³, Saeed M Alshahrani⁴

Affiliations

¹ Institute of Computing, Kohat University of Science & Technology, Kohat, Kohat, Pakistan.
² Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia.
³ Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia.
⁴ College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia.

PMID: 35721401
PMCID: PMC9202614
DOI: 10.7717/peerj-cs.967

Abstract

A document's keywords provide high-level descriptions of the content that summarize the document's central themes, concepts, ideas, or arguments. These descriptive phrases make it easier for algorithms to find relevant information quickly and efficiently. It plays a vital role in document processing, such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely on statistical distributions of key terms in a document for the most part. According to contemporary technological breakthroughs, contextual information is critical in deciding the semantics of the work at hand. Similarly, context-based features may be beneficial in the job of keyword extraction. For example, simply indicating the previous or next word of the phrase of interest might be used to describe the context of a phrase. This research presents several experiments to validate that context-based key extraction is significant compared to traditional methods. Additionally, the KeyBERT proposed methodology also results in improved results. The proposed work relies on identifying a group of important words or phrases from the document's content that can reflect the authors' main ideas, concepts, or arguments. It also uses contextual word embedding to extract keywords. Finally, the findings are compared to those obtained using older approaches such as Text Rank, Rake, Gensim, Yake, and TF-IDF. The Journals of Universal Computer (JUCS) dataset was employed in our research. Only data from abstracts were used to produce keywords for the research article, and the KeyBERT model outperformed traditional approaches in producing similar keywords to the authors' provided keywords. The average similarity of our approach with author-assigned keywords is 51%.

Keywords: Contextual Word Embedding; Keyword extraction; TF-IDF; Text Rank; Yake.

PubMed Disclaimer

Conflict of interest statement

The authors declare there are no competing interests.

Figures

**Figure 1. Flow diagram of the proposed approach.**

**Figure 2. Contextual word embedding by BERT.**

**Figure 3. Comparison of each document similarity with author-assigned keywords in different approaches.**

**Figure 4. Keyword extraction analysis with different approaches.**

**Figure 5. Comparison of each document similarity with author-assigned keywords in different approaches using Wordnet synonyms.**

**Figure 6. Keyword extraction analysis with different approaches using wordnet synonym.**

**Figure 7. Comparison of each document similarity with Author assigned keywords in different approaches using Wordnet synonyms.**

**Figure 8. Keyword extraction analysis with different approaches using Wordnet synonym.**

See this image and copyright information in PMC

References

1. Aljuaid H, Iftikhar R, Ahmad S, Asif M, Afzal MT. Important citation identification using sentiment analysis of in-text citations. Telematics and Informatics. 2021;56:101492. doi: 10.1016/j.tele.2020.101492. - DOI
1. Alzaidy R, Caragea C, Lee Giles C. Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents. The world wide web conference; 2019. pp. 2551–2557.
1. Basaldella M, Antolli E, Serra G, Tasso C. Bidirectional lstm recurrent neural network for keyphrase extraction. Italian research conference on digital libraries; Cham. 2018. pp. 180–187.
1. Bennani-Smires K, Musat C, Hossmann A, Baeriswyl M, Jaggi M. Simple unsupervised keyphrase extraction using sentence embeddings. 2018. 1801.04470
1. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of Machine Learning Research. 2003;3(Jan):993–1022.

LinkOut - more resources

Full Text Sources
- Europe PubMed Central
- PubMed Central
Other Literature Sources
- The Lens - Patent Citations Database
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Impact analysis of keyword extraction using contextual word embedding

Affiliations

Impact analysis of keyword extraction using contextual word embedding

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

LinkOut - more resources

Full Text Sources

Other Literature Sources

Miscellaneous