BMC Med Inform Decis Mak. 2020 Dec 14;20(Suppl 4):314.

doi: 10.1186/s12911-020-01341-5.

KGen: a knowledge graph generator from biomedical scientific literature

Anderson Rossanez¹, Julio Cesar Dos Reis², Ricardo da Silva Torres³, Hélène de Ribaupierre⁴

Affiliations

¹ Institute of Computing, University of Campinas, Campinas, SP, Brazil. anderson.rossanez@ic.unicamp.br.
² Institute of Computing, University of Campinas, Campinas, SP, Brazil.
³ Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU - Norwegian University of Science and Technology, Ålesund, Norway.
⁴ School of Computer Science and Informatics, Cardiff University, Cardiff, UK.

PMID: 33317512
PMCID: PMC7734730
DOI: 10.1186/s12911-020-01341-5

Anderson Rossanez et al. BMC Med Inform Decis Mak. 2020.

Abstract

Background: Knowledge is often produced from data generated in scientific investigations. The ever-growing number of scientific studies in several domains results in a massive amount of data, from which obtaining new knowledge requires computational help. One example is Alzheimer's disease, a life-threatening degenerative disease that is not yet curable. As the scientific community strives to better understand it and find a cure, large amounts of data have been generated, from which new knowledge can be produced. A proper representation of such knowledge brings great benefits to researchers, to the scientific community, and, consequently, to society.

Methods: In this article, we study and evaluate a semi-automatic method that generates knowledge graphs (KGs) from biomedical texts in the scientific literature. Our solution uses natural language processing techniques to extract the knowledge found in scientific literature and represent it as KGs. Our method links entities and relations represented in KGs to concepts from existing biomedical ontologies available on the Web. We demonstrate the effectiveness of our method by generating KGs from unstructured texts obtained from a set of abstracts of scientific papers on Alzheimer's disease. Physicians analyzed the abstracts and extracted triples manually, and we compare the triples extracted by our method with theirs. The evaluation also included a qualitative analysis, by the physicians, of the KGs generated with our software tool.

Results: The experimental results indicate the quality of the generated KGs. The proposed method extracts a large number of triples, showing the effectiveness of the rule-based method used to identify relations in texts. In addition, ontology links are successfully obtained, which demonstrates the effectiveness of the ontology linking method proposed in this investigation.

Conclusions: We demonstrate that our proposal is effective in building ontology-linked KGs that represent the knowledge obtained from biomedical scientific texts. Such a representation can add value to research in various domains, enabling researchers to compare the occurrence of concepts across different studies. The generated KGs may pave the way for proposing new theories based on data analysis, advancing the state of the art in their research domains.

Keywords: Information Extraction; Knowledge Graphs; Ontologies; RDF Triples.
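
To make the notion of an ontology-linked set of RDF triples more concrete, the following is a minimal sketch of how extracted triples and ontology links could be held in memory and written out as N-Triples lines. It is not the authors' implementation. The namespaces, the `Triple` class, the helper names, the concept identifier `CONCEPT_0001`, and the use of `owl:sameAs` for the link are all assumptions made for illustration; the example triple is hand-written, not output of the tool.

```python
from dataclasses import dataclass

# Hypothetical namespaces, chosen for illustration only.
LOCAL_NS = "http://example.org/kg/"
ONTOLOGY_NS = "http://example.org/ontology/"
# Assumption: links are expressed with owl:sameAs; the abstract does not say which property is used.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) statement extracted from text."""
    subject: str
    predicate: str
    obj: str


def to_uri(label: str) -> str:
    """Turn a text label into a local URI by joining its lowercased words with underscores."""
    return LOCAL_NS + "_".join(label.lower().split())


def to_ntriples(triples, links):
    """Serialize triples, then ontology links (sorted by label), as N-Triples lines."""
    lines = []
    for t in triples:
        lines.append(f"<{to_uri(t.subject)}> <{to_uri(t.predicate)}> <{to_uri(t.obj)}> .")
    for label, concept_id in sorted(links.items()):
        lines.append(f"<{to_uri(label)}> <{OWL_SAME_AS}> <{ONTOLOGY_NS}{concept_id}> .")
    return lines


# Hand-written example; the concept identifier is a placeholder, not a real ontology ID.
example_triples = [Triple("this study", "highlights", "common risk factors")]
example_links = {"diabetes mellitus": "CONCEPT_0001"}
example_lines = to_ntriples(example_triples, example_links)
```

Printing `example_lines` one per line gives a small file that generic RDF tooling should be able to load, which is the practical benefit of representing the extracted knowledge as triples rather than as free text.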

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
KGen (knowledge graph generation) pipeline. The unstructured text (input) goes through four key steps. An ontology-linked knowledge graph is generated at the end

**Fig. 2**
The first key step: preprocessing. The unstructured text (input) goes through four sub-steps. A preprocessed text is generated as output

**Fig. 3**
A parse tree. Tokens are seen at the bottom (leaves), with their corresponding parts of speech right above. The root level denotes the sentence, and the intermediary levels denote the phrases

**Fig. 4**
Preprocessing step’s input and output

**Fig. 5**
The second key step: triples extraction. The preprocessed text (input) goes through two sub-steps, generating a set of triples as output

**Fig. 6**
Algorithm for extracting the main triples

**Fig. 7**
Dependency parsing output. At the bottom are the sentence tokens, with their corresponding parts of speech on top. The arrows show the labeled dependencies between the tokens

**Fig. 8**
Algorithm for extracting the secondary triples

**Fig. 9**
Triples extraction step’s input and output

**Fig. 10**
The third key step: ontology linking. The preprocessed text (input) goes through three sub-steps. A set of ontology links is generated as output

**Fig. 11**
SPARQL query example for mapping UMLS CUIs to the final ontology

**Fig. 12**
Algorithm for ontology linking

**Fig. 13**
Ontology linking step’s output

**Fig. 14**
The final key step: graph generation. The sets of triples and links (inputs) go through two sub-steps before generating an ontology-linked knowledge graph as output

**Fig. 15**
Graphical representation. Ontology-linked knowledge graph generated from the following sentence: *This study confirms the high prevalence of poststroke cognitive impairment in diverse populations.*

**Fig. 16**
Implemented tool architecture. The four key KGen steps are implemented in four components, seen at the central portion. In the lower portion there are 3rd party components. In the upper portion, there are wrappers for external services

**Fig. 17**
Reduced knowledge graph example. Knowledge graph generated for the triples extracted from the following sentence: *This study highlights common risk factors, in particular diabetes mellitus.*
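
The data flow described in the captions of Figs. 1, 5, 10 and 14 can be sketched as a composition of four functions: the preprocessed text feeds both triples extraction and ontology linking, and graph generation consumes the resulting triples and links. The code below is only a toy illustration of that flow. Every function body is a deliberately naive stand-in (sentence splitting, a verb lookup, a term lexicon), not the algorithms of Figs. 6, 8 and 12, and all names, the verb set, the lexicon, and the identifier `CONCEPT_0001` are hypothetical.

```python
import re


def preprocess(text: str) -> list[str]:
    """Stand-in for preprocessing: split into sentences and drop final punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.rstrip(".!?").strip() for s in sentences if s.strip()]


def extract_triples(sentences, verbs):
    """Stand-in for triples extraction: split each sentence at its first known verb."""
    triples = []
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            # Require at least one word on each side of the verb.
            if word.lower() in verbs and 0 < i < len(words) - 1:
                triples.append((" ".join(words[:i]), word, " ".join(words[i + 1:])))
                break
    return triples


def link_ontology(sentences, lexicon):
    """Stand-in for ontology linking: look up known terms in a toy lexicon."""
    links = {}
    for sentence in sentences:
        lowered = sentence.lower()
        for term, concept in lexicon.items():
            if term in lowered:
                links[term] = concept
    return links


def generate_graph(triples, links):
    """Stand-in for graph generation: adjacency map of edges plus the ontology links."""
    edges = {}
    for subject, predicate, obj in triples:
        edges.setdefault(subject, []).append((predicate, obj))
    return {"edges": edges, "links": links}


def run_pipeline(text, verbs, lexicon):
    """Chain the four steps in the order given by the figure captions."""
    sentences = preprocess(text)
    triples = extract_triples(sentences, verbs)
    links = link_ontology(sentences, lexicon)  # consumes the preprocessed text, not the triples
    return generate_graph(triples, links)


# Sentence taken from the Fig. 17 caption; verb set and lexicon are illustrative placeholders.
graph = run_pipeline(
    "This study highlights common risk factors, in particular diabetes mellitus.",
    verbs={"highlights"},
    lexicon={"diabetes mellitus": "CONCEPT_0001"},
)
```

Keeping each step as a separate function mirrors the component layout described for Fig. 16 and makes it possible to replace one naive stand-in with a real parser or ontology service without touching the others.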

Grants and funding

2017/02325-5/Fundação de Amparo à Pesquisa do Estado de São Paulo
