KaBOB: ontology-based semantic integration of biomedical databases

doi:10.1186/s12859-015-0559-3

BMC Bioinformatics. 2015 Apr 23;16(1):126.

Kevin M Livingston¹, Michael Bada², William A Baumgartner Jr³, Lawrence E Hunter⁴

Affiliations

¹ Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. Kevin.Livingston@ucdenver.edu.
² Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. Mike.Bada@ucdenver.edu.
³ Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. William.Baumgartner@ucdenver.edu.
⁴ Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. Larry.Hunter@ucdenver.edu.

PMID: 25903923
PMCID: PMC4448321
DOI: 10.1186/s12859-015-0559-3

Kevin M Livingston et al. BMC Bioinformatics. 2015.

Abstract

Background: The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work toward shared data formats and linked identifiers, significant semantic data integration problems persist in establishing shared identity and shared meaning across heterogeneous biomedical data sources.

Results: We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license.
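
As a rough illustration of the identifier-aggregation process mentioned above, the following Python sketch groups identifiers that are asserted to denote the same concept, using a union-find structure. The identifiers, the `same_as` pairs, and the function name `aggregate_identifiers` are hypothetical and chosen only for illustration; the abstract does not describe how KaBOB implements this step, so this should not be read as its actual method.

```python
from collections import defaultdict


def aggregate_identifiers(pairs):
    """Group identifiers linked by 'denotes the same concept' pairs.

    `pairs` is an iterable of (id_a, id_b) tuples. Returns a sorted list
    of sorted identifier lists, one list per inferred concept.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical cross-references between three made-up source databases.
same_as = [
    ("dbA:1001", "dbB:G42"),
    ("dbB:G42", "dbC:X7"),
    ("dbA:2002", "dbB:G99"),
]
concept_sets = aggregate_identifiers(same_as)
```

Each resulting set could then be attached to a single concept node, so that queries refer to the concept rather than to any one source's identifier.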

Conclusions: KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.

Figures

**Figure 1**
KaBOB Construction. Depicts the incremental construction of KaBOB. Labeled arrows represent processes that flow from inputs to outputs. Construction starts with downloading files and flows through translating them into RDF and then iteratively querying and producing more RDF. Steps marked with ** involve multiple sets of rules being run and their output loaded in sequence.

**Figure 2**
Example ICE Records and corresponding BIO Concepts. Depicts an excerpt of the knowledge representation in KaBOB. Ovals depict instances, and rectangles depict classes. Single-line arrows represent triples; they point from subject to object and are labeled with their property. The iao:denotes links that cross from the ICE to the BIO side are emphasized with dashed arrows. The double arrows are shorthand for representing an owl:Restriction on the given property with some values from the object value. This figure depicts two GO annotation records that are then converted to biomedical concepts using the same rule (rule not depicted). Sets of gene identifiers that denote their corresponding gene concepts are also depicted. On the BIO side, the relations between genes, proteins, and gene or gene product aggregate classes are also shown. Other than the records and their field values, which are generated by the file parsers, all links are the output of applying rules.
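
The record-to-concept conversion described in this caption can be sketched as a small forward-chaining loop over triples. This is a minimal illustration under stated assumptions, not KaBOB's rule language: only `iao:denotes` comes from the caption, while the `ex:` properties, the record and concept names, and both function names are hypothetical.

```python
def annotation_rule(triples):
    """Hypothetical rule: lift annotation records to concept-level triples."""
    denotes = {s: o for (s, p, o) in triples if p == "iao:denotes"}
    gene_field = {s: o for (s, p, o) in triples if p == "ex:hasGeneId"}
    term_field = {s: o for (s, p, o) in triples if p == "ex:hasGoTermId"}
    derived = set()
    for record, gene_id in gene_field.items():
        term_id = term_field.get(record)
        if term_id is None:
            continue
        # Follow iao:denotes from each identifier to its concept.
        gene, term = denotes.get(gene_id), denotes.get(term_id)
        if gene is not None and term is not None:
            derived.add((gene, "ex:annotatedWith", term))
    return derived


def forward_chain(triples, rules):
    """Apply rules repeatedly until no new triples are produced."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


# Record side (ICE): two annotation records plus identifier-to-concept links.
ice = {
    ("ex:record1", "ex:hasGeneId", "ex:ID_geneA"),
    ("ex:record1", "ex:hasGoTermId", "ex:ID_term1"),
    ("ex:record2", "ex:hasGeneId", "ex:ID_geneB"),
    ("ex:record2", "ex:hasGoTermId", "ex:ID_term1"),
    ("ex:ID_geneA", "iao:denotes", "bio:geneA"),
    ("ex:ID_geneB", "iao:denotes", "bio:geneB"),
    ("ex:ID_term1", "iao:denotes", "bio:term1"),
}
kb = forward_chain(ice, [annotation_rule])
bio = kb - ice  # concept-level triples derived by the rule
```

Here one rule handles both records, mirroring the caption's point that the two GO annotation records are converted by the same rule.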

Cited by

Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations.
Chen Y, Verbeek FJ, Wolstencroft K. BMC Bioinformatics. 2021 Apr 6;22(1):178. doi: 10.1186/s12859-021-04105-8. PMID: 33823788.
CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations.
Doğan T, Atas H, Joshi V, Atakan A, Rifaioglu AS, Nalbat E, Nightingale A, Saidi R, Volynkin V, Zellner H, Cetin-Atalay R, Martin M, Atalay V. Nucleic Acids Res. 2021 Sep 20;49(16):e96. doi: 10.1093/nar/gkab543. PMID: 34181736.
SMASH: A Data-driven Informatics Method to Assist Experts in Characterizing Semantic Heterogeneity among Data Elements.
Brown W 3rd, Weng C, Vawdrey DK, Carballo-Diéguez A, Bakken S. AMIA Annu Symp Proc. 2017 Feb 10;2016:1717-1726. eCollection 2016. PMID: 28269930.
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species.
Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E, Gourdine JP, Jacobsen JO, Keith D, Laraway B, Lewis SE, NguyenXuan J, Shefchek K, Vasilevsky N, Yuan Z, Washington N, Hochheiser H, Groza T, Smedley D, Robinson PN, Haendel MA. Nucleic Acids Res. 2017 Jan 4;45(D1):D712-D722. doi: 10.1093/nar/gkw1128. Epub 2016 Nov 29. PMID: 27899636.
Therapeutic futility and phenotypic heterogeneity in heart failure with preserved ejection fraction: what is the role of bionic learning?
Kao D, Purohit S, Jhund P. Eur J Heart Fail. 2020 Jan;22(1):159-161. doi: 10.1002/ejhf.1658. Epub 2019 Nov 20. PMID: 31749260.

Grants and funding

R01 LM009254/LM/NLM NIH HHS/United States
