Bioinformatics. 2024 Mar 4;40(3):btae104.
doi: 10.1093/bioinformatics/btae104.

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning

J Harry Caufield et al. Bioinformatics.

Abstract

Motivation: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and cannot populate arbitrarily complex nested knowledge schemas.

Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM.

Availability and implementation: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.

Conflict of interest statement

None declared.

Figures

Figure 1.
Example schema. Boxes denote classes and arrows denote attributes whose ranges are classes (compound attributes). Crow's feet above boxes denote multivalued attributes. Attributes whose ranges are primitives or value sets are shown within each box. Here, the top-level container class ‘Recipe’ is composed of a label, description, categories, steps, and ingredients. Steps and ingredients are further decomposed into food items, quantities, etc.
Figure 2.
Example of a portion of text to parse and a corresponding instantiation of the recipe schema from Fig. 1, using YAML syntax. Input text is truncated for brevity; the full input is available at https://github.com/monarch-initiative/ontogpt/blob/main/tests/input/cases/recipe-spaghetti.txt. In each attribute-value pair, the attribute is shown in bold, followed by a colon and then the value or values. For multivalued attributes, each list element value is indicated with a hyphen at the beginning of the line. Terminal elements that are value sets from ontologies and standards such as FOODON (Dooley et al. 2018), UCUM (Schadow et al. 1999), and DBpedia (Bizer et al. 2009) are shown here with their human-readable labels after the double-hash comment symbol. Dynamic elements are indicated via RDF blank node syntax (e.g. _:ChoppedOnion does not correspond to a named entity and serves as a placeholder).
Figure 3.
Overview of the SPIRES approach. A knowledge schema and text containing instances defined in the schema are processed by OntoGPT, yielding a query for GPT-3 or newer, accessed through the OpenAI API. OntoGPT parses the result, grounding extracted instances with specific entries and terms retrieved from queries of databases and ontologies where possible. The final product is a set of structured data (instances and relationships) in the shapes defined by the schema. Icons by user Khoirin from the Noun Project (https://thenounproject.com/besticon/).
Figure 4.
Flowchart depicting the SPIRES algorithm.
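
The recursive loop described in the abstract and in Figs 3 and 4 can be illustrated with a minimal Python sketch. This is not the OntoGPT implementation: the toy schema, the `EX:` identifiers, the prompt wording, and every function name here (`build_prompt`, `parse_response`, `ground`, `extract`, `fake_llm`) are assumptions made up for illustration, and the LLM is replaced by a canned stub so the example runs offline. It only shows the general shape of the idea: build a prompt from a schema class, parse the reply into fields, recurse into fields whose range is another class, and ground terminal values against a vocabulary.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy schema: class -> {attribute: (range, multivalued)}.
SCHEMA: Dict[str, Dict[str, Tuple[str, bool]]] = {
    "Recipe": {"label": ("string", False), "ingredients": ("Ingredient", True)},
    "Ingredient": {"food_item": ("FoodItem", False), "amount": ("string", False)},
}

# Hypothetical vocabulary with placeholder identifiers (not real ontology IDs).
VOCAB = {"FoodItem": {"onion": "EX:0001", "garlic": "EX:0002"}}


def build_prompt(cls: str, text: str) -> str:
    """Turn one schema class into a fill-in-the-fields prompt."""
    lines = ["Extract the following fields from the text, one per line:"]
    for attr, (_, multi) in SCHEMA[cls].items():
        hint = "semicolon-separated list" if multi else "single value"
        lines.append(f"{attr}: <{hint}>")
    lines.append("Text:")
    lines.append(text)
    return "\n".join(lines)


def parse_response(cls: str, response: str) -> Dict[str, str]:
    """Keep only 'attribute: value' lines whose attribute is in the schema."""
    fields = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in SCHEMA[cls]:
            fields[key.strip()] = value.strip()
    return fields


def ground(rng: str, value: str) -> str:
    """Map a value to a vocabulary identifier, else a blank-node placeholder."""
    term = VOCAB[rng].get(value.lower())
    return term if term is not None else "_:" + value.title().replace(" ", "")


def extract(cls: str, text: str, llm: Callable[[str], str]) -> dict:
    """Query the LLM for one class, then recurse into nested classes."""
    fields = parse_response(cls, llm(build_prompt(cls, text)))
    result = {}
    for attr, (rng, multi) in SCHEMA[cls].items():
        if attr not in fields:
            continue
        raw = fields[attr]
        values = [v.strip() for v in raw.split(";") if v.strip()] if multi else [raw]
        out = []
        for v in values:
            if rng in SCHEMA:        # compound attribute: recurse on the value
                out.append(extract(rng, v, llm))
            elif rng in VOCAB:       # value set: ground to an identifier
                out.append(ground(rng, v))
            else:                    # primitive: keep the string
                out.append(v)
        result[attr] = out if multi else out[0]
    return result


def fake_llm(prompt: str) -> str:
    """Canned stand-in for an LLM call, keyed on the text after 'Text:'."""
    text = prompt.rsplit("Text:\n", 1)[1]
    canned = {
        "Fry one onion and two cloves of garlic.":
            "label: Fried onion\ningredients: one onion; two cloves of garlic",
        "one onion": "food_item: onion\namount: one",
        "two cloves of garlic": "food_item: garlic\namount: two cloves",
    }
    return canned[text]


result = extract("Recipe", "Fry one onion and two cloves of garlic.", fake_llm)
```

In this sketch the nested `Ingredient` objects are filled by a second round of prompts on each list element, and a value with no vocabulary match (for example "chopped onion") falls back to a blank-node style placeholder such as `_:ChoppedOnion`, mirroring the convention described for Fig. 2.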

References

    1. Ateia S, Kruschwitz U. Is ChatGPT a biomedical expert? – exploring the Zero-Shot performance of current GPT models in biomedical tasks. In: CLEF 2023: Conference and Labs of the Evaluation Forum, Thessaloniki, Greece: CLEF Initiative, 2023.
    2. Babaei Giglou H, D’Souza J, Auer S. LLMs4OL: large language models for ontology learning. In: The Semantic Web – ISWC 2023. Switzerland: Springer Nature, 2023, 408–27. doi: 10.1007/978-3-031-47240-4
    3. Bender EM, Gebru T, McMillan-Major A. et al. On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, New York, NY, USA: Association for Computing Machinery, 2021, 610–23. ISBN 9781450383097. doi: 10.1145/3442188.3445922
    4. Bizer C, Lehmann J, Kobilarov G. et al. DBpedia – a crystallization point for the web of data. J Web Semant 2009;7:154–65. doi: 10.1016/j.websem.2009.07.002
    5. Brown EG, Wood L, Wood S. The medical dictionary for regulatory activities (MedDRA). Drug Saf 1999;20:109–17. doi: 10.2165/00002018-199920020-00002