Int J Mol Sci. 2023 Apr 14;24(8):7296.
doi: 10.3390/ijms24087296.

Automatic Generation of SBML Kinetic Models from Natural Language Texts Using GPT

Kazuhiro Maeda et al. Int J Mol Sci.

Abstract

Kinetic modeling is an essential tool in systems biology research, enabling the quantitative analysis of biological systems and predicting their behavior. However, the development of kinetic models is a complex and time-consuming process. In this article, we propose a novel approach called KinModGPT, which generates kinetic models directly from natural language text. KinModGPT employs GPT as a natural language interpreter and Tellurium as an SBML generator. We demonstrate the effectiveness of KinModGPT in creating SBML kinetic models from complex natural language descriptions of biochemical reactions. KinModGPT successfully generates valid SBML models from a range of natural language model descriptions of metabolic pathways, protein-protein interaction networks, and heat shock response. This article demonstrates the potential of KinModGPT in kinetic modeling automation.

Keywords: GPT; kinetic modeling; large language model; simulation; systems biology.
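
The two-stage flow described in the abstract (GPT interprets the text, Tellurium generates the SBML) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_prompt`, `text_to_sbml`, and the `complete` callable standing in for a GPT request are hypothetical names, and the way the instruction and description are joined is assumed. The only library call used is Tellurium's `antimonyToSBML`.

```python
from typing import Callable


def build_prompt(instruction: str, description: str) -> str:
    """Join an instruction message and a model description into one prompt.

    Assumption: the instruction comes first and a blank line separates it
    from the natural language model description.
    """
    return instruction.strip() + "\n\n" + description.strip() + "\n"


def antimony_to_sbml(antimony: str) -> str:
    """Convert Antimony text into an SBML string using Tellurium."""
    # Imported lazily so the prompt helper works without Tellurium installed.
    import tellurium as te

    return te.antimonyToSBML(antimony)


def text_to_sbml(
    instruction: str,
    description: str,
    complete: Callable[[str], str],
) -> str:
    """Run the sketch end to end.

    `complete` is a hypothetical stand-in for a language model call: it takes
    the prompt and is assumed to return the model written in Antimony.
    """
    antimony = complete(build_prompt(instruction, description))
    return antimony_to_sbml(antimony)
```

Passing the language model call in as a plain callable keeps the sketch independent of any particular GPT client, so a canned Antimony string can be substituted when trying out the conversion step.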

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

Figures

Scheme A1
Instruction message for the GPT-only approach. This instruction is followed by a model description (see Figure 1).
Scheme A2
Instruction message for KinModGPT. This instruction is followed by a model description (see Figure 1).
Scheme A3
Full model description for the heat shock response model. As the model description exceeds the maximum prompt length for text-davinci-003 and gpt-3.5-turbo, we split it into two parts separated by a blank line.
Scheme A4
The first half of the heat shock response model in Antimony language, created by KinModGPT with text-davinci-003.
Scheme A5
The second half of the heat shock response model in Antimony language, created by KinModGPT with text-davinci-003.
Figure 1
Test problems. We tested whether KinModGPT can create SBML models from the natural language model descriptions. For the reaction network maps, CADLIVE notation was used [9,10,11]. For simplicity, only important reactions are shown in the reaction network map for the heat shock response model. The complete model description for the heat shock response is provided in Scheme A3.
Figure 2
Overview of KinModGPT.
Scheme 1
The decay model in Antimony language, created by KinModGPT with text-davinci-003.
Figure 3
Simulation of the SBML model for the decay model. This model was created by KinModGPT with text-davinci-003.
Scheme 2
The HIV model in Antimony language, created by KinModGPT with text-davinci-003.
Figure 4
Simulation of the created SBML model for the HIV model. (a) Stotal and Ptotal represent the total S concentration (Stotal = S + ES) and the total P concentration (Ptotal = P + EP), respectively. (b) E, ES, EP, EI, and EJ represent the enzyme, enzyme-substrate complex, enzyme-product complex, enzyme-inhibitor complex, and irreversible enzyme-inhibitor complex, respectively. For clarity, not all variables are shown. This model was created by KinModGPT with text-davinci-003. We tuned the kinetic parameters before the simulation.
Scheme 3
The three-step model in Antimony language, created by KinModGPT with text-davinci-003.
Figure 5
Simulation of the created SBML model for the three-step model. (a) S, M1, M2, and P represent the substrate, first intermediate metabolite, second intermediate metabolite, and product, respectively. (b) E1, E2, and E3 represent the first, second, and third enzymes, respectively. This model was created by KinModGPT with text-davinci-003.
Figure 6
Simulation of the created SBML model for the heat shock response model. (a) Yield, (b) total σ32, and (c) total DnaK. Heat shock occurs at 0 min and is implemented as an increase in the rate constant for protein denaturation. Yield is the fraction of folded proteins in the total protein pool, i.e., Yield = Pfold / (Pfold + Punfold + Punfold_DnaK). For clarity, not all variables are shown. This model was created by KinModGPT with text-davinci-003. We tuned the kinetic parameters before the simulation.
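
The derived quantities plotted in Figures 4 and 6 follow directly from the formulas given in the captions. A minimal sketch of how they might be computed from simulated concentrations is shown here; the function and argument names are illustrative assumptions and do not come from the article.

```python
def total_substrate(s: float, es: float) -> float:
    """Stotal = S + ES (Figure 4a)."""
    return s + es


def total_product(p: float, ep: float) -> float:
    """Ptotal = P + EP (Figure 4a)."""
    return p + ep


def folding_yield(p_fold: float, p_unfold: float, p_unfold_dnak: float) -> float:
    """Yield = Pfold / (Pfold + Punfold + Punfold_DnaK) (Figure 6a)."""
    total = p_fold + p_unfold + p_unfold_dnak
    if total == 0:
        # Assumption: an empty protein pool is treated as an error
        # rather than being assigned a yield.
        raise ValueError("total protein pool is zero; yield is undefined")
    return p_fold / total


# Made-up concentrations, for illustration only.
print(total_substrate(1.5, 0.5))      # 2.0
print(folding_yield(3.0, 0.5, 0.5))   # 0.75
```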

References

    1. Kitano H. Systems biology: A brief overview. Science. 2002;295:1662–1664. doi: 10.1126/science.1069492.
    2. Hucka M., Finney A., Sauro H.M., Bolouri H., Doyle J.C., Kitano H., Arkin A.P., Bornstein B.J., Bray D., Cornish-Bowden A., et al. The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19:524–531. doi: 10.1093/bioinformatics/btg015.
    3. Keating S.M., Waltemath D., Konig M., Zhang F., Drager A., Chaouiya C., Bergmann F.T., Finney A., Gillespie C.S., Helikar T., et al. SBML Level 3: An extensible format for the exchange and reuse of biological models. Mol. Syst. Biol. 2020;16:e9110. doi: 10.15252/msb.20199110.
    4. Choi K., Medley J.K., Konig M., Stocking K., Smith L., Gu S., Sauro H.M. Tellurium: An extensible python-based modeling environment for systems and synthetic biology. Biosystems. 2018;171:74–79. doi: 10.1016/j.biosystems.2018.07.006.
    5. Medley J.K., Choi K., Konig M., Smith L., Gu S., Hellerstein J., Sealfon S.C., Sauro H.M. Tellurium notebooks: An environment for reproducible dynamical modeling in systems biology. PLoS Comput. Biol. 2018;14:e1006220. doi: 10.1371/journal.pcbi.1006220.