. 2019 Mar;14(3):639-702.

doi: 10.1038/s41596-018-0098-2.

Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0

Laurent Heirendt^#¹, Sylvain Arreckx^#¹, Thomas Pfau², Sebastián N Mendoza^{3

4}, Anne Richelle⁵, Almut Heinken¹, Hulda S Haraldsdóttir¹, Jacek Wachowiak¹, Sarah M Keating⁶, Vanja Vlasov¹, Stefania Magnusdóttir¹, Chiam Yu Ng⁷, German Preciat¹, Alise Žagare¹, Siu H J Chan⁷, Maike K Aurich¹, Catherine M Clancy¹, Jennifer Modamio¹, John T Sauls⁸, Alberto Noronha¹, Aarash Bordbar⁹, Benjamin Cousins¹⁰, Diana C El Assal¹, Luis V Valcarcel¹¹, Iñigo Apaolaza¹¹, Susan Ghaderi¹, Masoud Ahookhosh¹, Marouen Ben Guebila¹, Andrejs Kostromins¹², Nicolas Sompairac¹³, Hoai M Le¹, Ding Ma¹⁴, Yuekai Sun¹⁵, Lin Wang⁷, James T Yurkovich¹⁶, Miguel A P Oliveira¹, Phan T Vuong¹, Lemmer P El Assal¹, Inna Kuperstein¹³, Andrei Zinovyev¹³, H Scott Hinton¹⁷, William A Bryant¹⁸, Francisco J Aragón Artacho¹⁹, Francisco J Planes¹¹, Egils Stalidzans¹², Alejandro Maass^{3

4}, Santosh Vempala¹⁰, Michael Hucka²⁰, Michael A Saunders¹⁴, Costas D Maranas⁷, Nathan E Lewis^{5

21}, Thomas Sauter², Bernhard Ø Palsson^{16

22}, Ines Thiele¹, Ronan M T Fleming^{23

24}

Affiliations

¹ Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg.
² Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg.
³ Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile.
⁴ Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile.
⁵ Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA.
⁶ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
⁷ Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA.
⁸ Department of Physics, and Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA.
⁹ Sinopia Biosciences, San Diego, CA, USA.
¹⁰ Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA.
¹¹ Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain.
¹² Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia.
¹³ Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France.
¹⁴ Department of Management Science and Engineering, Stanford University, Stanford, CA, USA.
¹⁵ Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
¹⁶ Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
¹⁷ Utah State University Research Foundation, North Logan, UT, USA.
¹⁸ Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK.
¹⁹ Department of Mathematics, University of Alicante, Alicante, Spain.
²⁰ Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.
²¹ Novo Nordisk Foundation Center for Biosustainability, University of California, San Diego, La Jolla, CA, USA.
²² Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Lyngby, Denmark.
²³ Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg. ronan.mt.fleming@gmail.com.
²⁴ Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands. ronan.mt.fleming@gmail.com.

^# Contributed equally.

PMID: 30787451
PMCID: PMC6635304
DOI: 10.1038/s41596-018-0098-2

Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0

Laurent Heirendt et al. Nat Protoc. 2019 Mar.

. 2019 Mar;14(3):639-702.

doi: 10.1038/s41596-018-0098-2.

Authors

Affiliations

¹ Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg.
² Life Sciences Research Unit, University of Luxembourg, Belvaux, Luxembourg.
³ Center for Genome Regulation (Fondap 15090007), University of Chile, Santiago, Chile.
⁴ Mathomics, Center for Mathematical Modeling, University of Chile, Santiago, Chile.
⁵ Department of Pediatrics, University of California, San Diego, School of Medicine, La Jolla, CA, USA.
⁶ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.
⁷ Department of Chemical Engineering, The Pennsylvania State University, State College, PA, USA.
⁸ Department of Physics, and Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA.
⁹ Sinopia Biosciences, San Diego, CA, USA.
¹⁰ Algorithms and Randomness Center, School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA.
¹¹ Biomedical Engineering and Sciences Department, TECNUN, University of Navarra, San Sebastián, Spain.
¹² Institute of Microbiology and Biotechnology, University of Latvia, Riga, Latvia.
¹³ Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, Paris, France.
¹⁴ Department of Management Science and Engineering, Stanford University, Stanford, CA, USA.
¹⁵ Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
¹⁶ Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
¹⁷ Utah State University Research Foundation, North Logan, UT, USA.
¹⁸ Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK.
¹⁹ Department of Mathematics, University of Alicante, Alicante, Spain.
²⁰ Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.
²¹ Novo Nordisk Foundation Center for Biosustainability, University of California, San Diego, La Jolla, CA, USA.
²² Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Lyngby, Denmark.
²³ Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg. ronan.mt.fleming@gmail.com.
²⁴ Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands. ronan.mt.fleming@gmail.com.

^# Contributed equally.

PMID: 30787451
PMCID: PMC6635304
DOI: 10.1038/s41596-018-0098-2

Abstract

Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states. The COBRA Toolbox is a comprehensive desktop software suite of interoperable COBRA methods. It has found widespread application in biology, biomedicine, and biotechnology because its functions can be flexibly combined to implement tailored COBRA protocols for any biochemical network. This protocol is an update to the COBRA Toolbox v.1.0 and v.2.0. Version 3.0 includes new methods for quality-controlled reconstruction, modeling, topological analysis, strain and experimental design, and network visualization, as well as network integration of chemoinformatic, metabolomic, transcriptomic, proteomic, and thermochemical data. New multi-lingual code integration also enables an expansion in COBRA application scope via high-precision, high-performance, and nonlinear numerical optimization solvers for multi-scale, multi-cellular, and reaction kinetic modeling, respectively. This protocol provides an overview of all these new features and can be adapted to generate and analyze constraint-based models in a wide variety of scenarios. The COBRA Toolbox v.3.0 provides an unparalleled depth of COBRA methods.

PubMed Disclaimer

Conflict of interest statement

COMPETING FINANCIAL INTERESTS The authors declare that they have no competing financial interests.

Figures

**Figure 1:**
Overview of key constraint-based reconstruction and analysis concepts. a. A genome-scale metabolic reconstruction is a structured knowledge-base that abstracts pertinent information on the biochemical transformations taking place within a chosen biochemical system, e.g., the human gut microbiome. Genome-scale metabolic reconstructions are built in two steps. First, several platforms exist for the generation of a draft metabolic reconstruction based on genome annotations. Second, the draft reconstructions need to be refined based on known experimental and biochemical data from literature. Novel experiments can be performed on the organism and the reconstruction refined accordingly. b. A phenotypically feasible solution space is defined by specifying certain assumptions, e.g., a steady-state assumption, then converting the reconstruction into computational model that eliminates physicochemically or biochemically infeasible network states. Various methods are used to interrogate the solution space. For example, optimisation for a biologically motivated objective function (e.g. biomass production) identifies a single optimal flux vector, whereas uniform sampling provides an unbiased characterisation via flux vectors uniformly distributed in the solution space. c Flux balance analysis is an optimization method that maximizes a linear objective function, $ψ (υ) = c^{T} υ$ , formed by multiplying every reaction flux υ_j with a predetermined coefficient, c_j, subject to a steady state assumption, *S_υ* = 0, as well as lower and upper bounds on each reaction flux (lb_j and ub_j, respectively).

**Figure 2:**
Continuous integration of new code (submitted by developers) is performed on a dedicated server running Jenkins (https://jenkins.io). The main code is located in the *src* folder and tests functions in the *test* folder. A test not only runs a function (first degree testing), but tests the output of that function (second degree testing). The continuous integration setup relies on end-of-year releases of MATLAB only. Soon after the latest stable version of MATLAB is released, full support will be provided for the COBRA Toolbox. After a successful run of tests on the three latest end-of-year releases of MATLAB using various solver packages, the documentation based on the headers of the functions (docstrings) is extracted, generated, and automatically deployed. Immediate feedback through code coverage reports (https://codecov.io/gh/opencobra/cobratoolbox) and build statuses are reported on GitHub. With this setup, the impact of local changes in the code base is promptly revealed.

**Figure 3:**
Conceptual overview of the main steps involved in the unsteady-state flux balance analysis (uFBA) method.

**Figure 4:**
Solution spaces from steady state fluxes are anisotropic, that is, long in some directions and short in others. This impedes the ability of any sampling algorithm taking a random direction to evenly explore the full feasible set (*artificial centering hit-and-run* (ACHR) algorithm). The CHRR (*coordinate hit-and-run with rounding*) algorithm first rounds the solution space based on the maximum volume ellipsoid. Then, the rounded solution space is uniformly sampled using a provably efficient coordinate hit-and-run random walk. Finally, the samples are projected back onto the anisotropic feasible set. This leads to a more distributed uniform sampling, so that the converged sampling distributions for the selected reactions become smoother. As an example, for both sampling distributions, the parameters were defined as: *nSkip* = 8 × (*dim*(*fluxSpace*)), *nSarnples* = 1000.

**Figure 5:**
In the OptForce procedure, the MUST sets are determined by contrasting the flux ranges obtained using flux variability analysis (FVA) of a wild-type (blue bars) and an overproducing strain (red bars). The first order MUST sets (top panel) are denoted MUST^L and MUST^U. For instance, a reaction belongs to the MUST^U set if the upper bound of the flux range in the wild-type is less than the lower bound of the flux range of the overproducing strain. The center and bottom panels show all possible second order MUST sets.

**Figure 6:**
The openCOBRA repository and the fork of a contributor located on the Github server can be cloned to the local computer as *cobratoolbox* and *fork-cobratoolbox* folders, respectively. Each repository might contain different branches, but each repository contains the *master* and *develop* branches. Note that contributors only have read on the openCOBRA repository. The stable branch is the *master* branch (black branch), while the development of code is made on the *develop* branch (green branch). The *master* branch shall be checked out when using the *cobratoolbox* repository, whereas contributors shall create new branches originating from the *develop* branch (local *fork-cobratoolbox* directory and online *<username>/cobratoolbox* repository). In the present example, *myBranch1* (blue branch) has already been pushed to the forked repository on the Github server, while *myBranch2* (pink branch) is only present locally. The branch *myBranch1* may be merged into the *develop* branch of the openCOBRA repository through opening a pull request. In order to submit the contributions (commits) on *myBranch2,* the contributor must first push the commits to the forked repository (https://github.com/<username>/cobratoolbox) before opening a pull request. Any commit made on the *develop* branch (red square) will be merged to the *master* branch if the *develop* branch is stable overall (orange square).

**Figure 7:**
Output of initialisation of the COBRA Toolbox with initCobraToolbox.

**Figure 8:. An energy generating stoichiometrically balanced cycle.**
The smallest stoichiometrically balanced cycle that produces ATP at a maximal rate using the ATP synthase reaction, in Recon3D, with all internal reactions. All metabolite and reaction abbreviations are primary keys in the Virtual Metabolic Human database (https://vmh.uni.lu): reaction abbreviation, reaction name: ADK1m, adenylate kinase, mitochondrial; G5SDym, glutamate-5-semialdehyde dehydrogenase, mitochondrial; GLU5Km, glutamate 5-kinase, mitochondrial; P45027A15m, 5-beta-cytochrome P450, family 27, subfamily A, polypeptide 1; PPAm, inorganic diphosphatase; r0074, L-glutamate 5-semialdehyde:NAD+ oxidoreductase; HMR_3966, nucleoside-triphosphate giphosphatase; ATPS4mi, ATP synthase (four protons for one ATP); CYOR_u10mi, ubiquinol-6 cytochrome c reductase, Complex III; NADH2_u10mi, NADH dehydrogenase, mitochondrial; CYOOm2i, cytochrome c oxidase, mitochondrial complex IV.

**Figure 9:**
The interventions predicted by the OptForce method for succinate overproduction in *E. coli* (AntCore model) under aerobic conditions. Reactions that need to be up-regulated (green arrows and labels) and knocked out (red arrows and labels) are shown in this simplified metabolic map. The strategies include up-regulation of reactions generating succinate such as isocitrate dehydrogenase (R2), α-ketoglutarate dehydrogenase or succinyl-CoA synthetase, along with knockout of reactions draining succinate such as succinate dehydrogenase or fumarate hydratase. Note that each of these reactions may associate with one or more genes in *E. coli.*

**Figure 10:. Qualitatively forward, quantitatively reverse reactions in a multi-compartmental, genome-scale model.**
In Recon3D, the transformed reaction Gibbs energy could be estimated for 7, 215 reactions. Of these reactions, 2, 868 reactions were qualitatively assigned to be forward in the reconstruction, but were quantitatively assigned to be reversible using subcellular compartment specific thermodynamic parameters, the component contribution method, and broad bounds on metabolite concentrations (10⁻⁵ – 0.02 mol/L), except for certain cofactors. The geometric mean (green) and feasible range (between maximum and minimum) of estimated millimolar standard transformed reaction Gibbs energy (A_r G’^m, blue) and transformed reaction Gibbs energy (Δ_rG′^m, red) are illustrated. The relative uncertainty in metabolite concentrations versus uncertainty in thermochemical estimates is reflected by the relative breadth of the red and blue bars for each reaction, respectively. The reactions are rank ordered by the cumulative probability that millimolar standard transformed reaction Gibbs energy is less than zero, P(Δ_rG′^m < 0), (black descending line from left to right). This assumes that all metabolites are at a millimolar concentration *(1mM*) and a Gaussian error is assumed in component contribution estimates. In this ordering, forward transport reactions have P(Δ_rG′^m < 0) = 1 (far left) and reverse transport reactions have P(Δ_rG′^m < 0) = 0 (far right). In between, from left to right are biochemical reactions with decreasing cumulative probability of being forward in direction, subject to the stated assumptions. Alternative rankings are possible. The key point is to observe that the COBRA Toolbox is primed for quantitative integration of metabolomic data as the uncertainty in transformed reaction Gibbs energy associated with thermochemical estimates using the component contribution method is now significantly lower than the uncertainty associated with the assumption of broad concentration range.

**Figure 11:**
Overlay of the flux vector for maximum ATP synthase flux, using flux balance analysis with regularisation of the flux vector. Active fluxes are highlighted (purple).

**Figure 12:**
Selective scope visualisation of the *E. coli* core model model by Paint4Net. Rectangles represent reactions with rates of fluxes in brackets; the red rectangles represent reactions with only one metabolite; ellipses represent metabolites; the red ellipses represent dead end metabolites; grey edges represent zero-rate fluxes; green edges represent positive-rate (forward) fluxes; and blue edges represent negative-rate (backward) fluxes. Network visualisation also enables zoom to specific regions.

See this image and copyright information in PMC

References

1. Palsson BØ Systems Biology: Constraint-based Reconstruction and Analysis. Cambridge University Press, Cambridge, England, January (2015).
1. O’Brien EJ, Monk JM, and Palsson BO Using Genome-scale Models to Predict Biological Capabilities. Cell 161(5), 971–987, May (2015). - PMC - PubMed
1. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat. Protocols 2(3), 727–738, March (2007). - PubMed
1. Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protocols 6(9), 1290–1307, September (2011). - PMC - PubMed
1. Lewis NE, Nagarajan H, and Palsson BO Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nature Reviews Microbiology 10, February (2012). - PMC - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
Molecular Biology Databases
- NIAID Data Ecosystem - Find datasets on Infectious and Immune-mediated Diseases

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0

Affiliations

Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases