eLife. 2019 Jan 18:8:e36842.

doi: 10.7554/eLife.36842.

Essential metabolism for a minimal cell

Marian Breuer¹, Emmy E Earnest¹, Chuck Merryman², Kim S Wise², Lijie Sun², Michaela R Lynott², Clyde A Hutchison², Hamilton O Smith², John D Lapek³, David J Gonzalez³, Valérie de Crécy-Lagard⁴, Drago Haas⁴, Andrew D Hanson⁵, Piyush Labhsetwar¹, John I Glass², Zaida Luthey-Schulten¹

Affiliations

¹ Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, United States.
² J Craig Venter Institute, La Jolla, United States.
³ Department of Pharmacology and School of Pharmacy, University of California at San Diego, La Jolla, United States.
⁴ Department of Microbiology and Cell Science, University of Florida, Gainesville, United States.
⁵ Horticultural Sciences Department, University of Florida, Gainesville, United States.

PMID: 30657448
PMCID: PMC6609329
DOI: 10.7554/eLife.36842

Marian Breuer et al. Elife. 2019.

Abstract

JCVI-syn3A, a robust minimal cell with a 543 kbp genome and 493 genes, provides a versatile platform to study the basics of life. Using the vast amount of experimental information available on its precursor, Mycoplasma mycoides capri, we assembled a near-complete metabolic network with 98% of enzymatic reactions supported by annotation or experiment. The model agrees well with genome-scale in vivo transposon mutagenesis experiments, showing a Matthews correlation coefficient of 0.59. The genes in the reconstruction have a high in vivo essentiality or quasi-essentiality of 92% (68% essential), compared to 79% in silico essentiality. This coherent model of the minimal metabolism in JCVI-syn3A also points toward specific open questions regarding the minimal genome of JCVI-syn3A, which still contains many genes of generic or completely unclear function. In particular, the model, its comparison to in vivo essentiality and proteomics data yield specific hypotheses on gene functions and metabolic capabilities, and provide suggestions for several further gene removals. In this way, the model and its accompanying data guide future investigations of the minimal cell. Finally, the identification of 30 essential genes with unclear function will motivate the search for new biological mechanisms beyond metabolism.
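
For readers unfamiliar with the Matthews correlation coefficient quoted above, the following is a minimal sketch of how it is computed from a confusion matrix of predicted versus observed essentiality. The counts are hypothetical and are not taken from the paper, and treating "essential" as the positive class is an assumption made here for illustration.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts (NOT from the paper):
# tp = essential in silico and in vivo, tn = non-essential in both,
# fp = essential only in silico, fn = essential only in vivo.
tp, tn, fp, fn = 90, 20, 10, 5
print(round(matthews_corrcoef(tp, tn, fp, fn), 3))
```

The coefficient ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect agreement), which makes it less sensitive to class imbalance than plain accuracy.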

Keywords: JCVI-syn3A; computational biology; gene essentiality; metabolic reconstruction; mycoplasma; proteomics; systems biology; transposon mutagenesis.

Plain language summary

One way that researchers can test whether they understand a biological system is to see if they can accurately recreate it as a computer model. The more they learn about living things, the more the researchers can improve their models and the closer the models become to simulating the original. In this approach, it is best to start by trying to model a simple system. Biologists have previously succeeded in creating ‘minimal bacterial cells’. These synthetic cells contain fewer genes than almost all other living things and they are believed to be among the simplest possible forms of life that can grow on their own. The minimal cells can produce all the chemicals that they need to survive – in other words, they have a metabolism. Accurately recreating one of these cells in a computer is a key first step towards simulating a complete living system. Breuer et al. have developed a computer model to simulate the network of the biochemical reactions going on inside a minimal cell with just 493 genes. By altering the parameters of their model and comparing the results to experimental data, Breuer et al. explored the accuracy of their model. Overall, the model reproduces experimental results, but it is not yet perfect. The differences between the model and the experiments suggest new questions and tests that could advance our understanding of biology. In particular, Breuer et al. identified 30 genes that are essential for life in these cells but that currently have no known purpose. Continuing to develop and expand models like these to reproduce more complex living systems provides a tool to test current knowledge of biology. These models may become so advanced that they could predict how living things will respond to changing situations. This would allow scientists to test ideas sooner and make much faster progress in understanding life on Earth. 
Ultimately, these models could one day help to accelerate medical and industrial processes to save lives and enhance productivity.

Conflict of interest statement

MB, EE, CM, KW, LS, ML, JL, DG, Vd, DH, AH, PL, JG, ZL: No competing interests declared. CH is a consultant for Synthetic Genomics, Inc. (SGI), and holds SGI stock and/or stock options. HS is on the Board of Directors and co-chief scientific officer of Synthetic Genomics, Inc. (SGI) and holds SGI stock and/or stock options.

Figures

**Figure 1. Comparison of protein coding genes in the genomes of JCVI-syn3A (NCBI GenBank: https://www.ncbi.nlm.nih.gov/nuccore/CP016816.2 (Glass, 2017)), *M. pneumoniae* (NCBI GenBank: https://www.ncbi.nlm.nih.gov/nuccore/U00089.2 (Himmelreich et al., 2014)), and *E. coli* (NCBI GenBank: https://www.ncbi.nlm.nih.gov/nuccore/NC_012967.1 (Jeong et al., 2017)) with 452, 688, and 4637 coding genes, respectively.**
Each color represents a primary functional class, and each contiguous shaded region corresponds to a secondary functional class. Within each shaded region, bold lines separate tertiary functional classes, and each polygonal cell represents a single gene. The functional class hierarchy is presented in Supplementary file 1A. The ratio of metabolic to genetic information processing genes (0.67, 0.79, and 2.23, respectively) is smallest for JCVI-syn3A. The JCVI-syn3A genome contains both the smallest absolute number and the smallest percentage of genes of unclear function: 91 (20%), compared to 311 (45%) for *M. pneumoniae* and 1780 (38%) for *E. coli*.

**Figure 2. Classification of gene essentiality from transposon insertion data using a Poisson mixture model for a representative region of the JCVI-syn3A genome.**
Coding regions are colored by their predicted class: red (essential), yellow (quasi-essential), blue (non-essential). Lavender regions denote RNA and light brown regions are pseudogenes. The distributions of transposon insertions in passage 1 and passage 4 are represented by yellow and dark green histograms, respectively (bin size of 50 bp). The overlap of the two histograms is highlighted in blue. When a common gene name is not available, the four-digit locus tag for JCVI-syn1.0 is used instead. Locus number identifiers with the (3A) suffix represent newly identified open reading frames in JCVI-syn3A which are missing from the JCVI-syn1.0 annotation. Asterisks mark genes with unknown functionality.
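
As an illustration of the kind of classifier named in this caption, the sketch below fits a two-component Poisson mixture to per-gene insertion counts with expectation-maximization. It is not the authors' implementation: the number of components, the initialisation, the absence of any gene-length normalisation, and the example counts are all assumptions made here.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of count k under a Poisson distribution with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_poisson_mixture(counts, n_iter=100):
    """EM for a two-component Poisson mixture.

    Returns (weights, rates, resp), where resp[i][j] is the posterior
    probability that observation i belongs to component j. Component 0
    is initialised with the lower rate.
    """
    eps = 1e-9  # keeps logarithms and divisions finite
    rates = [max(min(counts), 0.5), max(max(counts), 1.0)]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior component probabilities for each observation
        resp = []
        for k in counts:
            logp = [math.log(w) + poisson_logpmf(k, lam)
                    for w, lam in zip(weights, rates)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate mixing weights and Poisson rates
        for j in range(2):
            nj = max(sum(r[j] for r in resp), eps)
            weights[j] = max(nj / len(counts), eps)
            rates[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / nj, eps)
    return weights, rates, resp

# Hypothetical insertion counts per gene (NOT from the paper)
counts = [0, 1, 0, 2, 1, 0, 1, 30, 28, 35, 31, 29, 33]
weights, rates, resp = fit_poisson_mixture(counts)
labels = ["low-insertion" if r[0] > 0.5 else "high-insertion" for r in resp]
print(rates, labels)
```

In a transposon mutagenesis setting, genes falling in the low-insertion component are the candidates for essentiality, since disrupting them is rarely tolerated.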

**Figure 2—figure supplement 1. Classification of gene essentiality from transposon insertion data using a Poisson mixture model for 0–275,000 bp.**
Coding regions are colored by their predicted class: red (essential), yellow (quasi-essential), blue (non-essential). Lavender regions denote RNA, light brown regions are pseudogenes, and green regions are markers used to construct and implant the genome. The distributions of transposon insertions in passage 1 and passage 4 are represented by yellow and dark green histograms, respectively (bin size of 50 bp). The overlap of the two histograms is highlighted in blue. When a common gene name is not available, the four-digit locus tag for JCVI-syn1.0 is used instead. Locus number identifiers with the (3A) suffix represent newly identified open reading frames in JCVI-syn3A which are missing from the JCVI-syn1.0 annotation. Asterisks mark genes with unknown functionality.

**Figure 2—figure supplement 2. Classification of gene essentiality from transposon insertion data using a Poisson mixture model for 275,000–543,379 bp.**

**Figure 2—figure supplement 3. Distribution of transposon insertion counts for P1 (panel a) and P4 (panel b) compared to the distribution inferred through the Poisson mixture model.**
To separate genes labeled ‘non-essential’ by the mixture model, but that showed a significant decrease in insertion counts from $P_{1}$ to $P_{4}$, $k$-means clustering was used on the ratios of transposon insertion rates in $P_{1}$ and $P_{4}$ for the genes labeled ‘non-essential’. Panel c shows how the genes were divided into two clusters such that the first cluster (blue) contains quasi-essential genes and the second contains truly non-essential genes.
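
The clustering step described in this caption can be sketched as a two-cluster $k$-means on scalar values. This is a minimal illustration using Lloyd's algorithm, not the authors' code. The direction of the ratio ($P_{4}$ rate divided by $P_{1}$ rate, so that a low value means insertions were lost between passages) and the example values are assumptions.

```python
def kmeans_1d(values, n_iter=100):
    """Two-cluster k-means (Lloyd's algorithm) on scalars.

    Returns (centers, labels); cluster 0 starts at the smallest value,
    cluster 1 at the largest.
    """
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: nearest center wins (ties go to cluster 0)
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for j in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels

# Hypothetical P4/P1 insertion-rate ratios (NOT from the paper)
ratios = [0.1, 0.15, 0.2, 0.9, 1.0, 1.1]
centers, labels = kmeans_1d(ratios)
print(centers, labels)
```

Under the assumed ratio direction, the cluster with the lower center would hold the quasi-essential candidates and the other cluster the truly non-essential genes.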

**Figure 3. Essential, quasi-essential, and non-essential protein coding genes in JCVI-syn3A across four functional classes.**
(a) Distribution across genome (cell areas all equal). (b) Distribution across proteome (cell areas proportional to protein copy number in an average cell). Among non-essential proteins, the three most abundant are *ftsZ*/0522, the peptidase 0305, and 0538 (unclear function). A detailed breakdown of the JCVI-syn3A genome into these classes is available in Table 1.

**Figure 4. Biomass reaction equation for JCVI-syn3A.**
This reaction consumes biomass precursors (macromolecules, lipids, capsule, small molecules) (black) and consumes energy in the form of ATP (red) to produce biomass (blue). Values in parentheses are the stoichiometric coefficients in mmol compound per gram cellular dry weight (mmol gDW⁻¹). The macromolecular compositions are highlighted in green (stoichiometric coefficients within the macromolecule, unitless) and the compositions of lipids and small molecule pools are highlighted in gray (mmol gDW⁻¹). ATP expenses within green boxes denote total macromolecular synthesis costs (based on macromolecular fractions in the biomass) and the ATP expense in the main equation denotes the nonquantifiable part of the growth-associated maintenance cost (GAM; see Section 'GAM/NGAM').

**Figure 5. Overview of the metabolic reconstruction of JCVI-syn3A, drawn with Escher (King et al., 2015).**
Orange nodes represent metabolites, labeled by their short names in the model (black); the suffixes ‘_c’ and ‘_e’ denote cytoplasmic and extracellular compartments, respectively. For clarity, H₂O, H⁺, PP_i and P_i are generally omitted as reactants. Blue edges represent (enzymatic or spontaneous) reactions, labeled by reaction name (gray labels) and associated gene loci (gene-protein-reaction (GPR) rules, turquoise; omitting ‘MMSYN1_’ prefix). Blue parenthesized numbers denote reactants (products) which are consumed (produced) in stoichiometric quantities greater than one. In this map and subsequent maps, the following color scheme for highlighted reactions is used—blue: reaction based on new annotation, light green: reaction based on suggested annotation refinement, cyan: specific reaction assumed for generic annotation, light violet: non-enzymatic reaction, orange: reaction not accounted for by gene yet but supported by experimental evidence, and red: reaction included based on gap filling. Small boxes list metabolites that can be taken up (green boxes) or secreted (brown boxes) under physiological conditions.

**Figure 6. Central metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5. Big arrows denote incoming or outgoing connections to other parts of the metabolic network. For context, the node representing glucose transport has been labeled explicitly and glycolysis has been highlighted in gray.

**Figure 6—figure supplement 1. Steady-state fluxes through central metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5, with gene loci/gene-protein-reaction rules omitted. Numbers after reaction labels denote steady-state reaction fluxes in mmol gDW⁻¹ h⁻¹; edge color corresponds to the absolute value of the carried flux, running from gray through blue and purple to red as flux increases. For reversible reactions, the reaction progresses from the white to the filled arrowhead.

**Figure 7. Nucleotide metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5.

**Figure 8. Apparent dead-end of dUMP/deoxyuridine and possible solutions.**
Internal metabolites are highlighted with cyan boxes, external ones with red boxes. Blue arrows denote reactions incorporated during model reconstruction—no reaction leads away from the dUMP/deoxyuridine pair. Red arrows denote hypothetical reactions that could possibly solve this dead-end. In the model, we have adopted the hypothetical CTP synthase reaction converting dUMP to dCMP (see also Figure 7; CTPSDUMP).

**Figure 9. Cofactor metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5.

**Figure 10. Lipid and capsule metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5.

**Figure 11. Macromolecule metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5. The detailed (amino acid-specific) stoichiometry of the protein synthesis and degradation reactions can be found in Supplementary file 4. Protein synthesis reactions for the proteins explicitly included in the model (apo-ACP, dUTPase and PdhC) are analogous to the translation reaction shown and are therefore not included in the map.

**Figure 12. Amino acid metabolism in JCVI-syn3A.**
Map components and labels as in Figure 5. As amino acid metabolism in JCVI-syn3A constitutes sets of analogous reactions (for each amino acid or peptide), we use generic reactions in the upper right part of the map. The ABC importer Opp catalyzes tetrapeptide uptake reactions in the model ([amino acid]4abc in Supplementary file 4); the AA permeases (incl. GltP) catalyze amino acid proton symport reactions ([amino acid]t2[p]r in Supplementary file 4). The peptidases catalyze peptide hydrolysis reactions ([amino acid]4P in Supplementary file 4). The aminoacyl tRNA synthetases (‘aaRS’s’ in the map) catalyze charging of tRNAs ([amino acid]TRS in Supplementary file 4). Synthesis of Gln-tRNA$^{Gln}$ requires transamidation of initially mischarged Glu-tRNA$^{Gln}$, and the corresponding reactions are shown on the lower left. In the $S$-adenosylmethionine pathway on the lower right, we note that nucleic acid modification reactions (indicated by the edge labeled ‘DNA/RNA modification’) were not included in the model due to lack of sufficient information on the kind and abundance of nucleic acid modifications in JCVI-syn3A.

**Figure 13. Ion transport reactions in JCVI-syn3A.**
Map components and labels as in Figure 5.

**Figure 14. Comparison of growth curves of JCVI-syn1.0 and JCVI-syn3A.**
JCVI-syn1.0 has a doubling time of 66 min (blue; ‘×’ markers), whereas JCVI-syn3A has a doubling time of 105 min (orange; ‘+’ markers). Doubling times ($t_{d}$) were calculated as described in Section 'Materials and methods', from fluorescence staining of cellular DNA plotted against time and fitted with exponential regression curves. The regression curves for JCVI-syn1.0 and JCVI-syn3A have $R^{2}$ values of 0.9986 and 0.9976, respectively.
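
The relationship between an exponential fit and a doubling time can be sketched as follows: if the signal grows as $A e^{kt}$, then $t_{d} = \ln 2 / k$. The example fits $k$ by least squares on the log-transformed signal, which is one common way to do exponential regression and is an assumption here. The paper's exact fitting procedure is described in its Materials and methods, and the $R^{2}$ below is computed on the log scale, so it need not match the reported values. The data points are synthetic.

```python
import math

def doubling_time(times, signal):
    """Fit ln(signal) = ln(A) + k*t by least squares; return (t_d, r_squared)."""
    y = [math.log(s) for s in signal]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    syy = sum((v - y_mean) ** 2 for v in y)
    k = sxy / sxx                      # growth rate per unit time
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.log(2) / k, r_squared

# Synthetic, noise-free fluorescence readings generated with t_d = 105 min
times = [0, 30, 60, 90, 120]
signal = [100 * 2 ** (t / 105) for t in times]
t_d, r2 = doubling_time(times, signal)
print(t_d, r2)
```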

**Figure 15. Comparison of FBA steady-state fluxes $ν$ to maximal fluxes $V_{max}$ obtained from protein abundances and turnover numbers from BRENDA and the literature.**
Map components and labels as in Figure 5, with reaction highlighting and gene loci/gene-protein-reaction rules omitted. Each edge is colored according to the ratio between $V_{max}$ and $ν$: blue indicates $V_{max} > ν$, red indicates $V_{max} < ν$, and green indicates that no $V_{max}$ could be obtained (because the turnover number or the protein abundance is missing, or because the reaction is not enzymatic).
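
A rough sketch of the comparison behind this coloring: a maximal flux can be estimated as turnover number times enzyme copy number, converted to the FBA flux unit mmol gDW⁻¹ h⁻¹. The cell dry mass used for the conversion (`cell_dry_mass_g`) is a hypothetical placeholder, not a value from the paper. Comparing against the absolute flux and coloring exact ties red are also assumptions made for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mol

def vmax_mmol_per_gdw_h(kcat_per_s, copies_per_cell, cell_dry_mass_g):
    """Convert kcat (1/s) and enzyme copies per cell to mmol gDW^-1 h^-1."""
    molecules_per_h = kcat_per_s * 3600.0 * copies_per_cell
    mmol_per_h = molecules_per_h / AVOGADRO * 1000.0
    return mmol_per_h / cell_dry_mass_g

def edge_color(flux, kcat_per_s=None, copies_per_cell=None, cell_dry_mass_g=1e-14):
    """Return 'green' if no Vmax is available, 'blue' if Vmax > |flux|, else 'red'.

    cell_dry_mass_g defaults to a hypothetical 10 fg per cell.
    """
    if kcat_per_s is None or copies_per_cell is None:
        return "green"
    vmax = vmax_mmol_per_gdw_h(kcat_per_s, copies_per_cell, cell_dry_mass_g)
    return "blue" if vmax > abs(flux) else "red"

# Hypothetical enzyme: kcat = 100 1/s, 1000 copies per cell
print(vmax_mmol_per_gdw_h(100.0, 1000, 1e-14))
print(edge_color(10.0, 100.0, 1000), edge_color(-80.0, 100.0, 1000), edge_color(5.0))
```

A red edge in this scheme flags a reaction whose FBA flux exceeds what the measured enzyme amount could plausibly carry, which points to either an underestimated turnover number or an unrealistic flux.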

**Figure 15—figure supplement 1. Statistics of FBA steady-state fluxes $ν$ vs. maximal fluxes $V_{max}$ comparison (see Figure 15).**
(A) Summary of $V_{max}$ vs. $ν$ comparison over all 253 non-exchange reactions in the model. Red, blue, green: meaning as in Figure 15. Green-striped: subset of the green set, namely reactions without $V_{max}$ that pertain to transport, which usually do not have an EC number associated with them. Gray: reactions with $ν = 0$ in the FBA solution (thus $V_{max} > ν$ is always fulfilled). (B) Histogram of $V_{max} / ν$ over the blue and red subsets in panel A.

**Figure 16. Partitioning of genes classified as essential, quasi-essential, and non-essential by transposon mutagenesis experiments into those which are in silico essential, in silico non-essential, and not modeled (‘Non-metabolic’).**
All genes are included (i.e. also RNA genes and pseudogenes).

**Figure 16—figure supplement 1. In silico double-gene knockouts between genes that are non-essential in single-gene knockouts.**
Among the individually non-essential genes, a double knockout of the gene pair (0876, 0878) is the only lethal combination (red). This knockout corresponds to simultaneously removing both amino acid permeases, thus preventing cysteine uptake. Simultaneous knockout of the glutamate/aspartate permease *gltP*/0886 and any Opp gene (*oppB*/0165 through *oppA*/0169) is non-lethal in silico, as the model will under these circumstances produce glutamate through the hypothesized dUMP breakdown reaction CTPSDUMP and, to a lesser extent, through the reaction CTPS2 (both catalyzed by *pyrG*/0129). Glutamate production through *pyrG*/0129 is not expected to be able to meet cellular demands in vivo. If flux through CTPSDUMP is set to zero in the model, a double knockout of *gltP*/0886 and Opp becomes lethal in silico.

**Figure 17. Distributions of absolute protein abundances (number of molecules per average cell) in JCVI-syn3A.**
(a) Breakdown of the JCVI-syn3A proteome into functional classes. The area of each cell is proportional to its relative abundance. (**b,c**) Histograms of absolute protein abundances. (b) Absolute abundances of model-included metabolic proteins essential or non-essential in silico compared to all protein abundances. ‘Technical non-essential’ proteins are not included (see Section 'In silico gene knockouts and mapping to in vivo essentiality'). (c) Absolute abundances for proteins classified by in vivo essentiality from transposon mutagenesis experiments. (**d,e**) Exceedance plots of absolute abundances for proteins classified by in silico or in vivo essentiality. The exceedance at a given protein abundance value x is the fraction of the protein set displaying an abundance higher than x. (d) Model-included proteins (classified by in silico essentiality) compared to all proteins. (e) Proteins classified by in vivo (transposon-based) essentiality.
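
The exceedance defined in this caption is straightforward to compute. The sketch below follows the stated definition (fraction of the set with abundance strictly higher than x). The abundance values are hypothetical.

```python
def exceedance(abundances, x):
    """Fraction of proteins whose abundance is strictly higher than x."""
    return sum(1 for a in abundances if a > x) / len(abundances)

def exceedance_curve(abundances):
    """(x, exceedance) pairs evaluated at each distinct observed abundance."""
    return [(x, exceedance(abundances, x)) for x in sorted(set(abundances))]

# Hypothetical copy numbers per cell (NOT from the paper)
copies = [10, 50, 200, 1000]
print(exceedance_curve(copies))
```

Plotting such curves for two protein sets on the same axes shows at a glance which set is shifted toward higher abundances.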

**Appendix 1—figure 1. Thiamin diphosphate (ThDP) from the *Mycoplasma hyorhinis* Cypl crystal structure (pdb: 3EKI) overlaid onto the crystal structure of MG289 (pdb: 3MYU).**
Structures were aligned using STAMP (Russell and Barton, 1992) in VMD (Humphrey et al., 1996; Eargle et al., 2006). (a) Space-filling view, with MG289 in gray and ThDP in color. The pyrophosphate tail of ThDP from the Cypl structure would have an appropriate cavity in MG289 as well. (b) Visualization of hydrogen bonds for the same alignment. All possible hydrogen bonds are shown between potential donor and acceptor heavy atoms within 3.5 Å of each other. Even in the absence of the residues involved in pyrophosphate binding in Cypl (Sippel et al., 2009), the alignment suggests other side group and backbone interactions could still allow for pyrophosphate binding.

**Appendix 1—figure 2. Sensitivity analysis of model doubling time with respect to model constraints.**
In each panel, the stated parameter was varied over the indicated range and the model doubling time calculated while keeping all other constraints constant. (A) Maximal glucose uptake. (B) Maximal acetate secretion. (C) ATPase ATP cost. (D) GAM ATP cost. (E) Protein degradation rate. (F) RNA degradation rate. (G) Imposed NADPH consumption. The blue circle marks the value used in the FBA model and resulting doubling time; the orange circle indicates the parameter that would yield the experimental doubling time. If there is no value of the parameter which would yield the experimental doubling time, a horizontal line is plotted.

Comment in

doi: 10.7554/eLife.45379
