J Bacteriol. 2014 Oct;196(20):3643-55.
doi: 10.1128/JB.01836-14. Epub 2014 Aug 11.

Towards an informative mutant phenotype for every bacterial gene

Adam Deutschbauer et al. J Bacteriol. 2014 Oct.

Abstract

Mutant phenotypes provide strong clues to the functions of the underlying genes and could allow annotation of the millions of sequenced yet uncharacterized bacterial genes. However, it is not known how many genes have a phenotype under laboratory conditions, how many phenotypes are biologically interpretable for predicting gene function, and what experimental conditions are optimal to maximize the number of genes with a phenotype. To address these issues, we measured the mutant fitness of 1,586 genes of the ethanol-producing bacterium Zymomonas mobilis ZM4 across 492 diverse experiments and found statistically significant phenotypes for 89% of all assayed genes. Thus, in Z. mobilis, most genes have a functional consequence under laboratory conditions. We demonstrate that 41% of Z. mobilis genes have both a strong phenotype and a similar fitness pattern (cofitness) to another gene, and are therefore good candidates for functional annotation using mutant fitness. Among 502 poorly characterized Z. mobilis genes, we identified a significant cofitness relationship for 174. For 57 of these genes without a specific functional annotation, we found additional evidence to support the biological significance of these gene-gene associations, and in 33 instances, we were able to predict specific physiological or biochemical roles for the poorly characterized genes. Last, we identified a set of 79 diverse mutant fitness experiments in Z. mobilis that are nearly as biologically informative as the entire set of 492 experiments. Therefore, our work provides a blueprint for the functional annotation of diverse bacteria using mutant fitness.

Figures

FIG 1
Identifying a phenotype for most Z. mobilis genes. (A) Heat map of clustered mutant fitness data for 1,586 genes (y axis) across 492 experiments (x axis). Reduced fitness values are shown in blue, and enhanced fitness values are shown in yellow (see color key). The experiments are binned into 79 groups (alternating colors on the x axis) to increase statistical power for detecting subtle phenotypes (see Materials and Methods). Genes are color-coded to the right of the heat map according to whether they are beneficial for fitness in any group of experiments (red), detrimental to fitness in a group of experiments and never beneficial (green), or have no statistically significant phenotype in any group of experiments (no color). (B) Scatterplot of gene fitness values in rich medium (ZRMG medium; x axis) versus rich medium supplemented with an inhibitory concentration of cisplatin (y axis). Negative values are indicative of reduced fitness relative to the typical strain in the mutant pools. Genes encoding members of the UvrABCD nucleotide excision repair system, RecA, and RecFGORX are highlighted. The solid black line shows x = y. (C) Correlation of fitness (cofitness on the y axis) for 573 pairs of adjacent genes that are predicted to be cotranscribed in an operon. The pairs are ranked by the most significant phenotype of the weaker gene in any of the 79 groups of experiments (from weakest to strongest phenotype; x axis). Cofitness values are colored according to whether both genes in the pair have a significant phenotype (red), only one gene in the pair has a significant phenotype (black), or neither gene has a significant phenotype (green). The gray hatched region covers 99% of the cofitness distribution from shuffled data (−0.117 to 0.115). The dashed blue line represents the best-fit smooth line through the data (local regression from loess). (D) Comparison of gene fitness in rich medium (ZRMG medium; y axis) and expression level in the same condition (x axis). 
Expression was determined using a high-resolution tiling microarray and is plotted as the log2 level relative to background (bg.) (see Materials and Methods). Genes with significantly reduced (red) or enhanced (green) phenotypes after 1 day (∼6 population doublings) of growth in ZRMG medium (P < 0.001 by Fisher test with 30 replicates) are indicated. (E) Comparison of gene fitness for 1,586 genes after 3 days (∼18 population doublings) (x axis) or 7 days (∼42 population doublings) (y axis) of batch transfer growth in rich medium (cells were diluted back in fresh medium each day). The solid black line shows x = y. The vertical gray lines represent fitness of −0.2 and 0.2. Genes with a significant phenotype after 3 days of growth in rich medium (P < 0.05, based on the transformed test statistic for this single experiment) are shown in orange.
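Panel C of FIG 1 describes cofitness as the correlation between the fitness values of two genes across experiments. The following is a minimal sketch of that idea, assuming a Pearson correlation; the function name `cofitness` and the three short fitness profiles are hypothetical illustrations, not data or code from the study.

```python
import math


def cofitness(x, y):
    """Pearson correlation between two equal-length fitness profiles.

    Assumption: cofitness is computed as a plain Pearson correlation
    over per-experiment fitness values.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fitness values for three genes across five experiments.
gene_a = [-2.1, -0.1, 0.0, -1.8, 0.2]
gene_b = [-1.9, 0.0, 0.1, -2.0, 0.1]   # sick in the same experiments as gene_a
gene_c = [0.1, -1.5, 0.3, 0.0, -1.7]   # sick in different experiments

print("a vs b:", round(cofitness(gene_a, gene_b), 3))
print("a vs c:", round(cofitness(gene_a, gene_c), 3))
```

In this toy example, the pair with defects in the same experiments scores well above the 0.75 line used in FIG 3, while the pair with defects in different experiments does not.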
FIG 2
Characteristics of Z. mobilis phenotypes. (A) Comparison of the number of genes with a significant phenotype at different absolute fitness thresholds for genes with reduced fitness phenotypes (red), enhanced fitness phenotypes (green), or any phenotype (either; black). For example, at a fitness threshold of less than −1.0 in any of the 79 experimental groups, there are 880 beneficial genes (reduced fitness). Similarly, at a fitness threshold of greater than 1.0, there are 345 detrimental genes (enhanced fitness). The gray horizontal line marks 1,586, the total number of Z. mobilis genes we have data for. (B) Histogram of the number of genes (y axis) and their frequency of significant phenotypes among the 79 groups of experiments (x axis). (C) The fraction of Z. mobilis genes (x axis) with a significant phenotype among different categories (y axis). Genes are categorized as follows: “ORFan,” no close homologs in any other bacterial genome; “nodom,” no significant InterPro domain; “incaulo,” presence of an ortholog in Caulobacter crescentus; “domain,” other genes that contain an InterPro domain. 
The single letters indicate the COG (clusters of orthologous groups of proteins) categories: C (energy production and conversion), D (cell cycle control, cell division, and chromosome partitioning), E (amino acid transport and metabolism), F (nucleotide transport and metabolism), G (carbohydrate transport and metabolism), H (coenzyme transport and metabolism), I (lipid transport and metabolism), J (translation, ribosomal structure, and biogenesis), K (transcription), L (replication, recombination, and repair), M (cell wall/membrane/envelope biogenesis), N (cell motility), O (posttranslational modification, protein turnover, chaperones), P (inorganic ion transport and metabolism), Q (secondary metabolite biosynthesis, transport, and catabolism), R (general function prediction only), S (function unknown), T (signal transduction mechanisms), U (intracellular trafficking, secretion, and vesicular transport), and V (defense mechanisms). The vertical blue line represents the fraction of all Z. mobilis genes with a phenotype (0.89). The error bars show the 95% confidence intervals. Categories marked in green are significantly enriched for phenotypes (Fisher exact test, false discovery rate of <0.05), while those in red are significantly less likely to have phenotypes relative to the entire genome.
FIG 3
Utility of mutant fitness for annotating gene function in bacteria. (A) For each Z. mobilis gene, a scatterplot of the strongest absolute phenotype (x axis, either fitness reduced or enhanced) versus the strongest cofitness to another gene (y axis). Genes shown in red are putatively essential, and those shown in green are poorly annotated and do not have a specific annotation (no function) (see main text). The horizontal gray line marks cofitness of 0.75, and the vertical gray line marks absolute fitness of 1.0. (B) Distribution of fitness correlations (cofitness) for different classes of Z. mobilis gene pairs across all 492 experiments. All pairs of genes that we have data for (All Pairs), gene pairs that have the same TIGR/JCVI subrole (38) and are not within 20 kbp of each other on the chromosome (Same Subrole, Not Nearby), and genes with maximum cofitness for each gene excluding nearby hits within 20 kbp (Top Hits, Not Nearby) are shown. The distributions were estimated from the discrete data using kernel density. The vertical gray line marks cofitness of 0.75. (C) Increase in the fraction of genes with a strong reduced-fitness phenotype (fitness less than −2 [y axis]) in any experiment as a function of the number of mutant fitness experiments performed (x axis), plotted for all Z. mobilis genes for which we have data (n = 1,586), poorly annotated Z. mobilis genes (n = 502 [see text for criteria]), or all S. oneidensis MR-1 genes with fitness data (n = 3,355). Experiments are in random order. The red control (dashed) line is derived from the number of fitness values less than −2 among 17 control experiments (independent samples of start) for S. oneidensis MR-1. To calculate the number of Z. mobilis ZM4 genes expected to have fitness less than −2 by chance, we used the observed standard deviation in 17 control experiments (independent samples of start; this standard deviation was 0.40) and the theoretical probability of a normal distribution with this standard deviation and a mean of 0 giving a value below −2 (2.8 × 10⁻⁷ per gene per experiment).
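The chance probability quoted at the end of the FIG 3 legend can be checked from the normal cumulative distribution function. The sketch below uses only the Python standard library; the helper name `normal_cdf` is hypothetical, and multiplying the probability by genes × experiments to get an expected count is an assumption (it treats all gene-experiment measurements as independent), not a calculation stated in the legend.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X < x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


# Values taken from the legend: mean 0, standard deviation 0.40, threshold -2.
p_below = normal_cdf(-2.0, mean=0.0, sd=0.40)
print(f"P(fitness < -2) per gene per experiment: {p_below:.2e}")

# Assumption: independence across all 1,586 genes and 492 experiments.
n_genes, n_experiments = 1586, 492
expected_by_chance = p_below * n_genes * n_experiments
print(f"Expected chance values below -2 overall: {expected_by_chance:.3f}")
```

A threshold of −2 lies five standard deviations below the mean when the standard deviation is 0.40, which is why the per-measurement probability is so small.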
FIG 4
Function of Rnf/RseC in Z. mobilis. (A) Heat map of gene fitness values in rich medium in experiments for mutants in components of the Rnf complex and RseC. The experiments marked in red (x axis) were performed under aerobic conditions, and those marked in orange were performed under anaerobic conditions. Fitness values are color-coded as described in the legend to Fig. 1A. (B) Comparison of gene fitness values for the Rnf complex (averaged across all eight genes encoding components of the complex) versus RseC in different categories of experiments.

References

    1. Galperin MY, Koonin EV. 2010. From complete genome sequence to ‘complete’ understanding? Trends Biotechnol. 28:398–406. doi: 10.1016/j.tibtech.2010.05.006
    2. Raskin DM, Seshadri R, Pukatzki SU, Mekalanos JJ. 2006. Bacterial genomics and pathogen evolution. Cell 124:703–714. doi: 10.1016/j.cell.2006.02.002
    3. Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segre D, Salzberg SL, Delisi C, Steffen M, Kasif S. 2011. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Res. 39:D11–D14. doi: 10.1093/nar/gkq1168
    4. Deutschbauer A, Price MN, Wetmore KM, Shao W, Baumohl JK, Xu Z, Nguyen M, Tamse R, Davis RW, Arkin AP. 2011. Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions. PLoS Genet. 7:e1002385. doi: 10.1371/journal.pgen.1002385
    5. Dudley AM, Janse DM, Tanay A, Shamir R, Church GM. 2005. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol. Syst. Biol. 1:2005.0001. doi: 10.1038/msb4100004
