PLoS Comput Biol. 2009 Oct;5(10):e1000523.
doi: 10.1371/journal.pcbi.1000523. Epub 2009 Oct 2.

The Modular Organization of Protein Interactions in Escherichia coli

José M Peregrín-Alvarez et al. PLoS Comput Biol. 2009 Oct.

Abstract

Escherichia coli serves as an excellent model for the study of fundamental cellular processes such as metabolism, signalling and gene expression. Understanding the function and organization of proteins within these processes is an important step towards a 'systems' view of E. coli. Integrating experimental and computational interaction data, we present a reliable network of 3,989 functional interactions between 1,941 E. coli proteins (approximately 45% of its proteome). These were combined with a recently generated set of 3,888 high-quality physical interactions between 918 proteins and clustered to reveal 316 discrete modules. In addition to known protein complexes (e.g., RNA and DNA polymerases), we identified modules that represent biochemical pathways (e.g., nitrate regulation and cell wall biosynthesis) as well as batteries of functionally and evolutionarily related processes. To aid the interpretation of modular relationships, several case examples are presented, including both well characterized and novel biochemical systems. Together these data provide a global view of the modular organization of the E. coli proteome and yield unique insights into structural and evolutionary relationships in bacterial networks.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Generation of the global E. coli functional network.
A: Schematic overview of the generation of the E. coli functional network, its integration with the Hu et al. TAP dataset and prediction of functional modules. The number of interactions associated with each of the nine datasets is provided. B: Datasets and network accuracy. The cumulative log likelihood score (LLS) was obtained from comparison with EcoCyc functional categories and provides a measure of accuracy associated with functional linkages. The relative contribution of each of the nine datasets to the LLS for each linkage is indicated. For the derivation of the definitive ‘functional’ network, we applied a threshold based on the LLS obtained for the small scale assays. Note this threshold exceeds the maximum contribution from any other single dataset. C: To assess the performance of the functional interactions we measured the precision, recall and area under the receiver operating characteristic curve (AUROC) across each COG functional category, for both the functional network and for a set of 100 randomly generated networks that possessed the same topology as the functional network (see Text S1 - Generation of random networks). Colours are consistent with the colours provided by the COG website (http://www.ncbi.nlm.nih.gov/COG/grace/uni.html). D: Network statistics for five E. coli networks, three comparable networks for S. cerevisiae and a randomly generated network with equivalent properties to the combined network (number of nodes and interactions but not topology). ‘E. coli combined’, ‘functional’ and ‘Hu et al., TAP’ networks are described in the main text. ‘E. coli extended’ is the initial set of 58,844 interactions obtained prior to applying a threshold cut-off. ‘S. cerevisiae functional’ and ‘experimental’ datasets were derived from and respectively. The ‘E. coli Hu et al. TAP (adjusted)’ and ‘S. cerevisiae experimental (adjusted)’ datasets were generated by randomly removing connections until their average node degrees were similar to the equivalent functional networks. E: Overlap of the functional network with three experimentally derived networks and a set of random networks. ‘Hu et al. TAP’ refers to the complete Hu et al. TAP dataset. ‘Filtered’ refers to the Hu et al. TAP network in which we removed interactions that also featured in the large scale TAP dataset and were included in the functional network. ‘Pull Down’ refers to the large scale pull down dataset, removing direct interactions that were included in the functional dataset. ‘Random: same topology’ refers to the average values of 100 random networks created with the same number of nodes and interactions as the ‘Filtered’ dataset (see Text S1 - Generation of random networks). ‘Direct’ indicates that the interaction is preserved between the two networks. Numbers indicate the distance of proteins in the functional network compared with those that directly interact in each of the other three networks. Error bars are negligible and are not shown for clarity. F: Interactions between COG functional categories. Numbers indicate the total number of interactions between each pair of COG functional categories. Colours represent Z-score deviations from the expected number of interactions. For further information see Text S1 - Network analyses in the context of COG functional categories.
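The Z-score colouring in panel F can be illustrated with a minimal sketch. It assumes the expected number of interactions for a pair of COG categories is estimated from counts in a set of randomised networks; the function name and the example counts are hypothetical, and the authors' exact procedure is described in Text S1 rather than here.

```python
from statistics import mean, stdev


def interaction_z_score(observed, random_counts):
    """Z-score of an observed interaction count against randomised networks.

    observed:      interactions seen between one pair of COG categories.
    random_counts: the same count measured in each randomised network
                   (an assumed way of estimating the expectation).
    """
    mu = mean(random_counts)
    sigma = stdev(random_counts)  # sample standard deviation
    if sigma == 0:
        raise ValueError("random counts have zero variance; Z-score undefined")
    return (observed - mu) / sigma


# Hypothetical counts for one category pair, for illustration only.
random_counts = [8, 10, 12, 9, 11]
z = interaction_z_score(16, random_counts)
print(f"Z-score: {z:.2f}")
```

A positive value indicates more interactions between the two categories than the randomised networks would suggest, and a negative value indicates fewer.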
Figure 2. Organization of the combined E. coli protein-protein interaction network into functional modules.
A: Graphical overview of 316 interconnected functional modules. Each pie chart represents an individual functional module, its relative size indicating the number of proteins in the module (only modules with 3 or more proteins are shown). The colours of each slice indicate the proportion of proteins found in functional modules predicted by either the functional, Hu et al. TAP or combined networks. Module borders are coloured if >60% of their members are associated with a single COG category (white otherwise). Edges represent Hu et al. TAP and/or functional interactions linking pairs of modules. Edge colours indicate the relative contribution of each network in the interaction. Edge thickness indicates the number of interactions between each module pair. B: Functional overlap of modules generated for the three networks presented in this study together with a previously published set of modules generated from a functional network (Hu et al. GC [27]) and 100 sets of modules generated by randomly swapping component genes between the modules generated from the combined network. Module overlap was determined through common membership of COG functional categories of their constituent proteins. Novel modules are defined as those in which component proteins are either not assigned a COG category or assigned the generic COG categories, S (‘Function unknown’) or R (‘General function prediction only’).
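The border-colouring rule in panel A (a module is coloured when more than 60% of its members share a single COG category) can be sketched as follows. This is an illustration only: it assumes each protein carries at most one COG category, with `None` for unassigned proteins, and the function name and example letters are hypothetical.

```python
from collections import Counter


def dominant_cog_category(member_categories, threshold=0.6):
    """Return the COG category shared by more than `threshold` of members.

    member_categories: one entry per protein in the module; None marks a
    protein with no COG assignment (it still counts towards module size).
    Returns None when no single category exceeds the threshold, which
    corresponds to a white border in the figure.
    """
    if not member_categories:
        return None
    counts = Counter(c for c in member_categories if c is not None)
    if not counts:
        return None
    category, n = counts.most_common(1)[0]
    return category if n / len(member_categories) > threshold else None


# Hypothetical modules, for illustration only.
print(dominant_cog_category(["N", "N", "N", "T"]))       # 75% share "N"
print(dominant_cog_category(["N", "N", "N", "T", "T"]))  # exactly 60%: no border colour
```

The comparison is strict, so a module in which exactly 60% of members share a category stays white, matching the ">60%" wording of the caption.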
Figure 3. Examples of functional modules I.
A: Chemotaxis and flagella assembly. (i) Within the combined network, components of chemotaxis and flagella assembly are organized within two distinct modules (3 and 15). Nodes are coloured according to their organization as defined by KEGG (see below); the width of edges linking nodes indicates the confidence associated with interactions. (ii) Map representing KEGG defined relationships associated with the chemotaxis pathway. (iii) Schematic of the structural organization of components of the flagella as defined by KEGG. B: Leucine, isoleucine and valine biosynthesis. (i) KEGG-based representation of leucine, isoleucine and valine biosynthesis. (ii) Organization of components of the pathway within the combined network. Colours of nodes reflect KEGG pathway organization; the width of edges linking nodes indicates the confidence associated with interactions. C: Pili Assembly. (i) Two components of pili assembly are the outer membrane usher proteins and the pili chaperones. Within the combined network, family members of these proteins are organized into two modules (21 and 35) on the basis of common patterns of interactions. Note that no member of either module interacts with a component from the same module. (ii) Linear representation of the operon organization of pili assembly proteins within E. coli. Colours of nodes and genes in operons reflect functional roles (see inset).
Figure 4. Examples of functional modules II.
A: ABC transporters. Within the combined network, a number of modules were identified as containing components of ABC transporters; a select 12 are presented, organized into substrate binding, permease and ATP-binding components as defined by KEGG. Colours of nodes indicate module membership (white nodes represent components that were not associated with one of the 316 modules). Colours of links represent type of supporting evidence (GP = genome proximity; RS = Rosetta stone; PP = phylogenetic profiles; LT = literature curation). B: Cell Wall Biosynthesis and Cell Division. (i) Subnetwork of 10 defined modules containing proteins associated with cell wall biosynthesis and cell division. Nodes are coloured according to module membership. The larger background ovals indicate groups of proteins with common functional roles. (ii–v) Schematics illustrating the organization and operation of components during cell division. FtsZ is recruited to the site of cell division under the control of minCDE, and subsequently recruits ftsA and zipA (ii). FtsK mediates the localization of components of TopoIV (parCE) required for chromosome partitioning, and is dependent on ftsA and zipA (iii). Further recruitment of additional cell division proteins, ftsBILNQW (iv), is followed by the localization of cell wall biosynthetic machinery, which includes members of the peptidoglycan biosynthesis pathway, murCDEFGY (v). Inclusion of secA interactions may be related to the fact that secA and ftsZ both bind tightly to the inner membrane in the presence of MgCl2.
Figure 5. Organization of laterally transferred interactions.
A: Organization of LGT-derived proteins within the combined network. Each pie chart indicates a single protein, with the coloured arcs reflecting its phylogenetic profile (see inset key). The colour at the centre of each pie chart indicates module membership. Large coloured ovals grouping proteins define gene neighbourhoods (each gene is within 2000 bp of at least one other gene). Colours of links represent type of supporting evidence (GP = genome proximity; RS = Rosetta stone; PP = phylogenetic profiles; LT = literature curation; PD = pull down). The embedded colour key indicates the breakdown of taxonomic groups used to construct the phylogenetic profiles; numbers indicate the number of genomes associated with each group. B: Organization of LGT genes within the E. coli genome. The outer circle indicates the location of LGT genes. Grey lines indicate LGT genes not identified within our network. Coloured lines extending into the centre indicate LGT genes identified within our network, organized into gene neighbourhoods. Coloured circles indicate the relationship between the gene neighbourhoods and their organization within the network shown in A. C: Network organization of proteins involved in hydrogenase biosynthesis. Two proteins associated with hydrogenase 3, hycE and hycG, are thought to derive through LGT and are highlighted. Also present in the combined network are proteins associated with: hydrogenase 1 (hyaABDEF); hydrogenase 2 (hybCDFO); hydrogenase 4 (hyfBDFGI); hydrogenase maturation (hypBCDEF and slyD); and NADH:ubiquinone biosynthesis (nuoBCEFGHILMN). D: Network organization of proteins involved in enterobactin synthesis and related processes. Again, proteins thought to derive through LGT are highlighted. Also shown are components of the tryptophan biosynthetic pathway responsible for production of the chorismate precursor of enterobactin (trpABCDE), and components of two other related iron transport systems, fhuABCDE and fecABCD, which take up iron via hydroxamate and dicitrate respectively.
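The gene-neighbourhood definition used in panel A (each gene lies within 2000 bp of at least one other gene in the group) can be sketched as a simple chaining of sorted genome positions. The sketch makes several assumptions not stated in the caption: distance is measured between single start coordinates, the genome is treated as linear rather than circular, and the function name and example coordinates are hypothetical.

```python
def gene_neighbourhoods(positions, max_gap=2000):
    """Group genes into neighbourhoods by chaining nearby positions.

    positions: dict mapping gene name -> genome coordinate in bp.
    A gene joins the current group if it lies within `max_gap` bp of the
    previous gene in sorted order; isolated genes are dropped because a
    neighbourhood needs at least two members.
    """
    ordered = sorted(positions.items(), key=lambda item: item[1])
    groups, current, previous = [], [], None
    for gene, pos in ordered:
        if current and pos - previous > max_gap:
            groups.append(current)
            current = []
        current.append(gene)
        previous = pos
    if current:
        groups.append(current)
    return [group for group in groups if len(group) > 1]


# Hypothetical coordinates, for illustration only.
example = {"geneA": 1000, "geneB": 2500, "geneC": 4400,
           "geneD": 10000, "geneE": 11000, "geneF": 50000}
print(gene_neighbourhoods(example))
```

Because neighbours are chained, two genes in the same neighbourhood can be more than 2000 bp apart as long as intermediate genes bridge the gap, which is consistent with the "at least one other gene" wording.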
Figure 6. Amended model of the evolution of the E. coli interaction network.
From an ancestral network, new interactions are acquired either through the duplication of existing genes (blue nodes) or the acquisition of novel genes through lateral gene transfer events (LGT – red nodes). The preferential attachment model suggests that duplicated genes are more likely to be located at the core of the network (genes associated with large gene families are more highly connected and more central to the network). On the other hand, we find that LGT-derived proteins tend to be more peripheral and/or integrated as a discrete module, perhaps because they are less liable to disrupt essential functions associated with the network core.

References

    1. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol. 2007;3:121.
    2. Mori H. From the sequence to cell modeling: comprehensive functional genomics in Escherichia coli. J Biochem Mol Biol. 2004;37:83–92.
    3. Kaper JB, Nataro JP, Mobley HLT. Pathogenic Escherichia coli. Nat Rev Micro. 2004;2:123–140.
    4. Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT, Burland V, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–1474.
    5. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, et al. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res. 2005;33:D334–337.
