The Modular Organization of Protein Interactions in Escherichia coli

José M Peregrín-Alvarez¹, Xuejian Xiong, Chong Su, John Parkinson

PMID: 19798435
PMCID: PMC2739439
DOI: 10.1371/journal.pcbi.1000523

José M Peregrín-Alvarez et al. PLoS Comput Biol. 2009 Oct;5(10):e1000523. doi: 10.1371/journal.pcbi.1000523. Epub 2009 Oct 2.

Affiliation

¹ Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada. peregrin@ebi.ac.uk

Abstract

Escherichia coli serves as an excellent model for the study of fundamental cellular processes such as metabolism, signalling and gene expression. Understanding the function and organization of proteins within these processes is an important step towards a 'systems' view of E. coli. Integrating experimental and computational interaction data, we present a reliable network of 3,989 functional interactions between 1,941 E. coli proteins (approximately 45% of its proteome). These were combined with a recently generated set of 3,888 high-quality physical interactions between 918 proteins and clustered to reveal 316 discrete modules. In addition to known protein complexes (e.g., RNA and DNA polymerases), we identified modules that represent biochemical pathways (e.g., nitrate regulation and cell wall biosynthesis) as well as batteries of functionally and evolutionarily related processes. To aid the interpretation of modular relationships, several case examples are presented, including both well characterized and novel biochemical systems. Together these data provide a global view of the modular organization of the E. coli proteome and yield unique insights into structural and evolutionary relationships in bacterial networks.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Generation of the global *E. coli* functional network.**
A: Schematic overview of the generation of the *E. coli* functional network, its integration with the *Hu et al. TAP* dataset and prediction of functional modules. The number of interactions associated with each of the nine datasets is provided. B: Datasets and network accuracy. The cumulative log likelihood score (LLS) was obtained from comparison with EcoCyc functional categories and provides a measure of accuracy associated with functional linkages. The relative contribution of each of the nine datasets to the LLS for each linkage is indicated. For the derivation of the definitive ‘functional’ network, we applied a threshold based on the LLS obtained for the small scale assays. Note that this threshold exceeds the maximum contribution from any other single dataset. C: To assess the performance of the functional interactions we measured the precision, recall and area under the receiver operating characteristic curve (AUROC) across each COG functional category, for both the functional network and for a set of 100 randomly generated networks that possessed the same topology as the functional network (see Text S1 - *Generation of random networks*). Colours are consistent with the colours provided by the COG website (http://www.ncbi.nlm.nih.gov/COG/grace/uni.html). D: Network statistics for five *E. coli* networks, three comparable networks for *S. cerevisiae* and a randomly generated network with equivalent properties to the combined network (number of nodes and interactions but not topology). ‘*E. coli* combined’, ‘functional’ and ‘Hu *et al.*, TAP’ networks are described in the main text. ‘*E. coli* extended’ is the initial set of 58,844 interactions obtained prior to applying a threshold cut-off. ‘*S. cerevisiae* functional’ and ‘experimental’ datasets were derived from and respectively. The ‘*E. coli* Hu et al. TAP (adjusted)’ and ‘*S. cerevisiae* experimental (adjusted)’ datasets were generated by randomly removing connections until their average node degrees were similar to the equivalent functional networks. E: Overlap of the functional network with three experimentally derived networks and a set of random networks. ‘*Hu et al. TAP*’ refers to the complete *Hu et al. TAP* dataset. ‘Filtered’ refers to the *Hu et al. TAP* network in which we removed interactions that also featured in the large scale TAP dataset and were included in the functional network. ‘Pull Down’ refers to the large scale pull down dataset, removing direct interactions that were included in the functional dataset. ‘Random: same topology’ refers to the average values of 100 random networks created with the same number of nodes and interactions as the “Filtered dataset” (see Text S1 - *Generation of random networks*). ‘Direct’ indicates that the interaction is preserved between the two networks. Numbers indicate the distance of proteins in the functional network compared with those that directly interact in each of the other three networks. Error bars are negligible and are not shown for clarity. F: Interactions between COG functional categories. Numbers indicate the total number of interactions between each pair of COG functional categories. Colours represent Z-score deviations from the expected number of interactions. For further information see Text S1 - *Network analyses in the context of COG functional categories*.
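
The ‘adjusted’ datasets in panel D are described as being thinned by randomly removing connections until their average node degree is similar to that of the functional network. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' procedure: the function names, the stopping rule (stop once the average degree is at or below the target) and the handling of nodes that lose all their edges are all assumptions.

```python
import random


def average_degree(edges):
    """Average node degree of an undirected network given as (u, v) pairs."""
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def thin_to_average_degree(edges, target, seed=0):
    """Randomly remove edges until the average degree is at or below `target`.

    Assumption: nodes left without any edge drop out of the network, so the
    average is recomputed over the nodes that still have an edge.
    """
    rng = random.Random(seed)
    remaining = list(edges)
    rng.shuffle(remaining)  # random removal order
    while remaining and average_degree(remaining) > target:
        remaining.pop()
    return remaining
```

For example, thinning a fully connected five-node toy network (average degree 4) to a target of 2 returns a random subset of the original edges whose average degree no longer exceeds 2.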

**Figure 2. Organization of the combined *E. coli* protein-protein interaction network into functional modules.**
A: Graphical overview of 316 interconnected functional modules. Each pie chart represents an individual functional module, its relative size indicating the number of proteins in the module (only modules with 3 or more proteins are shown). The colours of each slice indicate the proportion of proteins found in functional modules predicted by either the functional, *Hu et al. TAP* or combined networks. Module borders are coloured if >60% of their members are associated with a single COG category (white otherwise). Edges represent *Hu et al. TAP* and/or functional interactions linking pairs of modules. Edge colours indicate the relative contribution of each network in the interaction. Edge thickness indicates the number of interactions between each module pair. B: Functional overlap of modules generated for the three networks presented in this study together with a previously published set of modules generated from a functional network (Hu et al. GC [27]) and 100 sets of modules generated by randomly swapping component genes between the modules generated from the combined network. Module overlap was determined through common membership of COG functional categories of their constituent proteins. Novel modules are defined as those in which component proteins are either not assigned a COG category or assigned the generic COG categories, S (‘Function unknown’) or R (‘General function prediction only’).
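
The border rule in panel A (a module is coloured when more than 60% of its members share a single COG category) can be sketched as a small Python function. This is only an illustration: the function name is hypothetical, each protein is assumed to carry at most one COG letter, and unassigned proteins (given as `None`) are assumed to count towards the module size.

```python
from collections import Counter


def dominant_cog(member_cogs, threshold=0.6):
    """Return the COG category held by more than `threshold` of a module's
    members, or None if no single category dominates.

    member_cogs: one COG letter per protein, or None if unassigned.
    """
    if not member_cogs:
        return None
    counts = Counter(cog for cog in member_cogs if cog is not None)
    if not counts:
        return None
    category, count = counts.most_common(1)[0]
    # Unassigned members stay in the denominator (an assumption).
    return category if count / len(member_cogs) > threshold else None
```

A module returning `None` would correspond to a white border in the figure.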

**Figure 3. Examples of functional modules I.**
A: Chemotaxis and flagella assembly. (i) Within the combined network, components of chemotaxis and flagella assembly are organized within two distinct modules (3 and 15). Nodes are coloured according to their organization as defined by KEGG (see below); the width of edges linking nodes indicates the confidence associated with interactions. (ii) Map representing KEGG defined relationships associated with the chemotaxis pathway. (iii) Schematic of the structural organization of components of the flagella as defined by KEGG. B: Leucine, isoleucine and valine biosynthesis. (i) KEGG-based representation of leucine, isoleucine and valine biosynthesis. (ii) Organization of components of the pathway within the combined network. Colours of nodes reflect KEGG pathway organization; the width of edges linking nodes indicates the confidence associated with interactions. C: Pili Assembly. (i) Two components of pili assembly are the outer membrane usher proteins and the pili chaperones. Within the combined network, family members of these proteins are organized into two modules on the basis of common patterns of interactions (21 and 35). Note that no member of either module interacts with a component from the same module. (ii) Linear representation of the operon organization of pili assembly proteins within *E. coli*. Colours of nodes and genes in operons reflect functional roles (see inset).

**Figure 4. Examples of functional modules II.**
A: ABC transporters. Within the combined network, a number of modules were identified as containing components of ABC transporters; a select 12 are presented, organized into substrate binding, permease and ATP-binding components as defined by KEGG. Colours of nodes indicate module membership (white nodes represent components that were not associated with one of the 316 modules). Colours of links represent type of supporting evidence (GP = genome proximity; RS = Rosetta stone; PP = phylogenetic profiles; LT = literature curation). B: Cell Wall Biosynthesis and Cell Division. (i) Subnetwork of 10 defined modules containing proteins associated with cell wall biosynthesis and cell division. Nodes are coloured according to module membership. The larger background ovals indicate groups of proteins with common functional roles. (ii–v) Schematics illustrating the organization and operation of components during cell division. FtsZ is recruited to the site of cell division under the control of minCDE, and subsequently recruits ftsA and zipA (ii). FtsK mediates the localization of components of TopoIV (parCE) required for chromosome partitioning, and is dependent on ftsA and zipA (iii). Further recruitment of additional cell division proteins – ftsBILNQW – (iv) is followed by the localization of cell wall biosynthetic machinery, which includes members of the peptidoglycan biosynthesis pathway – murCDEFGY (v). Inclusion of secA interactions may be related to the fact that secA and ftsZ both bind tightly to the inner membrane in the presence of MgCl2.

**Figure 5. Organization of laterally transferred interactions.**
A: Organization of LGT-derived proteins within the combined network. Each pie chart indicates a single protein, with the coloured arcs reflecting its phylogenetic profile (see inset key). The colour at the centre of each pie chart indicates module membership. Large coloured ovals grouping proteins define gene neighbourhoods (each gene is within 2000 bp of at least one other gene). Colours of links represent type of supporting evidence (GP = genome proximity; RS = Rosetta stone; PP = phylogenetic profiles; LT = literature curation; PD = pull down). The embedded colour key indicates the breakdown of taxonomic groups used to construct the phylogenetic profiles – numbers indicate the number of genomes associated with each group. B: Organization of LGT genes within the *E. coli* genome. The outer circle indicates the location of LGT genes. Grey lines indicate LGT genes not identified within our network. Coloured lines extending into the center indicate LGT genes identified within our network, organized into gene neighbourhoods. Coloured circles indicate the relationship between the gene neighbourhoods and their organization within the network shown in A. C: Network organization of proteins involved in hydrogenase biosynthesis. Two proteins associated with hydrogenase 3, hycE and hycG, are thought to derive through LGT and are highlighted. Also present in the combined network are proteins associated with: hydrogenase 1 (hyaABDEF); hydrogenase 2 (hybCDFO); hydrogenase 4 (hyfBDFGI); hydrogenase maturation (hypBCDEF and slyD); and NADH∶ubiquinone biosynthesis (nuoBCEFGHILMN). D: Network organization of proteins involved in enterobactin synthesis and related processes. Again, proteins thought to derive through LGT are highlighted. Also shown are components of the tryptophan biosynthetic pathway responsible for production of the chorismate precursor of enterobactin (trpABCDE); and components of two other related iron transport systems – fhuABCDE and fecABCD, which take up iron via hydroxamate and dicitrate respectively.
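
The gene neighbourhood criterion in panel A (each gene lies within 2000 bp of at least one other gene in the group) amounts to chaining together genes separated by short gaps. The Python sketch below shows one way to do this. It rests on simplifying assumptions not taken from the paper: each gene is reduced to a single hypothetical coordinate, the genome is treated as linear even though the *E. coli* chromosome is circular, and the function name is invented for illustration.

```python
def gene_neighbourhoods(positions, max_gap=2000):
    """Group genes so that each lies within `max_gap` bp of at least one
    other gene in its group.

    positions: dict mapping gene name -> single genome coordinate in bp
    (an assumption; real genes span a start and an end).
    Returns a list of groups (lists of gene names) with two or more genes.
    """
    ordered = sorted(positions, key=positions.get)
    groups, current = [], []
    for gene in ordered:
        # Start a new group when the gap to the previous gene is too large.
        if current and positions[gene] - positions[current[-1]] > max_gap:
            groups.append(current)
            current = []
        current.append(gene)
    if current:
        groups.append(current)
    # A lone gene has no neighbour within range, so it forms no neighbourhood.
    return [group for group in groups if len(group) > 1]
```

With made-up coordinates such as `{"geneA": 100, "geneB": 1500, "geneC": 3200, "geneD": 9000}`, the first three genes form one neighbourhood and `geneD` is left out.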

**Figure 6. Amended model of the evolution of the *E. coli* interaction network.**
From an ancestral network, new interactions are acquired either through the duplication of existing genes (blue nodes) or the acquisition of novel genes through lateral gene transfer events (LGT – red nodes). The preferential attachment model suggests that duplicated genes are more likely to be located at the core of the network (genes associated with large gene families are more highly connected and more central to the network). On the other hand, we find that LGT-derived proteins tend to be more peripheral and/or integrated as a discrete module, perhaps because they are less liable to disrupt essential functions associated with the network core.
