Review

Cell Mol Life Sci. 2009 Nov;66(22):3625-39.

doi: 10.1007/s00018-009-0117-0. Epub 2009 Aug 19.

The protein meta-structure: a novel concept for chemical and molecular biology

Robert Konrat¹

Affiliation

¹ Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Vienna Biocenter Campus 5, 1030, Vienna, Austria. robert.konrat@univie.ac.at

PMID: 19690801
PMCID: PMC11115628
DOI: 10.1007/s00018-009-0117-0

Robert Konrat. Cell Mol Life Sci. 2009 Nov.

Abstract

The ultimate goal of bioinformatics or computational chemical biology is the sequence-based prediction of protein functionality. However, due to the degeneracy of the primary sequence code, there is no unambiguous relationship between sequence and function. The degeneracy can be partly lifted by going to higher levels of abstraction, for example by incorporating 3D structural information. Even at this conceptual level, however, functional ambiguities often remain. Here a novel conceptual framework, the protein meta-structure, is described. At this level of abstraction, the protein structure is viewed as an intricate network of interacting residues. This novel conception offers unique possibilities for chemical (molecular) biology, structural genomics and drug discovery. In this review some prototypical applications are presented that illustrate the potential of the methodology.

Figures

**Fig. 1**
The protein meta-structure concept. The 3D structural information is transformed into topological space by calculating the network of residue interactions. In this network, a node represents an amino acid and an edge indicates a neighborhood relationship. Two residues are considered neighbors if their Cα–Cα distance is below a distance threshold. (a) 3D structure of H-ras p21 [52]. (b) Location of the network comprising Q61–T35–V14–I84–N116–E143 of H-ras p21. (c) Topological network of H-ras p21 calculated from the 3D structure (5p21; Cα–Cα distance cutoff: 8 Å). (d) Subset of the topological network of H-ras p21 (obtained by zooming into the network graph of Fig. 1c) demonstrating the topological relationship (shortest path length θ) between residues. Solid lines indicate the spatial relationship between residues (Cα–Cα distance below 8 Å). The topological relationship θ between two nodes/residues in the network graph is defined as the shortest path length (*dashed lines*) through the network (see text). For example, Q61 and I84 are connected via a single node (V14), yielding a shortest path length of θ = 2, whereas T35 and E116 are linked via three nodes (θ = 4). The figure was created using the programs PyMOL (http://www.pymol.org) and Visone 1.1.1 (http://www.visone.de)
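The network construction described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `contact_graph` and `shortest_path_length` are hypothetical, the coordinates are toy values rather than data from 5p21, and breadth-first search is one standard way to obtain shortest path lengths in an unweighted graph, not necessarily the procedure used in the review.

```python
from collections import deque
from math import dist

def contact_graph(ca_coords, cutoff=8.0):
    """Adjacency sets: residues i and j are neighbors if their
    Cα–Cα distance is below `cutoff` (in Å)."""
    n = len(ca_coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist(ca_coords[i], ca_coords[j]) < cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns θ as the number of edges on the
    shortest path, or None if the two residues are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == goal:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None

# Toy input (hypothetical): five Cα atoms on a line, spaced 5 Å apart,
# so only sequence neighbors fall under the 8 Å cutoff.
toy_coords = [(5.0 * i, 0.0, 0.0) for i in range(5)]
toy_graph = contact_graph(toy_coords)
theta_0_2 = shortest_path_length(toy_graph, 0, 2)  # linked via one node
theta_0_4 = shortest_path_length(toy_graph, 0, 4)  # linked via three nodes
```

In the toy chain, residues 0 and 2 are linked via one intermediate node and residues 0 and 4 via three, mirroring the θ = 2 and θ = 4 cases in the caption.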

**Fig. 2**
Typical pairwise distribution functions ρ(θ, A, B, l_AB) extracted from the PDB. (a) Distributions of shortest path lengths observed for residue pairs separated by four residues in the primary sequence (*grey*: ρ(θ, Asp, Glu, 4); *black*: ρ(θ, Ile, Leu, 4)). (b) Long-range (primary sequence separation l_AB ≥ 5) shortest path length distributions (*grey*: ρ(θ, Asp, Glu, n); *black*: ρ(θ, Ile, Leu, n)). In (b) the counts are divided by 1,000. Ile–Leu pairs are typically clustered in protein structures (smaller θ values), whereas Asp–Glu pairs tend to be more distant (larger θ values)
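As a rough illustration of how such a distribution could be tallied for a single structure, the sketch below counts shortest path lengths θ for residue pairs of two given types at a given sequence separation. The function names, the treatment of pairs as unordered, and the toy ring-shaped graph are all assumptions made for this example; the review compiles its statistics over the PDB, and the exact counting conventions are not given in this caption.

```python
from collections import Counter, deque

def all_path_lengths(graph, start):
    """Shortest path length (in edges) from `start` to every reachable node."""
    lengths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in lengths:
                lengths[neighbor] = lengths[node] + 1
                queue.append(neighbor)
    return lengths

def theta_histogram(sequence, graph, aa_a, aa_b, separation=None, min_separation=5):
    """Count θ values for residue pairs of types (aa_a, aa_b).

    If `separation` is given, only pairs exactly that far apart in the
    primary sequence are counted; otherwise pairs at least
    `min_separation` apart are counted (the long-range case).
    Pairs are treated as unordered, which is an assumption.
    """
    wanted = {aa_a, aa_b}
    counts = Counter()
    for i, res_i in enumerate(sequence):
        if res_i not in wanted:
            continue
        lengths = all_path_lengths(graph, i)
        for j in range(i + 1, len(sequence)):
            if {res_i, sequence[j]} != wanted:
                continue
            gap = j - i
            if separation is not None and gap != separation:
                continue
            if separation is None and gap < min_separation:
                continue
            if j in lengths:
                counts[lengths[j]] += 1
    return counts

# Toy input (hypothetical): a five-residue ring in which the first and
# last residues are in contact, using one-letter amino acid codes.
toy_sequence = "IAAAL"
toy_ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
hist = theta_histogram(toy_sequence, toy_ring, "I", "L", separation=4)
```

Summing such histograms over many structures and normalizing would give an empirical estimate of ρ(θ, A, B, l_AB) for a chosen pair type and separation.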

**Fig. 3**
The protein meta-structure. Residue secondary structure and compactness plot of the PI3-kinase p85 N-terminal SH2 domain (PDB: 2IUG). Comparison of predicted local secondary structural features (a) and compactness (b) as a function of residue position, and the 3D protein structure (c). Positive secondary structure values are indicative of α-helical segments (shown in *red*). In contrast, continuous negative values are typical for extended or β-strand regions (shown in *blue*). Residues of loosely defined secondary structure are shown in *black*. Large compactness values indicate residue positions typically buried in the interior of the 3D structure, whereas small values are found for residues exposed to the solvent. In the 3D structure of the SH2 domain (c), residues are color coded according to the meta-structure (local secondary structure) results (α-helix: *red*; β-sheet: *blue*). The figure was prepared using the program PyMOL (http://www.pymol.org)

**Fig. 4**
Intrinsically unstructured proteins (IUPs) from different kingdoms. A selection of prototypical IUPs/natively unfolded proteins (NUPs) identified in archaea (*left*), prokaryotes (*middle*) and eukaryotes (*right*) is shown: (*left*) ARC: 152.5, *Methanobacterium thermoautotrophicus*, SwissProt: O26774, Prefoldin-β-subunit; (*middle*) ARC: 135.3, *Listeria monocytogenes*, SwissProt: Q8Y494, Probable DNA-directed RNA polymerase; (*right*) ARC: 177.7, *Candida glabrata*, SwissProt: Q6FY96, Prefoldin-β-subunit. The IUPs/NUPs were identified using the average residue compactness (ARC) approach (see text). A protein is annotated as IUP/NUP if the global ARC is smaller than 200 or, alternatively, if 30 consecutive residues display a local average compactness value smaller than 150. Predicted compactness (*upper part*) and local secondary structure (*lower part*) are shown. Positive secondary structure values are indicative of α-helical segments, whereas continuous negative values are typical for extended conformations (β-strand or polyproline II)

**Fig. 5**
*Top*: Solution structure of ICln [21]. Conformationally flexible parts of the protein are indicated in *red*. *Bottom*: Residue plot showing compactness C_i vs. residue position. The conformationally flexible parts (*red*) are correctly identified by significantly reduced compactness values. The conformational flexibility of these regions has been independently verified by NMR spin relaxation analysis [21]. The protein construct used for structure determination comprised only residues 1–165; the part adjacent to the C-terminal α-helix was missing from this construct

**Fig. 6**
Overlay of ¹⁵N–¹H HSQC spectra for wild-type ICln (*black*) [21] and truncated ICln (*yellow*; residues 85–105, located in the flexible linker region, were eliminated). Overall cross peak positions are unchanged, indicating that the solution structures of the two ICln constructs are the same. Cross peaks in the *black* dataset that are absent in the *yellow* dataset correspond to residues in the flexible linker region (eliminated in the truncated version)

**Fig. 7**
Meta-structure-based assessment of local secondary structure in natively unfolded proteins. (a) Overlay of ¹⁵N–¹H HSQC spectra for full-length ICln (1–237, *black*) and the C-terminal domain of ICln (158–237, *red*). The nearly identical peak positions in the C-terminal domain of ICln, CTD-ICln (*red*), indicate that CTD-ICln exists as a largely unfolded polypeptide chain in solution. Comparison between meta-structure-derived (*red*) and NMR-derived (*black*) local secondary structure elements for (b) CTD-ICln and (c) Osteopontin [27, 28]. Positive and negative values indicate the existence of local α-helical segments and β-strands, respectively. NMR results were obtained from Δ¹³Cα−Δ¹³Cβ secondary shifts (see text).

**Fig. 8**
Protein meta-structure alignment. The pairwise protein sequence alignment is based on calculated meta-structure parameters. The scoring function for obtaining the optimal sequence match involves compactness and secondary structure values. Comparison between meta-structure alignments (*left*) and structural superpositions (*right*). *Top*: Meta-structure (*left*) and 3D structure alignment (*right*) of TM1457 (PDB: 1S12) and the DNA mismatch repair protein PMS2 (PDB: 1EA6). *Bottom*: Meta-structure (*left*) and 3D structure alignment (*right*) of the A chain of a putative isomerase from *Rhodopseudomonas palustris* (Midwest Center of Structural Genomics, to be published, PDB: 3DM8) and the B chain of limonene-1,2-epoxide hydrolase from *Rhodococcus erythropolis* (PDB: 1NWW) [38]. The structure alignment was performed with the program TopMatch [35, 36]. The protein structures are shown in *blue* (*top*: 1EA6, *bottom*: 3DM8) and *green* (*top*: 1S12, *bottom*: 1NWW), and the regions of similar structure are colored *red* and *orange*. The figure was generated using PyMOL (http://www.pymol.org)

**Fig. 9**
Overview of the protein meta-structure similarity clustering (PMSSC) approach for ligand development. *Top*: 3D structure-based similarity clustering approach developed by H. Waldmann and co-workers [–49]. Conservation of structural motifs in the ligand-sensing region of proteins is used as a classifying principle to group proteins into clusters with similar ligand-binding properties. Structures of ligands binding to one member of a cluster are valid starting points for ligand development for other cluster members. *Bottom*: Meta-structure similarities provide valuable starting information for identifying chemical scaffolds and guiding structures in ligand development programs, without requiring high-resolution protein structures

**Fig. 10**
Chemical structures of ligand scaffolds identified by the meta-structure approach for Tm0936 from *Thermotoga maritima*. (a) Ligand scaffold identified by the 3D structure-based approach using the crystal structure of Tm0936, computational docking and virtual screening [52]; (b) known ligands for S-adenosylmethionine synthetase; and (c) ligand scaffolds identified using the meta-structure approach (based exclusively on primary sequence information). The meta-structure approach was based on a pairwise sequence-to-sequence screen against the targets of the DRUGBANK database [51]

References

1. Tanford C, Reynolds J. Nature's robots: a history of proteins. Oxford: Oxford University Press; 2003.
2. Perutz MF, Muirhead H, Cox JM, Goaman LC. Three-dimensional Fourier synthesis of horse oxyhemoglobin at 2.8 Å resolution: the atomic model. Nature. 1968;219:131–139. doi: 10.1038/219131a0.
3. Epstein CJ, Goldberger RF, Anfinsen CB. The genetic control of tertiary protein structure. Model systems. Cold Spring Harb Symp Quant Biol. 1963;28:439–449.
4. Mayer O, Rajkowitsch L, Lorenz C, Konrat R, Schroeder R. RNA chaperone activity and RNA-binding protein properties of the E. coli protein StpA. Nucleic Acids Res. 2007;35:1257–1269. doi: 10.1093/nar/gkl1143.
5. Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990;213:859–883. doi: 10.1016/S0022-2836(05)80269-4.
