Review
Cell Mol Life Sci. 2009 Nov;66(22):3625-39.
doi: 10.1007/s00018-009-0117-0. Epub 2009 Aug 19.

The protein meta-structure: a novel concept for chemical and molecular biology


Robert Konrat. Cell Mol Life Sci. 2009 Nov.

Abstract

The ultimate goal of bioinformatics or computational chemical biology is the sequence-based prediction of protein functionality. However, due to the degeneracy of the primary sequence code, there is no unambiguous relationship between sequence and function. The degeneracy can be partly lifted by going to higher levels of abstraction, for example by incorporating 3D structural information. Even at this conceptual level, however, functional ambiguities often remain. Here a novel conceptual framework, the protein meta-structure, is described. At this level of abstraction, the protein structure is viewed as an intricate network of interacting residues. This conception offers unique possibilities for chemical (molecular) biology, structural genomics and drug discovery. This review presents some prototypical applications that illustrate the potential of the methodology.


Figures

Fig. 1
The protein meta-structure concept. The 3D structural information is transformed into topological space by calculating the network of residue interactions. In this network a node represents an amino acid and an edge indicates a neighborhood relationship. Two residues are considered neighbors if their Cα–Cα distance is below a distance threshold. (a) 3D structure of H-ras p21 [52]. (b) Location of the network comprising Q61–T35–V14–I84–N116–E143 of H-ras p21. (c) Topological network of H-ras p21 calculated from the 3D structure (5p21; Cα–Cα distance cutoff: 8 Å). (d) Subset of the topological network of H-ras p21 (obtained by zooming into the network graph of Fig. 1c) demonstrating the topological relationship (shortest path length θ) between residues. Solid lines indicate the spatial relationship between residues (Cα–Cα distance below 8 Å). The topological relationship θ between two nodes/residues in the network graph is defined as the shortest path length (dashed lines) through the network (see text). For example, Q61 and I84 are connected via a single node (V14), yielding a shortest path length of θ = 2, whereas T35 and E116 are linked via three nodes (θ = 4). The figure was created using the programs PyMOL (http://www.pymol.org) and Visone 1.1.1 (http://www.visone.de).
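The network construction described in the Fig. 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `contact_graph` and `shortest_path_length` and the toy coordinates are assumptions made here, and only the 8 Å Cα–Cα cutoff and the definition of θ as a shortest path length come from the caption.

```python
from collections import deque
import math


def contact_graph(ca_coords, cutoff=8.0):
    """Build an adjacency map: residues i and j are neighbors if their
    C-alpha atoms are closer than `cutoff` (in Angstrom)."""
    n = len(ca_coords)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def shortest_path_length(neighbors, start, goal):
    """Breadth-first search; returns theta (number of edges on the
    shortest path) or None if the two residues are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors[node]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy input: four pseudo-residues on a line, 5 Angstrom apart.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
graph = contact_graph(toy)
print(shortest_path_length(graph, 0, 2))  # residues 0 and 2 share neighbor 1
```

In the toy input, residues 0 and 2 are 10 Å apart and therefore not direct neighbors, but both contact residue 1, which corresponds to the "connected via a single node" case (θ = 2) in the caption. For a real structure the coordinates would be the Cα positions read from a PDB entry.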
Fig. 2
Typical pairwise distribution functions ρ(θ, A, B, l_AB) extracted from the PDB. (a) Distribution of shortest path lengths observed for residue pairs separated by four residues in the primary sequence (grey: ρ(θ, Asp, Glu, 4); black: ρ(θ, Ile, Leu, 4)). (b) Long-range (primary sequence separation l_AB ≥ 5) shortest path length distributions (grey: ρ(θ, Asp, Glu, n); black: ρ(θ, Ile, Leu, n)); in (b) the counts are divided by 1,000. Ile–Leu pairs are typically clustered in protein structures (smaller θ values), whereas Asp–Glu pairs tend to be more distant (larger θ values).
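A distribution such as ρ(θ, A, B, l_AB) amounts to a histogram of shortest path lengths collected over many structures for one residue-type pair at one sequence separation. The sketch below illustrates that counting step under stated assumptions: the function name, the input layout (a sequence in one-letter code plus a dictionary of precomputed θ values) and the treatment of (A, B) as an unordered pair are choices made here, not details given in the source.

```python
from collections import Counter


def path_length_distribution(structures, res_a, res_b, separation):
    """Histogram of theta for residue pairs of types (res_a, res_b)
    that are `separation` positions apart in the primary sequence.

    structures: iterable of (sequence, theta) tuples, where `sequence`
    is a one-letter-code string and theta[(i, j)] holds the shortest
    path length between residues i and j (assumed precomputed).
    """
    wanted = {(res_a, res_b), (res_b, res_a)}  # assumption: unordered pair
    counts = Counter()
    for sequence, theta in structures:
        for i in range(len(sequence) - separation):
            j = i + separation
            if (sequence[i], sequence[j]) in wanted:
                counts[theta[(i, j)]] += 1
    return counts


# Hypothetical toy data: two five-residue "structures".
toy_structures = [
    ("IGGGL", {(0, 4): 2}),
    ("LGGGI", {(0, 4): 3}),
]
ile_leu = path_length_distribution(toy_structures, "I", "L", 4)
print(sorted(ile_leu.items()))
```

The long-range case in panel (b) would pool all separations of five or more instead of a single value; that variant is omitted here to keep the sketch short.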
Fig. 3
The protein meta-structure. Residue secondary structure and compactness plot of the PI3-kinase p85 N-terminal SH2 domain (PDB: 2IUG). Comparison of predicted local secondary structural features (a) and compactness (b) as a function of residue position, and the 3D protein structure (c). Positive secondary structure values are indicative of α-helical segments (shown in red). In contrast, continuous negative values are typical for extended or β-strand regions (shown in blue). Residues of loosely defined secondary structure are shown in black. Large compactness values indicate residue positions typically buried in the interior of the 3D structure, whereas small values are found for residues exposed to the solvent. In the 3D structure of the SH2 domain (c), residues are color coded according to the meta-structure (local secondary structure) results (α-helix: red; β-sheet: blue). The figure was prepared using the program PyMOL (http://www.pymol.org).
Fig. 4
Intrinsically unstructured proteins (IUP) from different kingdoms. A selection of prototypical IUPs/NUPs identified in archaea (left), prokaryotes (middle) and eukaryotes (right) is shown: (left) ARC: 152.5, Methanobacterium thermoautotrophicus, SwissProt: O26774, Prefoldin-β-subunit; (middle) ARC: 135.3, Listeria monocytogenes, SwissProt: Q8Y494, Probable DNA-directed RNA polymerase; (right) ARC: 177.7, Candida glabrata, SwissProt: Q6FY96, Prefoldin-β-subunit. The IUPs/NUPs were identified using the average residue compactness (ARC) approach (see text). A protein is annotated as IUP/NUP if the global ARC is smaller than 200 or, alternatively, if 30 consecutive residues display a local average compactness value smaller than 150. Predicted compactness (upper part) and local secondary structure (lower part) are shown. Positive secondary structure values are indicative of α-helical segments, whereas continuous negative values are typical for extended conformations (β-strand or polyproline II).
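The annotation rule stated in the Fig. 4 caption can be written as a short predicate. The following is a sketch under one assumption that the caption leaves open: the "local average compactness" of 30 consecutive residues is interpreted here as the mean compactness over a 30-residue sliding window. The function name `is_iup` and the toy profiles are hypothetical; the thresholds 200, 150 and 30 are taken from the caption.

```python
def is_iup(compactness, global_cutoff=200.0, local_cutoff=150.0, window=30):
    """Return True if a per-residue compactness profile meets the ARC rule:
    global mean below `global_cutoff`, or any stretch of `window`
    consecutive residues with mean compactness below `local_cutoff`."""
    n = len(compactness)
    if n == 0:
        raise ValueError("empty compactness profile")

    # Criterion 1: global average residue compactness (ARC).
    if sum(compactness) / n < global_cutoff:
        return True

    # Criterion 2: sliding-window mean (assumed interpretation).
    if n < window:
        return False
    window_sum = sum(compactness[:window])
    if window_sum / window < local_cutoff:
        return True
    for i in range(window, n):
        window_sum += compactness[i] - compactness[i - window]
        if window_sum / window < local_cutoff:
            return True
    return False


# Hypothetical toy profiles (values chosen only for illustration).
folded = [300.0] * 100
disordered = [100.0] * 100
disordered_tail = [400.0] * 70 + [100.0] * 30

print(is_iup(folded), is_iup(disordered), is_iup(disordered_tail))
```

The third toy profile has a high global average but a 30-residue low-compactness tail, so it is flagged only by the local criterion.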
Fig. 5
Top: Solution structure of ICln [21]. Conformationally flexible parts of the protein are indicated in red. Bottom: Residue plot showing compactness C_i vs. residue position. The conformationally flexible parts (red) are correctly identified by significantly reduced compactness values. The conformational flexibility of these regions has been independently verified by NMR spin relaxation analysis [21]. The protein construct used for structure determination comprised only residues 1–165; the part adjacent to the C-terminal α-helix was missing from this construct.
Fig. 6
Overlay of 15N–1H HSQC spectra for wild-type ICln (black) [21] and truncated ICln (yellow; residues 85–105, located in the flexible linker region, eliminated). Overall cross peak positions are unchanged, indicating that both ICln constructs adopt the same solution structure. Cross peaks in the black dataset that are absent in the yellow dataset correspond to residues in the flexible linker region (eliminated in the truncated version).
Fig. 7
Meta-structure-based assessment of local secondary structure in natively unfolded proteins. (a) Overlay of 15N–1H HSQC spectra for full-length ICln (1–237, black) and the C-terminal domain of ICln (158–237, red). The nearly identical peak positions in the C-terminal domain of ICln, CTD-ICln (red), indicate that CTD-ICln exists as a largely unfolded polypeptide chain in solution. Comparison between meta-structure (red) and NMR-derived (black) local secondary structure elements for (b) CTD-ICln and (c) Osteopontin [27, 28]. Positive and negative values indicate the existence of local α-helical segments and β-strands, respectively. NMR results were obtained from Δ13Cα−Δ13Cβ secondary shifts (see text).
Fig. 8
Protein meta-structure alignment. The pairwise protein sequence alignment is based on calculated meta-structure parameters. The scoring function for obtaining the optimal sequence match involves compactness and secondary structure values. Comparison between meta-structure alignments (left) and structural superpositions (right). Top: Meta-structure (left) and 3D structure alignment (right) of TM1457 (PDB:1S12) and the DNA mismatch repair protein PMS2 (PDB:1EA6). Bottom: Meta-structure (left) and 3D structure alignment (right) of the A chain of a putative isomerase from Rhodopseudomonas palustris (Midwest Center of Structural Genomics, to be published, PDB:3DM8) and the B chain of limonene-1,2-epoxide hydrolase from Rhodococcus erythropolis (PDB:1NWW) [38]. The structure alignment was performed with the program TopMatch [35, 36]. The protein structures are shown in blue (top: 1EA6, bottom: 3DM8) and green (top: 1S12, bottom: 1NWW), and the regions of similar structure are colored red and orange. The figure was generated using PyMOL (http://www.pymol.org).
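The Fig. 8 caption states only that the alignment scoring function involves compactness and secondary structure values; the actual function is not given. As an illustration of the general idea, the sketch below runs a standard Needleman–Wunsch global alignment over per-residue (compactness, secondary structure) profiles. The scoring function, the scale factors, the gap penalty and the function name `align_profiles` are all hypothetical choices made here, and the choice of a global rather than local alignment is likewise an assumption.

```python
def align_profiles(a, b, gap=-1.0, c_scale=100.0, s_scale=100.0):
    """Global alignment of two profiles, each a list of
    (compactness, secondary_structure) tuples, one per residue.
    Returns (total_score, list of aligned index pairs)."""

    def score(x, y):
        # Hypothetical score: 1 for identical values, decreasing with
        # the scaled differences in compactness and secondary structure.
        return 1.0 - abs(x[0] - y[0]) / c_scale - abs(x[1] - y[1]) / s_scale

    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )

    # Traceback to recover which residues were paired.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Hypothetical toy profiles: profile_b lacks the middle residue of profile_a.
profile_a = [(300.0, 50.0), (100.0, -40.0), (250.0, 10.0)]
profile_b = [(300.0, 50.0), (250.0, 10.0)]
total, pairs = align_profiles(profile_a, profile_b)
print(total, pairs)
```

Because the alignment operates on predicted per-residue parameters rather than on amino acid identities, two sequences with little sequence similarity can still be matched when their profiles are similar, which is the point the caption makes by comparing the alignments with 3D superpositions.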
Fig. 9
Overview of the protein meta-structure similarity clustering (PMSSC) approach for ligand development. Top: 3D structure-based similarity clustering approach developed by H. Waldmann and co-workers [–49]. Conservation of structural motifs in the ligand-sensing region of proteins is used as a classifying principle to group proteins into clusters with similar ligand-binding properties. Structures of ligands binding to one member of the cluster are valid starting points for ligand development for other cluster members. Bottom: Meta-structure similarities provide valuable starting information for identifying chemical scaffolds and guiding structures in ligand development programs without requiring high-resolution protein structures.
Fig. 10
Chemical structures of ligand scaffolds identified by the meta-structure approach for Tm0936 from Thermotoga maritima. (a) Ligand scaffold identified by the 3D structure-based approach using the crystal structure of Tm0936 together with computational docking and virtual screening [52], (b) known ligands for S-adenosylmethionine-synthetase and (c) ligand scaffolds identified using the meta-structure approach (exclusively based on primary sequence information). The meta-structure approach was based on a (pairwise) sequence-to-sequence screen against the targets of the DRUGBANK database [51].

References

    1. Tanford C, Reynolds J. Nature’s robots. A history of proteins. Oxford: Oxford University Press; 2003.
    2. Perutz MF, Muirhead H, Cox JM, Goaman LC. Three-dimensional Fourier synthesis of horse oxyhemoglobin at 2.8 Å resolution: the atomic model. Nature. 1968;219:131–139. doi: 10.1038/219131a0.
    3. Epstein CJ, Goldberger RF, Anfinsen CB. The genetic control of tertiary protein structure. Model systems. Cold Spring Harb Symp Quant Biol. 1963;28:439–449.
    4. Mayer O, Rajkowitsch L, Lorenz C, Konrat R, Schroeder R. RNA chaperone activity and RNA-binding protein properties of the E.coli protein StpA. Nucl Acid Res. 2007;35:1257–1269. doi: 10.1093/nar/gkl1143.
    5. Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990;213:859–883. doi: 10.1016/S0022-2836(05)80269-4.
