PLoS One. 2010 Feb 23;5(2):e9391.
doi: 10.1371/journal.pone.0009391.

Atomic interaction networks in the core of protein domains and their native folds

Venkataramanan Soundararajan et al. PLoS One.

Abstract

Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and developed an automated framework for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with a Cα RMSD of 1-2 Å (mean 1.61 Å) generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the 'twilight' and 'midnight' zones, wherein <30% and <10% target-template sequence identity prevails, respectively (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol for deriving biological insight into protein structure-function relationships by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Computation of the protein core atomic interaction network (PCAIN) from the 2-D protein contact map (PCM).
The PCM accounts for all atomic interactions in the 3-D protein structure while the PCAIN involves atomic interactions between just the conserved, solvent inaccessible residues in the ‘core’ of protein domains.
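The relationship between the PCM and the PCAIN described in this caption can be illustrated with a minimal sketch. It assumes a binary atom-atom contact map at a distance threshold (called `rho` here) and a per-atom solvent accessibility cutoff (called `omega` here); the function names, the toy coordinates, and the accessibility values are all hypothetical, and the conservation filter mentioned in the caption is deliberately left out. This is not the authors' implementation.

```python
from math import dist

def contact_map(coords, rho=5.0):
    """Binary contact map: entry [i][j] is 1 when atoms i and j are distinct
    and lie within rho angstroms of each other."""
    n = len(coords)
    return [[1 if i != j and dist(coords[i], coords[j]) <= rho else 0
             for j in range(n)]
            for i in range(n)]

def core_network(pcm, accessibility, omega=10.0):
    """Restrict a contact map to atom pairs that are both buried
    (accessibility <= omega); every other entry is set to zero."""
    buried = [a <= omega for a in accessibility]
    n = len(pcm)
    return [[pcm[i][j] if buried[i] and buried[j] else 0
             for j in range(n)]
            for i in range(n)]

# Toy input: made-up coordinates and accessibility values, not real data.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.5, 0.0), (10.0, 10.0, 10.0)]
accessibility = [2.0, 5.0, 40.0, 1.0]

pcm = contact_map(coords, rho=5.0)
pcain = core_network(pcm, accessibility, omega=10.0)
```

In the toy input the third atom is in contact with the first but is solvent exposed, so that contact appears in `pcm` and is dropped from `pcain`.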
Figure 2
Figure 2. Snapshots from the PCAIN database used for mining fold-distinguishing signatures.
The solvent inaccessible cores of domains (shaded brown) from all 1018 naturally occurring folds were identified and used to compute the PCAINs (as described in the methods section) as part of the PCAIN database. Shown herein are representative domains and PCAINs (with yellow arrow between) from the following fold families: (A.) Orthogonal α-bundle (DNA helicase RuvA subunit); (B.) Up-down α-bundle (coiled-coil); (C.) α-horseshoe (leucine-rich repeat variant); (D.) α-solenoid (peridinin-chlorophyll protein); (E.) αα-barrel (glycosyltransferase); (F.) αβ-roll (HIV reverse transcriptase); (G.) αβ-complex (cytochrome); (H.) αβ-box (proliferating cell nuclear antigen); (I.) β-ribbon (seminal fluid protein PDC-109); (J.) β-sandwich (neurophysin); (K.) β-barrel (thrombin); (L.) β-propeller (pseudo β-propeller); (M.) β-clam (outer membrane lipoprotein receptor); (N.) β-trefoil (acidic fibroblast growth factor). Fold-distinguishing PCAIN patterns observed herein motivated systematic computation of intra-fold and inter-fold correlations on a family-by-family basis, as shown in supplementary figure S5. Fold-conserved interactions are evolutionary markers and are demarcated (red stars) on the corresponding sample set of the protein family alignments in supplementary figure S3.
Figure 3
Figure 3. Contrasting the fold specificity of protein contact maps (PCMs) and protein core atomic interaction networks (PCAINs).
Averaged intra-family (diagonal) and inter-family (off-diagonal) correlation coefficients of (A.) PCMs and (B.) PCAINs were computed at a threshold distance ρ of 5 angstroms and a normalized solvent accessibility per atom of ω = 10 on a family-by-family basis for several prominent folds of the protein universe. The complete 1018-fold by 1018-fold correlations of PCMs and PCAINs for the entire fold universe are shown in supplementary figure S5. From these figures it is clear that the PCAIN is highly fold-specific, whereas the PCM shows no discernible fold specificity.
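One way to picture the averaged intra-family and inter-family correlation coefficients is the sketch below. It assumes, as a simplification not stated in the caption, that the matrices being compared have identical dimensions and that the correlation is a plain Pearson coefficient over flattened entries; the helper names and the 2x2 toy matrices are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

def flatten(matrix):
    """Flatten a square matrix (list of rows) into a single list."""
    return [x for row in matrix for x in row]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def mean_correlation(group_a, group_b=None):
    """Average pairwise correlation. With one group: all distinct pairs inside it
    (intra-family, needs at least two members). With two groups: all cross pairs
    (inter-family)."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = list(product(group_a, group_b))
    values = [pearson(flatten(p), flatten(q)) for p, q in pairs]
    return sum(values) / len(values)

# Toy matrices (made up): two members per family.
family_a = [[[0, 1], [1, 0]], [[0, 1], [1, 0]]]
family_b = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]

intra_a = mean_correlation(family_a)            # diagonal entry for family A
inter_ab = mean_correlation(family_a, family_b) # off-diagonal entry A vs B
```

A fold-specific descriptor in this picture is one whose diagonal entries are consistently higher than its off-diagonal entries.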
Figure 4
Figure 4. Applications of PCAIN as a divergence-independent metric for protein classification, anchored sequence alignment, and structure prediction.
(A.) PCAINs were computed on a general screen of unselected protein domain sequences that were not part of the database and used to accurately classify these sequences as shown, confirming the fold-specific nature of PCAINs. PCMs of these domains are seen to be ineffective as classifiers in the general sequence space. (B.) The PCAIN is seen to be an effective classifier regardless of the sequence identity of the target domain to members of its native fold, and remains effective even in the twilight (<30% PSI) and midnight (<10% PSI) zones. The PCM, on the other hand, is highly dependent on this sequence identity and provides moderate classification accuracy only in the high sequence identity range. (C.) The distribution of RMSD between PCAIN-based predicted structures and the reference crystal structures for target sequences, with a mean RMSD of 1.61 Å, highlights the structure prediction efficacy of the proposed method. (D.) Pie chart of the RMSD distribution for test sequences in the twilight and midnight zones, indicating a mean RMSD of 1.69 Å.
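For readers unfamiliar with the RMSD values quoted in panels C and D, the following is a minimal sketch of a Cα RMSD calculation. It assumes the predicted and reference structures have already been superposed and that their Cα atoms are listed in matching order; the superposition step itself is not shown, and the function name and toy coordinates are hypothetical.

```python
from math import sqrt

def ca_rmsd(model, reference):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) C-alpha coordinates that are already superposed."""
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, reference)
                  for a, b in zip(p, q))
    return sqrt(squared / len(model))

# Toy coordinates (made up): every atom is displaced by 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = ca_rmsd(model, reference)
```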
Figure 5
Figure 5. PCAIN as a function of threshold interaction distance (ρ) and conserved solvent accessibility (ω) parameters.
(A.) Variation of PCAIN potency (the difference between averaged intra-fold and inter-fold PCAIN correlations) with threshold interaction distance ρ and conserved solvent accessibility ω. (B.) Variation of PCAIN potency with ω at fixed ρ = 4.25 angstroms. (C.) Variation of PCAIN potency with ρ at fixed ω = 25. (D.) Implementation of adaptive tuning of the ρ and ω parameters to maximize the signal-to-noise ratio (SNR).
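The potency defined in panel A, and the idea of tuning ρ and ω, can be sketched as follows. The caption does not say how the adaptive tuning in panel D works, so the exhaustive grid search here is a stand-in assumption rather than the authors' procedure; `toy_score` is a made-up surface used only to exercise the code, and all names are hypothetical.

```python
from itertools import product

def potency(intra_correlations, inter_correlations):
    """Potency as defined in the caption: mean intra-fold correlation
    minus mean inter-fold correlation."""
    mean_intra = sum(intra_correlations) / len(intra_correlations)
    mean_inter = sum(inter_correlations) / len(inter_correlations)
    return mean_intra - mean_inter

def tune(score, rhos, omegas):
    """Exhaustive grid search: return (rho, omega, score) for the pair
    that maximizes score(rho, omega)."""
    return max(((rho, omega, score(rho, omega))
                for rho, omega in product(rhos, omegas)),
               key=lambda item: item[2])

def toy_score(rho, omega):
    """Made-up score surface peaking at rho = 4.25, omega = 25 (illustration only)."""
    return -((rho - 4.25) ** 2) - ((omega - 25) / 10) ** 2

example_potency = potency([0.9, 0.8], [0.2, 0.1])  # toy correlation values
best = tune(toy_score, rhos=[3.5, 4.25, 5.0], omegas=[10, 25, 40])
```

In practice `score` would recompute the PCAINs and their correlations for each (ρ, ω) pair, which is far more expensive than this toy function suggests.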
Figure 6
Figure 6. Application of the PCAIN methodology to analyze potential structure-function relationships of the novel E3 ligase (NEL) domain from the YopM effector protein of the plague-causative bacterium Yersinia pestis.
(A.) The YopM NEL domain structure was modeled using the PCAIN methodology and the putative ubiquitin ligase catalytic site was characterized, based on the recent experimental characterizations of Salmonella and Shigella NEL domains. The likely hydrogen bonds that stabilize the active site (black lines) and the key α-helices (H4, H7, and H9) are indicated. (B.) Vacuum electrostatics of the molecular surfaces from superposed NEL domains of YopM, SlrP, SspH2, and IpaH were generated (see Methods), with negative, positive, and neutral patches colored red, blue, and white, respectively. The finger-like extension (pink line), globular domain (orange arc), and active site location (black arrow) are indicated. (C.) The solvent-unexposed residues that constitute the PCAIN of the modeled YopM NEL domain structure (gray) are shown as sticks (brown). The molecular surface of the YopM NEL domain is shown alongside to highlight that the residues constituting the PCAIN (brown) are only minimally solvent exposed. (D.) Pictorial depiction of YopM in the intracellular context and the key structural implications for its modulation of human adaptive and innate immune signaling. Specifically, YopM is known to interact with protein kinase C-like 2 (PRK2) and ribosomal S6 protein kinase 1 (RSK1), resulting in increased activity and mobility of these kinases, in addition to potentiating natural killer (NK) cell depletion by suppressing expression of Interleukin-15 (IL-15). YopM has also been shown to interact specifically with α1-antitrypsin (AAT) without affecting its anti-protease activity, so the biological significance of this interaction remains unknown. Question mark (?) symbols indicate hitherto unknown interactions for YopM, extrapolated from the functions of the related proteins. Specifically highlighted in this regard are the degradation of human leukocyte antigen-DR (HLA-DR) and thioredoxin (TRX), which may cause suppression of the adaptive immune response via moderation of antigen presentation and modulation of innate immune signaling via the MAPK cascade, respectively. It remains to be seen precisely which intracellular molecules are targeted by YopM NEL for proteolytic degradation, considering the autoregulated ubiquitin ligase activity suggested by our PCAIN-based model and analysis.
