Front Bioinform. 2022 Apr 5:2:836526.
doi: 10.3389/fbinf.2022.836526. eCollection 2022.

The Importance of Weakly Co-Evolving Residue Networks in Proteins is Revealed by Visual Analytics

Sidharth Mohan et al. Front Bioinform. 2022.

Abstract

Small changes in a protein's core packing produce changes in function, and even small changes in function bias species fitness and survival. Therefore, individually deleterious mutations should be evolutionarily coupled with compensating mutations that recover fitness, and co-evolving pairs of mutations should be littered across evolutionary history. Despite this longstanding intuition, the results of co-evolution analyses have largely disappointed expectations. Regardless of the statistics applied, only a small majority of the most strongly co-evolving residues are typically found to be in contact, and much of the "meaning" of observed co-evolution has been opaque. In a medium-sized protein of 300 amino acids, there are almost 20 million potentially important interdependencies. It is impossible to understand this data in textual format without extreme summarization or truncation, and that summarization and truncation in turn make it impossible to identify most patterns in the data. We developed a visualization approach that eschews the common "look at a long list of statistics" approach and instead enables the user to literally look at all of the co-evolution statistics simultaneously. Users of our tool reported visually obvious "clouds" of co-evolution statistics forming distinct patterns in the data, and analysis demonstrated that these clouds had structural relevance. To determine whether this phenomenon generalized, we repeated this experiment in three proteins we had not previously studied. The results provide evidence about how structural constraints have impacted co-evolution and why previous "examine the most frequently co-evolving residues" approaches have had limited success, and they additionally shed light on the biophysical importance of different types of co-evolution.
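
The abstract does not show how the "almost 20 million" figure is reached. One plausible reading, offered here as an assumption rather than as the authors' stated derivation, is that every pair of sequence positions is combined with every pair of amino acid identities from the 20-letter alphabet. The function name below is hypothetical.

```python
from math import comb


def count_interdependencies(length: int, alphabet_size: int = 20) -> int:
    """Count (position pair) x (residue identity pair) combinations.

    Assumption: an "interdependency" is one residue identity at position i
    paired with one residue identity at position j, for i < j.
    """
    position_pairs = comb(length, 2)      # unordered pairs of positions
    identity_pairs = alphabet_size ** 2   # identity at i times identity at j
    return position_pairs * identity_pairs


print(count_interdependencies(300))  # 17940000, i.e. almost 20 million
```

Under this reading, the count grows quadratically with protein length, which is why a textual listing quickly becomes unmanageable.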

Keywords: analytics; contact; correlations; evolution; proteins; structure; visualization.

Conflict of interest statement

HO was employed by the company Lilly Research Laboratories, Eli Lilly and Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
StickWRLD presents the user with an interactive interface to a (pseudo) radial-layout node-link diagram of residue co-evolution statistics. The family (residue identity) position-specific scoring matrix (PSSM) is arranged sequentially around the periphery of a cylinder, and edges connecting co-evolving residues are drawn between their corresponding PSSM positions. These diagrams can contain as much or as little of the complete set of co-evolution statistics as the user desires, and can be rapidly “dithered” around any given set of parameters to see how small changes in parameter choice change the displayed subset of residue-pair statistics. In this figure, the diagrams show StickWRLD visualizations of the correlated mutations within adenylate kinase (ADK), extracted solely from the Pfam (Punta et al., 2012) sequence alignments. They are arranged in order of decreasing statistical significance. The visually salient “clouds” or clusters of edges visible in (D) are indicative of structural contacts, despite the fact that at p ≤ 0.1 they are well within the noise floor. T_r ≥ 0.15 for all images. The start of the arrow in subfigure (A) indicates the N-terminus of the protein, and the arrow points in the direction of increasing sequence coordinates. (A) The most significant correlations, with p ≤ 0.005, have no obvious visual pattern. (B) As the significance threshold is relaxed, here to p ≤ 0.010, more correlations appear. (C) When correlations with p as poor as 0.050 are shown, distinct patterns begin to appear in the cloud of minimal-significance correlations. (D) Even when the significance is only p ≤ 0.100, the cloud of weak correlations remains visually focused around certain areas in the diagram.
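
The effect of relaxing the significance cutoff while holding the strength cutoff fixed can be sketched as a simple filter. This is a minimal illustration and not StickWRLD's implementation: the `PairStat` record, the `select_edges` function, and the toy values are all hypothetical, and treating the strength cutoff as a lower bound on a per-pair statistic is an assumption.

```python
from typing import NamedTuple


class PairStat(NamedTuple):
    pos_i: int        # first alignment position
    pos_j: int        # second alignment position
    strength: float   # co-evolution statistic for the pair (assumed meaning)
    p_value: float    # significance of that statistic


def select_edges(stats, min_strength, max_p):
    """Keep pairs that pass both the strength and the significance cutoff."""
    return [s for s in stats
            if s.strength >= min_strength and s.p_value <= max_p]


# Invented toy statistics, for illustration only.
toy_stats = [
    PairStat(3, 41, 0.31, 0.004),
    PairStat(10, 12, 0.22, 0.009),
    PairStat(10, 95, 0.18, 0.030),
    PairStat(11, 96, 0.16, 0.080),
    PairStat(12, 97, 0.15, 0.095),
    PairStat(50, 51, 0.09, 0.002),  # significant but too weak to display
]

# "Dither" the p cutoff through the four values used in the panels.
for max_p in (0.005, 0.010, 0.050, 0.100):
    edges = select_edges(toy_stats, min_strength=0.15, max_p=max_p)
    print(f"p <= {max_p:.3f}: {len(edges)} edges")
```

In the toy data, the pairs admitted only at the loosest cutoffs connect neighbouring positions (10 to 12 with 95 to 97), which is the kind of localized cluster the caption describes as a cloud.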
FIGURE 2
At thresholds of T_r and p similar to those used for ADK, other protein families display similar “clouds” of weakly co-evolving residue pairs, and similar areas of visually random or node-link-free space. The residues linked by these weak co-evolution statistics are universally closer than the expected distance for residues of similar sequential separation in their protein structures. The start of the arrow in each image indicates the N-terminus of each protein, and the arrow points in the direction of increasing sequence coordinates. Note that fewer distances are plotted in the inter-residue distance plots than there are edges in the StickWRLD diagrams of co-evolving residues, because each inter-residue distance plot shows distances for only those residue pairs that occur in the specific PDB file, while the StickWRLD diagram shows all co-evolution across the Pfam family. (A) Correlated evolution statistics in the gelsolin domain family visualized as node-links in StickWRLD. T_r ≥ 0.12, p ≤ 0.05. (B) Residue pairs selected by a user as interesting in the StickWRLD diagram, plotted against the inter-residue distance distribution for the gelsolin Pfam family, with distances as found in Chain A of the 1NM1 PDB structure. (C) Correlated evolution statistics in the X8 family visualized as node-links in StickWRLD. T_r ≥ 0.1, p ≤ 0.001. (D) Residue pairs selected by a user as interesting in the StickWRLD diagram, plotted against the inter-residue distance distribution for the X8 Pfam family, with distances as found in the 2JON PDB structure.
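
The comparison against an "expected distance for residues of similar sequential separation" can be sketched as follows. The caption does not define the expected distance precisely, so this sketch assumes it is the mean distance over all residue pairs with the same sequence separation in a single structure. The function names and the toy coordinates are hypothetical, and residue indices are zero-based.

```python
from collections import defaultdict
from math import dist
from statistics import mean


def mean_distance_by_separation(coords):
    """Map sequence separation |j - i| to the mean pairwise distance."""
    by_sep = defaultdict(list)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            by_sep[j - i].append(dist(coords[i], coords[j]))
    return {sep: mean(ds) for sep, ds in by_sep.items()}


def closer_than_expected(coords, pairs):
    """Flag each picked pair as closer (True) or not (False) than the
    mean distance for its sequence separation."""
    expected = mean_distance_by_separation(coords)
    result = {}
    for a, b in pairs:
        i, j = sorted((a, b))
        result[(i, j)] = dist(coords[i], coords[j]) < expected[j - i]
    return result


# Invented U-shaped chain of six points; residues 1 and 4 face each other.
toy_coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0),
              (8, 4, 0), (4, 4, 0), (0, 4, 0)]

print(closer_than_expected(toy_coords, [(1, 4), (0, 3)]))
```

With real data, the coordinates would come from one representative atom per residue in the relevant PDB chain, and only pairs present in that structure could be evaluated, as the caption notes.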
FIGURE 3
In P-II, an unmistakable cluster of co-evolving residues appears at similar T_r and p thresholds, but the distances between the implicated residues are scattered both above and below the expected distance for residues of similar sequential separation. The start of the arrow in subfigure (A) indicates the N-terminus of the protein, and the arrow points in the direction of increasing sequence coordinates. (A) Correlated evolution statistics in the P-II family visualized as node-links in StickWRLD. T_r ≥ 0.11, p ≤ 0.005. (B) Residue pairs selected by a user as interesting in the StickWRLD diagram, plotted against the inter-residue distance distribution for the P-II Pfam family, with distances as found in Chain B of the 1HWU PDB structure. The unexpectedly long inter-residue distances found for several of the cloud picks appear to be due to the actual related pairs occurring in neighboring chains of the P-II multimer (Figure 4), rather than entirely within a single monomer subunit.
FIGURE 4
The structure of the functional homotrimeric P-II signal transduction protein, with one unit of the trimer shown in orange, one in white, and one in blue. Links are shown in one subunit between the residues selected as co-evolving in the Pfam family PF00543 seed alignment. The blue helices at the upper right are the blue subunit’s copy of the orange helices at the left of the trimeric structure. The unexpectedly distant residue pairs seen in Figure 3 may be due to the co-evolution being between the blue subunit and the orange subunit, rather than entirely within the orange (or any other single) subunit.
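
One way to check whether a distant intra-chain pair is plausibly an inter-chain contact is to measure the same residue pair against every subunit's copy of the second residue and keep the shortest distance. The sketch below assumes each chain is a list of per-residue coordinates with matching zero-based indices. The function name, the chain labels, and the coordinates are hypothetical and are not taken from the 1HWU structure.

```python
from math import dist


def shortest_contact(chains, i, j, home="A"):
    """Return (distance, partner_chain) for residue i of the home chain
    and residue j of whichever chain places it closest."""
    candidates = {
        name: dist(chains[home][i], coords[j])
        for name, coords in chains.items()
    }
    partner = min(candidates, key=candidates.get)
    return candidates[partner], partner


# Invented two-residue chains: far apart within A, adjacent across A and B.
toy_chains = {
    "A": [(0, 0, 0), (30, 0, 0)],
    "B": [(10, 0, 0), (2, 0, 0)],
    "C": [(0, 40, 0), (0, 50, 0)],
}

print(shortest_contact(toy_chains, 0, 1))  # (2.0, 'B')
```

A pair whose closest partner is a neighboring chain would look anomalously distant in a single-chain distance plot, which is the explanation the caption proposes.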
