BMC Bioinformatics. 2016 Mar 8;17:119.
doi: 10.1186/s12859-016-0975-z.

CoeViz: a web-based tool for coevolution analysis of protein residues

Frazier N Baker et al. BMC Bioinformatics. 2016.

Abstract

Background: Proteins generally perform their function in a folded state. Residues forming an active site, whether it is a catalytic center or interaction interface, are frequently distant in a protein sequence. Hence, traditional sequence-based prediction methods focusing on a single residue (or a short window of residues) at a time may have difficulties in identifying and clustering the residues constituting a functional site, especially when a protein has multiple functions. Evolutionary information encoded in multiple sequence alignments is known to greatly improve sequence-based predictions. Identification of coevolving residues further advances the protein structure and function annotation by revealing cooperative pairs and higher order groupings of residues.

Results: We present a new web-based tool (CoeViz) that provides versatile analysis and visualization of pairwise coevolution of amino acid residues. The tool computes three covariance metrics (mutual information, chi-square statistic, and Pearson correlation) and one conservation metric (joint Shannon entropy). Implemented adjustments of covariance scores include phylogeny correction, corrections for sequence dissimilarity and alignment gaps, and the average product correction. Visualization of residue relationships is enhanced by hierarchical cluster trees, heat maps, circular diagrams, and residue highlighting in the protein sequence and 3D structure. Unlike other existing tools, CoeViz is not limited to analyzing conserved domains or protein families and can process unstructured and multi-domain proteins thousands of residues long. Two examples are provided to illustrate the use of the tool for identification of residues (1) involved in enzymatic function, (2) forming short linear functional motifs, and (3) constituting a structural domain.
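To make two of the metrics named above concrete, the following is a minimal sketch of joint Shannon entropy, mutual information, and the average product correction computed on the columns of a toy alignment. It is not the CoeViz implementation: the function names and the four-sequence alignment are invented for illustration, every sequence is given equal weight, gaps would be treated as ordinary symbols, and the phylogeny and sequence-dissimilarity corrections mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def entropy(col):
    """Shannon entropy (bits) of one alignment column."""
    n = len(col)
    return -sum((c / n) * math.log2(c / n) for c in Counter(col).values())


def joint_entropy(col_a, col_b):
    """Joint Shannon entropy (bits) of the residue pairs in two columns."""
    n = len(col_a)
    pairs = Counter(zip(col_a, col_b))
    return -sum((c / n) * math.log2(c / n) for c in pairs.values())


def mutual_information(col_a, col_b):
    """MI(a, b) = H(a) + H(b) - H(a, b)."""
    return entropy(col_a) + entropy(col_b) - joint_entropy(col_a, col_b)


def mi_matrix(msa):
    """Pairwise MI for all column pairs; the diagonal is set to 0."""
    length = len(msa[0])
    cols = [[seq[i] for seq in msa] for i in range(length)]
    return [
        [0.0 if i == j else mutual_information(cols[i], cols[j])
         for j in range(length)]
        for i in range(length)
    ]


def average_product_correction(m):
    """Subtract mean_i * mean_j / overall_mean from each off-diagonal score."""
    length = len(m)
    row_mean = [
        sum(m[i][j] for j in range(length) if j != i) / (length - 1)
        for i in range(length)
    ]
    overall = sum(row_mean) / length
    if overall == 0:
        return [row[:] for row in m]
    return [
        [0.0 if i == j else m[i][j] - row_mean[i] * row_mean[j] / overall
         for j in range(length)]
        for i in range(length)
    ]


# Toy alignment (hypothetical): columns 0 and 1 covary, column 2 is independent.
toy_msa = ["ACD", "ACE", "GTD", "GTE"]
mi = mi_matrix(toy_msa)
mi_apc = average_product_correction(mi)
```

In this toy case columns 0 and 1 change together, so their mutual information is high, while column 2 varies independently of both and scores zero. The average product correction then subtracts the background expected from each column's mean score.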

Conclusions: CoeViz represents a practical resource for quick sequence-based protein annotation for molecular biologists, e.g., for identifying putative functional clusters of residues and structural domains. CoeViz can also serve computational biologists as a resource of coevolution matrices, e.g., for developing machine learning-based prediction models. The presented tool is integrated in the POLYVIEW-2D server (http://polyview.cchmc.org/) and is available from the POLYVIEW-2D results pages.

Figures

Fig. 1
A flowchart of CoeViz. Protein data are submitted as defined in the POLYVIEW-2D server ([28], http://polyview.cchmc.org/polyview_doc.html), which accepts PDB-formatted coordinate files, output from sequence-based prediction servers, or custom sequence profiles. The protein visualization page provides an option to request analysis of covariance of amino acids (CoeViz). The user can choose a covariance metric and a database to generate the MSA, or provide a file with a pre-constructed MSA. CoeViz computes the requested covariance or conservation metric with each of the implemented adjustments separately and performs hierarchical clustering. Once calculations are completed, CoeViz provides an interactive web interface to review covariance data using heat maps, circular diagrams, and clustering trees. From the circular diagrams, the user can map identified correlated amino acids to a protein 3D structure or sequence, depending on the input data. All generated results can be exported in text or graphics formats.
Fig. 2
Amino acid coevolution profile reveals residues constituting the active site of the Cys-Gly metallodipeptidase (SwissProt: DUG1_YEAST). (a) A fragment of the heat map displaying amino acid coevolution computed using χ2 weighted by sequence dissimilarity, derived from sequence alignments of the protein sequence defined in PDB ID 4G1P against the NR database with 90 % identity reduction. (b) A fragment of the cluster tree derived from the chi-square data converted to a distance matrix. (c) The zoomed-in cluster of amino acids that contains known Zn-binding residues (H102, D137, E172, H450) and a catalytic site (E171). (d) From the heat map, one can retrieve a circular diagram representing the closest relationships to a given residue, shown here for one of the catalytic residues (E171) after applying a ≥0.3 cutoff to χ2-based cumulative probabilities. (e) From the circular diagram, one can map the clustered residues to the submitted protein 3D structure, shown here for DUG1 (PDB:4G1P). Residues highlighted in red (H102, D137, E172, D200, H450) are amino acids binding Zn (grey spheres); magenta marks catalytic residues (D104, E171); blue marks a residue involved in substrate recognition (R348). The substrate (Cys-Gly) is rendered as sticks colored by atom type.
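The cutoff-based selection behind the circular diagram, and the conversion of scores to distances before clustering, can be sketched as follows. This is an illustration under stated assumptions rather than the CoeViz procedure: the caption does not say how scores are converted to distances, so the `1 - score` transform is an assumption, and the residue labels and score values are invented toy data, not values from the paper.

```python
def to_distance(scores):
    """Turn a symmetric similarity matrix (values in [0, 1]) into distances.

    Assumption: distance = 1 - score; the diagonal is forced to 0.
    """
    return [
        [0.0 if i == j else 1.0 - s for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]


def partners(labels, scores, residue, cutoff=0.3):
    """Residues whose score with `residue` is at least `cutoff`, best first."""
    i = labels.index(residue)
    hits = [
        (labels[j], scores[i][j])
        for j in range(len(labels))
        if j != i and scores[i][j] >= cutoff
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented toy scores for four hypothetical residues (not data from the paper).
labels = ["R1", "R2", "R3", "R4"]
scores = [
    [1.00, 0.80, 0.10, 0.35],
    [0.80, 1.00, 0.20, 0.50],
    [0.10, 0.20, 1.00, 0.05],
    [0.35, 0.50, 0.05, 1.00],
]

closest_to_r1 = partners(labels, scores, "R1", cutoff=0.3)
distances = to_distance(scores)
```

The resulting distance matrix could then be passed to any hierarchical clustering routine to build a tree of the kind shown in panel (b).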
Fig. 3
Amino acid coevolution profile reveals residues constituting a structural domain and locations of the functional linear motifs in Cdc20 (SwissProt: CDC20_YEAST). (a) SS prediction by SABLE visualized by POLYVIEW-2D with residues highlighted in functional motifs and a structural domain: red – residues constituting D- and KEN-boxes; green – residues in the bipartite NLS; blue – C-box; residues in bold face are in the WD-repeats domain. Keys for graphical SS elements can be found in the POLYVIEW-2D documentation. (b) A full heat map displaying amino acid coevolution computed using MI weighted by phylogeny and derived from sequence alignments of the protein sequence defined in UniProt:P26309 against the whole NR database. Boundaries of the WD domain and functional motifs, as defined in UniProt, are highlighted with green lines. (c) A zoomed-in view of the heat map fragment centered on the D-box. (d) A zoomed-in view of the heat map fragment centered on the C-box. (e) A zoomed-in view of the heat map fragment showing the upper-left corner of the WD-repeats domain.

References

    1. Korber BT, Farber RM, Wolpert DH, Lapedes AS. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc Natl Acad Sci U S A. 1993;90(15):7176–7180. doi: 10.1073/pnas.90.15.7176.
    2. Clarke ND. Covariation of residues in the homeodomain sequence family. Protein Sci. 1995;4(11):2269–2278. doi: 10.1002/pro.5560041104.
    3. Gobel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18(4):309–317. doi: 10.1002/prot.340180402.
    4. Neher E. How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci U S A. 1994;91(1):98–102. doi: 10.1073/pnas.91.1.98.
    5. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A. Correlated mutations contain information about protein-protein interaction. J Mol Biol. 1997;271(4):511–523. doi: 10.1006/jmbi.1997.1198.