BMC Struct Biol. 2014 Oct 18;14:22.
doi: 10.1186/s12900-014-0022-0.

A PDB-wide, evolution-based assessment of protein-protein interfaces

Kumaran Baskaran et al. BMC Struct Biol. 2014.

Abstract

Background: Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features.

Results: An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads.

Conclusions: Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.
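
The EPPIC versus PISA comparison reported in the Results comes down to counting how often two classifiers give the same call for the same interface. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `call_agreement`, the dictionary layout and the PDB identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def call_agreement(calls_a, calls_b):
    """Return (agreement fraction, Counter of (call_a, call_b) pairs).

    Both arguments map an interface key to a call, "bio" or "xtal".
    Only interfaces present in both mappings are compared.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no interfaces in common")
    pairs = Counter((calls_a[key], calls_b[key]) for key in shared)
    agree = sum(n for (a, b), n in pairs.items() if a == b)
    return agree / len(shared), pairs


# Made-up toy data: keys are (PDB id, interface id), values are the calls.
eppic = {("1abc", 1): "bio", ("1abc", 2): "xtal",
         ("2xyz", 1): "xtal", ("2xyz", 2): "bio"}
pisa = {("1abc", 1): "bio", ("1abc", 2): "xtal",
        ("2xyz", 1): "bio", ("2xyz", 2): "bio"}

fraction, pairs = call_agreement(eppic, pisa)
print(f"{fraction:.2f}")  # 0.75
```

The `pairs` counter keeps the four call combinations separate, which is the kind of breakdown summarized in Figure 6.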

Figures

Figure 1
Schematic representation of the PDB-wide EPPIC precalculation pipeline. Web servers are denoted by green blocks, local databases and inputs by blue blocks, instances of the EPPIC program by brown blocks.
Figure 2
Example of interface display with the EPPIC Interface Loader plugin for PyMOL. Interface 1 of entry 2trx (E. coli thioredoxin) fetched in PyMOL with the EPPIC Interface Loader plugin and displayed in hybrid mode (surface color-mapped by sequence entropy for one interface partner and cartoon for the other partner).
Figure 3
Interface area distribution of three datasets of interfaces. The interface areas for crystal contacts (red) and biological interfaces (green) are shown for three interface datasets: DCBio/Xtal (left), Bio/XtalMany (center) and Ponstingl (right). The numbers in parentheses refer to the counts of bio and xtal interfaces in each dataset.
Figure 4
EPPIC per-indicator performance against three datasets of interfaces. The ROC curves show per-indicator EPPIC performance against the same three datasets of interfaces depicted in Figure 3.
Figure 5
The Janin curve (1997) revisited. The Janin curve is plotted against EPPIC calls (based on evolutionary indicators, cyan, and on geometry, green) for all current (May 2014) PDB interfaces larger than 600 Å² and against all PDB interfaces conducive to infinite assemblies. The curves are plotted as normalized probability versus interface area.
Figure 6
Interface call comparison between EPPIC and PISA. The histogram represents the fraction of convergent and divergent interface calls by EPPIC and PISA as a function of interface area. The top call in the color legend corresponds to EPPIC and the bottom one to PISA. The overall percentages of each call combination are also given.
Figure 7
Author annotation errors in the PDB. Author annotations are compared to the EPPIC predictions. The comparison is done on a subset of 10,000 interfaces each from the extrema of the core-surface score distribution. The top call in the color legend corresponds to EPPIC and the bottom one to the author annotation.
Figure 8
PDB-wide distribution of EPPIC monomer versus multimer predictions by experimental technique. PDB entries are considered monomeric (red) if none of their interfaces is classified by EPPIC as bio; otherwise, they are considered multimeric (green).
Figure 9
Interface classification as a function of operator type. The green portions of the bars represent interfaces classified as bio, the red ones interfaces classified as xtal. Operators are denoted as follows, from left to right: 2S, two-fold screw axis; AU, non-crystallographic symmetry; XT, crystal cell translation; 2, two-fold axis; 3S, three-fold screw axis; 4S, four-fold screw axis; 3, three-fold axis; FT, fractional translation; 6S, six-fold screw axis; 4, four-fold axis; 6, six-fold axis; -1, inversion center; -4, four-fold rotoinversion axis; GL, glide plane.
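
The monomer versus multimer rule stated in the Figure 8 caption (an entry is monomeric if none of its interfaces is classified as bio, multimeric otherwise) can be sketched as follows. This is an illustrative sketch only: the function name `classify_entries`, the input layout and the PDB identifiers are assumptions, not part of EPPIC.

```python
def classify_entries(interface_calls):
    """Map each PDB id to "multimer" if any of its interfaces is "bio",
    otherwise to "monomer".

    `interface_calls` maps (PDB id, interface id) to "bio" or "xtal".
    Entries without any listed interface do not appear in the result.
    """
    calls_by_entry = {}
    for (pdb_id, _interface_id), call in interface_calls.items():
        calls_by_entry.setdefault(pdb_id, []).append(call)
    return {
        pdb_id: "multimer" if "bio" in calls else "monomer"
        for pdb_id, calls in calls_by_entry.items()
    }


# Made-up toy data for illustration.
toy_calls = {
    ("1abc", 1): "bio",
    ("1abc", 2): "xtal",
    ("2xyz", 1): "xtal",
    ("2xyz", 2): "xtal",
}
entry_states = classify_entries(toy_calls)
print(entry_states)  # {'1abc': 'multimer', '2xyz': 'monomer'}
```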

