Biophysics (Nagoya-shi). 2012 May 31;8:79-94.
doi: 10.2142/biophysics.8.79. eCollection 2012.

GIRAF: a method for fast search and flexible alignment of ligand binding interfaces in proteins at atomic resolution

Akira R Kinjo et al. Biophysics (Nagoya-shi). 2012.

Abstract

Comparison and classification of protein structures are fundamental to understanding protein function. Because of the computational difficulty and the ever-increasing amount of structural data, however, it is generally infeasible to perform the exhaustive all-against-all structure comparisons that comprehensive classification requires. To handle such situations efficiently, we have previously proposed a method, now called GIRAF. We herein describe further improvements in the GIRAF protein structure search and alignment method. GIRAF searches very efficiently for structurally similar ligand binding sites of proteins by exploiting database indexing of structural features of local coordinate frames. In addition, it produces refined atom-wise alignments by iteratively applying the Hungarian method to the bipartite graph defined for a pair of superimposed structures. By combining the refined alignments based on different local coordinate frames, it becomes possible to align structures involving domain movements. We provide detailed accounts of the database design and the search and alignment algorithms, as well as some benchmark results.

Keywords: Hungarian algorithm; protein structure comparison; protein-ligand interaction; relational database.
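
The atom-wise assignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the two structures are already superimposed, that the edge cost of the bipartite graph is the squared distance between atoms, and that matched pairs farther apart than a cutoff are discarded. The function names, the cost definition and the cutoff value are illustrative assumptions, not details taken from the paper, and the re-superposition performed between iterations is not shown.

```python
import math

def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list of (row, column) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j] = row currently assigned to column j
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip assignments along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]

def match_atoms(a, b, cutoff=2.0):
    """Pair atoms of two superimposed coordinate lists (assumed cost and cutoff).

    Returns sorted (index_in_a, index_in_b) pairs no farther apart than cutoff.
    """
    if not a or not b:
        return []
    swapped = len(a) > len(b)    # hungarian() needs rows <= columns
    rows, cols = (b, a) if swapped else (a, b)
    cost = [[math.dist(r, c) ** 2 for c in cols] for r in rows]
    pairs = []
    for i, j in hungarian(cost):
        if math.dist(rows[i], cols[j]) <= cutoff:
            pairs.append((j, i) if swapped else (i, j))
    return sorted(pairs)
```

In an iterative scheme of the kind the abstract describes, the pairs returned here would be used to recompute the superposition before the assignment is solved again.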

Figures

Figure 1
Relational tables in the GIRAF database. A simplified view of the GIRAF database schema. Each rectangle represents a relational table, possibly connected to another table via a foreign key reference (edges labeled with “references”). The structures of the GIparam and Qrefaco tables are copied from that of the Refaco table. The Qrefaco table is created temporarily for each query. The diagram was created using Cytoscape.
Figure 2
Affine frame. (A) For a given amino acid residue, the directions of x, y and z axes are determined by the backbone atoms N, Cα, and C′. (B) The origin of the affine (local coordinate) frame is set to the center of mass of side-chain atoms (Cα for glycines).
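
As a rough illustration of such a local coordinate frame, the sketch below builds an orthonormal frame from the N, Cα and C′ positions and places the origin at the side-chain centroid, falling back to Cα when there are no side-chain atoms (glycine). The caption does not specify the axis convention, so the choice made here (x along Cα→N, y in the N-Cα-C′ plane, z as their cross product) is an assumption, as are the function names and the use of an unweighted centroid in place of a mass-weighted center.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def local_frame(n, ca, c, side_chain):
    """Return (origin, (x, y, z)) for one residue.

    Assumed convention: x points from CA to N, y is the component of
    CA->C' orthogonal to x, and z = x cross y (right-handed frame).
    Collinear backbone atoms are not handled.
    """
    x = unit(sub(n, ca))
    v = sub(c, ca)
    d = dot(v, x)
    y = unit(tuple(vi - d * xi for vi, xi in zip(v, x)))  # Gram-Schmidt step
    z = cross(x, y)
    atoms = side_chain if side_chain else [ca]            # glycine: use CA
    origin = tuple(sum(p[i] for p in atoms) / len(atoms) for i in range(3))
    return origin, (x, y, z)

def to_local(point, origin, axes):
    """Express a point in the residue's local coordinate frame."""
    r = sub(point, origin)
    return tuple(dot(r, axis) for axis in axes)
```

Coordinates of surrounding atoms expressed with `to_local` do not depend on the global position or orientation of the protein, which is what makes frame-based indexing possible.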
Figure 3
Iterative refinement of alignment.
Figure 4
Execution time. (A) Correlation between the number of GI hits and the total execution time (in seconds). (B) Execution times of geometric indexing search (GI time) and of iterative refinement (IR time) in seconds. Colors are magenta for small molecule, green for protein, and blue for nucleic acid interfaces.
Figure 5
Iterative refinement. (A) GIRAF scores after 0 [Score(0)] or 5 [Score(5)] iterations of alignment refinement are compared with the change in score obtained by continuing to 5 [Score(5)–Score(0)] or 20 [Score(20)–Score(5)] iterations, respectively. (B) As in (A), but comparing the number of aligned atom pairs (Nali). Colors are magenta for the difference between 5 and 0 iterations and green for that between 20 and 5 iterations.
Figure 6
Flexible alignment. (A) Change in the number of aligned atom pairs between rigid and flexible alignments. (B) Change in RMSD (Å) between rigid and flexible alignments.
Figure 7
An example of flexible alignment for small molecule interfaces. The UBP302 binding site of GluR5 from Norway rat (colored orange, magenta, cyan; PDB 2F35, ref. 52) is superimposed on the glutamate binding site of GluR6 from the same species (colored grey and CPK; PDB 2XXR, ref. 51). (A) Superposition of folds based on the alignments of ligand binding sites. The colors correspond to those in B–D. (B) An optimal rigid alignment in which mainly the atoms around the backbone carboxyl group of glutamate and one of the two carboxyl groups of UBP302 are aligned. 37 atom pairs are aligned with an RMSD of 0.57 Å. (C) A suboptimal rigid alignment in which mainly the atoms around the side-chain carboxyl group of glutamate and the other carboxyl group of UBP302 are aligned. 31 atom pairs are aligned with an RMSD of 1.05 Å. (D) The resulting flexible alignment, which integrates those in (B) and (C). Corresponding atom pairs are connected with a line. In total, 58 atom pairs are aligned with an RMSD of 2.43 Å. Note that the two structures (orange and CPK) cannot be closely superimposed based on this alignment.
Figure 8
An example of flexible alignment for protein interfaces. Two immunoglobulin light chains (cartoon representation) are superimposed (PDB 2FL5 colored orange, magenta, cyan; 3L7F colored grey) where their “ligands” are immunoglobulin heavy chains (backbone representation). (A) Superposition of folds based on the alignments of protein binding sites. The colors correspond to those in B–D. (B) An optimal rigid alignment where mainly the interface atoms in N-terminal variable domains are aligned. 89 atom pairs are aligned with RMSD of 0.76 Å. (C) A suboptimal rigid alignment where mainly the interface atoms in the C-terminal constant domains are aligned. 75 atom pairs are aligned with RMSD of 0.91 Å. (D) The resulting flexible alignment which integrates those in (B) and (C). Corresponding atom pairs are connected with a line. In total, 164 atom pairs are aligned with RMSD of 6.8 Å.
Figure 9
Results of whole subunit queries. (A) Total execution time (seconds) vs. number of query atoms. The execution time includes pre-processing of the query in addition to the GI search and IR procedure. (B) Total execution time vs. number of hits (matching templates) with a GIRAF score of at least 40. (C) Histogram of GIRAF scores, grouped into bins of width 5. Colors of the bars are magenta for small molecule, green for protein and blue for nucleic acid interfaces that were found by GIRAF searches.
Figure 10
Mean (magenta crosses) and maximum (green circles) GIRAF score (A), GI score (B), or number of aligned atom pairs (C) of matching templates (binding sites in the GIRAF database). The number of matching queries indicates the number of queries to which a given template was matched. (D) Distribution of GIRAF score against the number of aligned atom pairs (Nali). The line in cyan indicates the threshold.
Figure 11
Pseudo-SQL code. (A) Pseudo-SQL code defining the Refaco table (cf. Fig. 1). (B) The SQL query for geometric indexing search. The tables containing structurally featured affine frames and discretized atomic coordinates of templates and the query are joined (line 2). The identifiers of the interfaces (t.if_id) and affine frames (t.rs_id), the interface type (t.type) and the affine frames (t.frame) of templates, and the identifier of the matching affine frames of the query (q.rs_id), are returned (line 1) if the structural features of the templates are sufficiently similar to those of the query (lines 3–5) and the number of overlapping atom pairs (lines 6–9) is greater than a threshold (line 10).
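
The shape of such a query can be sketched with SQLite from Python. The schema below is a deliberately simplified stand-in, not the Refaco table from the figure: the table names (`t_frame`, `t_atom`, `q_frame`, `q_atom`), the single scalar `feature` column, the text-encoded grid `cell`, and the tolerance and overlap thresholds are all hypothetical. Only the column names `if_id`, `rs_id` and `type` follow the caption. The sketch joins template and query frames on feature similarity, counts atoms that fall into the same discretized cell, and keeps frame pairs whose overlap exceeds a threshold.

```python
import math
import sqlite3

def cell(p, width=1.0):
    """Discretize local coordinates into a grid-cell key (assumed encoding)."""
    return ",".join(str(math.floor(c / width)) for c in p)

SCHEMA = """
CREATE TABLE t_frame (if_id INTEGER, rs_id INTEGER, type TEXT, feature REAL);
CREATE TABLE t_atom  (if_id INTEGER, rs_id INTEGER, cell TEXT);
CREATE TABLE q_frame (rs_id INTEGER, feature REAL);
CREATE TABLE q_atom  (rs_id INTEGER, cell TEXT);
"""

SEARCH = """
SELECT t.if_id, t.rs_id, t.type, q.rs_id, COUNT(*) AS n_overlap
FROM t_frame AS t
JOIN q_frame AS q  ON ABS(t.feature - q.feature) <= :tol
JOIN t_atom  AS ta ON ta.if_id = t.if_id AND ta.rs_id = t.rs_id
JOIN q_atom  AS qa ON qa.rs_id = q.rs_id AND qa.cell = ta.cell
GROUP BY t.if_id, t.rs_id, t.type, q.rs_id
HAVING COUNT(*) > :min_overlap
ORDER BY t.if_id, t.rs_id, q.rs_id
"""

def build_demo_db():
    """Create an in-memory database with three toy templates and one query."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    templates = [
        (1, 10, "small molecule", 1.0,
         [(0.2, 0.3, 0.1), (1.5, 0.2, 0.4), (0.1, 1.9, 0.0)]),
        (2, 20, "protein", 1.1, [(0.5, 0.5, 0.5), (5.5, 5.5, 5.5)]),
        (3, 30, "protein", 9.0,
         [(0.7, 0.6, 0.9), (1.1, 0.8, 0.3), (0.5, 1.2, 0.6)]),
    ]
    for if_id, rs_id, kind, feature, atoms in templates:
        con.execute("INSERT INTO t_frame VALUES (?, ?, ?, ?)",
                    (if_id, rs_id, kind, feature))
        con.executemany("INSERT INTO t_atom VALUES (?, ?, ?)",
                        [(if_id, rs_id, cell(a)) for a in atoms])
    query_atoms = [(0.7, 0.6, 0.9), (1.1, 0.8, 0.3),
                   (0.5, 1.2, 0.6), (2.5, 2.5, 2.5)]
    con.execute("INSERT INTO q_frame VALUES (?, ?)", (7, 1.2))
    con.executemany("INSERT INTO q_atom VALUES (?, ?)",
                    [(7, cell(a)) for a in query_atoms])
    return con

def geometric_index_search(con, tol=0.5, min_overlap=2):
    """Return (if_id, rs_id, type, query rs_id, overlap count) for each hit."""
    params = {"tol": tol, "min_overlap": min_overlap}
    return con.execute(SEARCH, params).fetchall()
```

With the toy data, only template 1 is returned: template 2 shares a single cell with the query, and template 3 occupies the same cells but fails the feature-similarity condition. In a production setting the feature and cell columns would be indexed so that the joins avoid full table scans.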

References

    1. Taylor WR, Orengo CA. Protein structure alignment. J Mol Biol. 1989;208:1–22.
    2. Mitchell EM, Artymiuk PJ, Rice DW, Willett P. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. J Mol Biol. 1990;212:151–166.
    3. Nussinov R, Wolfson HJ. Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques. Proc Natl Acad Sci USA. 1991;88:10495–10499.
    4. Alexandrov NN, Takahashi K, Go N. Common spatial arrangements of backbone fragments in homologous and non-homologous proteins. J Mol Biol. 1992;225:5–9.
    5. Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993;233:123–138.