Nucleic Acids Res. 2009 Jun;37(11):e83.
doi: 10.1093/nar/gkp318. Epub 2009 May 14.

MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming

Srayanta Mukherjee et al. Nucleic Acids Res. 2009 Jun.

Abstract

Structural comparison of multiple-chain protein complexes is essential in many studies of protein-protein interactions. We develop a new algorithm, MM-align, for sequence-independent alignment of protein complex structures. The algorithm is built on a heuristic iteration of a modified Needleman-Wunsch dynamic programming (DP) algorithm, with the alignment score specified by the inter-complex residue distances. The multiple chains in each complex are first joined, in every possible order, and then aligned simultaneously, with cross-chain alignments prevented. The alignment of interface residues is enhanced by an interface-specific weighting factor. MM-align is tested on a large-scale benchmark set of 205 × 3897 non-homologous multiple-chain complex pairs. Compared with a naïve extension of the monomer alignment program TM-align, MM-align achieves significantly higher alignment accuracy, as judged by the average TM-score of the physically aligned residues. MM-align is also about twice as fast as TM-align because it omits the cross-alignment zone of the DP matrix. The results further show that the enhanced alignment of interfaces helps in identifying biologically relevant protein complex pairs.
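Accuracy above is judged by TM-score. As a reference point, the sketch below computes the standard TM-score for one fixed superposition: each aligned pair at distance d contributes 1/(1 + (d/d0)²), and the sum is normalized by the length of the reference structure. This is a minimal sketch, not MM-align's code: the full TM-score also maximizes over superpositions (not shown), the 0.5 Å floor on d0 for short structures is an assumption borrowed from common TM-score implementations, and the names `tm_d0` and `tm_score` are hypothetical. The abstract does not give MM-align's exact distance-based DP score, although a per-pair term of this form is a natural candidate.

```python
from typing import Sequence


def tm_d0(length: int) -> float:
    """Distance scale d0 (in Angstroms) for a reference structure of `length` residues."""
    # Standard formula d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # max(..., 0) avoids a complex result for L < 15; the 0.5 A floor is an assumption.
    d0 = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score of one fixed superposition.

    distances: distances (A) between aligned residue pairs after superposition.
    target_length: residue count of the reference structure (all chains, for a complex).
    """
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 (far apart) and 1 (superimposed).
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A perfect superposition of all residues gives 1.0, and unaligned residues of the reference contribute nothing, which is why a score normalized over all chains penalizes partial alignments of a complex.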

Figures

Figure 1.
An illustration of the chain-joining procedure in MM-align. The chains of each compared dimer are merged into a single artificial chain, and the two artificial chains are then aligned with cross-chain alignments forbidden. Corresponding chains are represented by the same line type (thick or thin). Complex 1 is in red and Complex 2 is in blue.
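The joining step in Figure 1, together with the abstract's "in every possible order", can be sketched as follows: complex 1 keeps a fixed chain order while the chains of complex 2 are permuted, and each ordering yields one pair of artificial chains plus the residue range of every chain inside them. This is a minimal sketch under stated assumptions: chains are plain residue strings for illustration (real inputs carry coordinates), the names `chain_bounds` and `joined_orderings` are hypothetical, and dropping surplus chains of complex 2 when the chain counts differ is our assumption, not a detail given in the captions.

```python
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple

Bounds = List[Tuple[int, int]]


def chain_bounds(chains: Sequence[str]) -> Bounds:
    """Half-open (start, end) residue ranges of each chain inside the joined sequence."""
    bounds, start = [], 0
    for chain in chains:
        bounds.append((start, start + len(chain)))
        start += len(chain)
    return bounds


def joined_orderings(chains1: Sequence[str],
                     chains2: Sequence[str]) -> Iterator[Tuple[str, str, Bounds, Bounds]]:
    """Yield every joining of complex 2's chains against complex 1's fixed order.

    Assumes len(chains1) <= len(chains2): chain k of complex 1 is paired with the
    k-th chain picked from complex 2, and unpicked chains of complex 2 are left out.
    """
    for order in permutations(range(len(chains2)), len(chains1)):
        picked = [chains2[k] for k in order]
        yield "".join(chains1), "".join(picked), chain_bounds(chains1), chain_bounds(picked)
```

For a dimer pair this yields exactly two orderings (A1-B1 against A2-B2 and against B2-A2); the count grows factorially with the number of chains, so large oligomers would need a cheaper way to choose chain correspondences.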
Figure 2.
An illustration of the modified dynamic programming algorithm with cross-chain alignment prevented. The left panel illustrates how the grid is filled, with the cross-alignment zones (empty cells) ignored. The dashed lines represent a pseudo-layer that takes the value of the last cell of the preceding block. The pseudo-layer values (5 and 11 in this example) are used as the starting scores of the next block, which corresponds to the next chain of both complexes. The right panel shows the traceback path (red arrows).
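The block scheme of Figure 2 can be sketched as a Needleman-Wunsch pass that fills only the diagonal blocks pairing chain k of one joined complex with chain k of the other, seeds each block with the last-cell value of the previous block (the pseudo-layer), and traces back inside each block so that no aligned pair can cross chains. This is a minimal sketch under stated assumptions: it runs one DP pass on a precomputed score matrix (MM-align iterates DP with superposition updates, which is not shown), uses a linear gap penalty whose default of -0.6 is a hypothetical value, and the names `blockwise_nw`, `bounds1` and `bounds2` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Bounds = Sequence[Tuple[int, int]]


def blockwise_nw(score: Matrix, bounds1: Bounds, bounds2: Bounds,
                 gap: float = -0.6) -> Tuple[List[Tuple[int, int]], float]:
    """Needleman-Wunsch restricted to the diagonal chain blocks of a joined score matrix.

    score[i][j]: score for residue i of joined complex 1 vs residue j of joined complex 2.
    bounds1[k], bounds2[k]: half-open residue ranges of the k-th paired chains.
    gap: linear gap penalty (hypothetical default, not taken from the paper).
    Returns the aligned residue pairs and the final score.
    """
    alignment: List[Tuple[int, int]] = []
    carry = 0.0  # pseudo-layer: last-cell value of the preceding block
    for (s1, e1), (s2, e2) in zip(bounds1, bounds2):
        n, m = e1 - s1, e2 - s2
        H = [[0.0] * (m + 1) for _ in range(n + 1)]
        H[0][0] = carry  # this block starts from the pseudo-layer value
        for i in range(1, n + 1):
            H[i][0] = H[i - 1][0] + gap
        for j in range(1, m + 1):
            H[0][j] = H[0][j - 1] + gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                H[i][j] = max(H[i - 1][j - 1] + score[s1 + i - 1][s2 + j - 1],
                              H[i - 1][j] + gap,
                              H[i][j - 1] + gap)
        # Trace back inside this block only, so no pair can cross chains.
        i, j, pairs = n, m, []
        while i > 0 and j > 0:
            if H[i][j] == H[i - 1][j - 1] + score[s1 + i - 1][s2 + j - 1]:
                pairs.append((s1 + i - 1, s2 + j - 1))
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        alignment.extend(reversed(pairs))
        carry = H[n][m]
    return alignment, carry
```

Cells outside the diagonal blocks are never read, even if they hold high scores. For a dimer whose two chains each have n residues in both complexes, the full joined matrix has (2n)² = 4n² cells while the two diagonal blocks have 2n², so half of the DP work is skipped, which is consistent with the roughly twofold speedup reported in the abstract.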
Figure 3.
A modified dynamic programming scheme that reinforces the alignment of interface residue pairs. The interface areas are highlighted in color. If both residues of a pair are interface residues (the area in green), the score is increased by a factor w and the gap penalty by a factor x.
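The interface reinforcement in Figure 3 amounts to rescaling the score matrix and the gap penalty in the cells where both residues belong to an interface. The sketch below builds a weighted score matrix and a per-cell gap matrix under that reading. It is a minimal sketch, not the paper's implementation: the values of w and x are not given in the caption, so they are left as required parameters; the name `interface_weighted` is hypothetical; and how a DP would consume a per-cell gap (for example, using `gaps[i][j]` as the penalty for moves into cell (i, j)) is our assumption.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def interface_weighted(score: Matrix, iface1: Set[int], iface2: Set[int],
                       w: float, gap: float, x: float) -> Tuple[Matrix, Matrix]:
    """Return (weighted scores, per-cell gap penalties) without modifying `score`.

    iface1 / iface2: indices of interface residues in joined complex 1 / complex 2.
    Cells where both residues are interface residues get their score multiplied
    by w and their (negative) gap penalty multiplied by x.
    """
    weighted = [row[:] for row in score]
    gaps = [[gap] * len(row) for row in score]
    for i in iface1:
        for j in iface2:
            weighted[i][j] *= w     # reward interface-interface matches more
            gaps[i][j] = gap * x    # make gaps inside the interface block costlier
    return weighted, gaps
```

With w > 1 and x > 1, the DP is pulled toward matching the interfaces of the two complexes and is discouraged from opening gaps there, which is the behavior the caption describes.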
Figure 4.
TM-score histogram of 205 protein complexes and their best-matching structures identified by MM-align in a non-redundant set of 3897 protein complexes.
Figure 5.
A typical example of structures aligned by TM-align, containing cross-chain alignments (left panel), and the same structures aligned by MM-align without cross-chain alignments (right panel). The two complexes are from PDB entries 1u20 (thick trace) and 1y7y (thin trace), with the two chains shown in blue and red, respectively.
Figure 6.
Three examples of dimeric protein complex alignments identified by MM-align, from three different protein classes (alpha-, alpha/beta- and beta-proteins). Thick and thin lines represent the Cα traces of the two complexes, and red and green indicate different chains. The grey regions are those with a distance >5 Å in the superposition.
Figure 7.
The structural alignment of casein kinase (1cki) with its best-matching structures in a non-redundant protein complex library. TM-align picks up human S100P (1j55), with 26 residues aligned across chains (left panel); MM-align picks up the tyrosine kinase domain of fibroblast growth factor receptor 1 (1fgk), with no cross-aligned residues (right panel).
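Figures 5 and 7 contrast alignments with and without cross-chain pairs (26 such pairs in the TM-align result of Figure 7). Given an alignment and a chain correspondence between the two complexes, counting those pairs is a one-liner. This is a minimal sketch: the captions do not say how the chain correspondence is chosen for a TM-align result, so `pairing` is supplied by the caller, and the name `count_cross_chain` is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def count_cross_chain(alignment: Sequence[Tuple[int, int]],
                      chains1: Sequence[str], chains2: Sequence[str],
                      pairing: Dict[str, str]) -> int:
    """Count aligned residue pairs whose chains do not correspond to each other.

    alignment: (i, j) residue-index pairs between the two complexes.
    chains1 / chains2: chain ID of every residue in complex 1 / complex 2.
    pairing: chain of complex 2 that corresponds to each chain of complex 1.
    """
    return sum(1 for i, j in alignment if pairing.get(chains1[i]) != chains2[j])
```

An alignment produced by a block-restricted DP such as MM-align's scores zero under the pairing it was built with, whereas a monomer aligner run on concatenated chains carries no such guarantee.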
Figure 8.
Examples of MM-align on large oligomers. (a) Alignment of the photosynthetic reaction center from Rhodobacter sphaeroides (PDB id: 2jiy, 3 chains, thick backbone) with that from Rhodopseudomonas viridis (PDB id: 1dxr, 4 chains, thin backbone). The first, second and third chains of 2jiy are yellow, cyan and yellow; the first, second, third and fourth chains of 1dxr are dark green, magenta, dark green and magenta. (b) Alignment of the cytochrome bc1 complex from chicken (PDB id: 1bcc, 10 chains, thick backbone) with the bovine mitochondrial cytochrome bc1 complex (PDB id: 1qcr, 11 chains, thin backbone). The chains are colored red and cyan alternately for 1bcc and green and magenta alternately for 1qcr. (c) Alignment of phycocyanin from Gloeobacter violaceus (PDB id: 2vml, 12 chains, thick backbone) with phycocyanin from the red alga Gracilaria chilensis (PDB id: 2bv8, 12 chains, thin backbone). The chains are colored red and cyan alternately for 2vml and green and magenta alternately for 2bv8. (d) Alignment of the bacterial ribosome from E. coli (PDB id: 2qbd, 20 chains, thick backbone) with the ribosome of Thermus thermophilus (PDB id: 1fjg, 20 chains, thin backbone). The chains are colored red and yellow alternately for 2qbd and green and magenta alternately for 1fjg. The grey strands in the background are the RNA from 2qbd superimposed onto the aligned complexes.
