BMC Bioinformatics. 2007 Oct 15:8:388.

doi: 10.1186/1471-2105-8-388.

Topology independent protein structural alignment

Joe Dundas¹, T A Binkowski, Bhaskar DasGupta, Jie Liang

PMID: 17937816
PMCID: PMC2096629
DOI: 10.1186/1471-2105-8-388

Joe Dundas et al. BMC Bioinformatics. 2007.

Affiliation

¹ Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7053, USA. jdunda1@uic.edu

Abstract

Background: Identifying structurally similar proteins with different chain topologies can aid studies in homology modeling, protein folding, protein design, and protein evolution. These include circularly permuted protein structures, and the more general case of non-cyclic permutations between similar structures, which are related by non-topological rearrangement beyond circular permutation. We present a method based on an approximation algorithm that finds sequence-order independent structural alignments that are close to optimal. We formulate the structural alignment problem as a special case of the maximum-weight independent set problem, and solve this computationally intensive problem approximately by iteratively solving relaxations of a corresponding integer programming problem. The resulting structural alignment is sequence-order independent. Our method is also insensitive to insertions, deletions, and gaps.

Results: Using a novel similarity score and a statistical model for the significance p-value, we are able to discover previously unknown circularly permuted proteins between nucleoplasmin-core protein and auxin binding protein, between aspartate racemase and 3-dehydrogenate dehydralase, as well as between migration inhibition factor and arginine repressor, which involves an additional strand swap. We also report the finding of non-cyclic permuted protein structures existing in nature between AML1/core binding factor and riboflavin synthase. Our method can be used for large-scale alignment of protein structures regardless of topology.

Conclusion: The approximation algorithm introduced in this work can find good solutions for the problem of protein structure alignment. Furthermore, this algorithm can detect topological differences between two spatially similar protein structures. The alignment between MIF and the arginine repressor demonstrates our algorithm's ability to detect structural similarities even when spatial rearrangement of structural units has occurred. The effectiveness of our method is also demonstrated by the discovery of previously unknown circular permutations. In addition, we report in this study the finding of a naturally occurring non-cyclic permuted protein between AML1/Core Binding Factor chain F and riboflavin synthase chain A.
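The independent-set formulation described in the Background can be illustrated with a toy sketch. It assumes each aligned fragment pair is a tuple (start in S_a, start in S_b, length, weight), treats two pairs as inconsistent when they reuse a residue in either protein, and finds the maximum-weight independent set by exhaustive search. The tuple layout, the example weights, and the exhaustive search are illustrative assumptions only; the paper solves relaxations of an integer program, which is not reproduced here.

```python
from itertools import combinations

# Hypothetical aligned fragment pairs: (start_a, start_b, length, weight).
# All values are made up for illustration.
PAIRS = [
    (0, 10, 5, 4.0),   # residues 0-4 of S_a matched with 10-14 of S_b
    (3, 0, 5, 3.0),    # overlaps the first pair on S_a
    (8, 0, 5, 3.5),    # later in S_a but earlier in S_b than the first pair
    (14, 12, 4, 2.0),  # overlaps the first pair on S_b
]


def overlaps(s1, s2, length1, length2):
    """True if half-open intervals [s1, s1+length1) and [s2, s2+length2) intersect."""
    return s1 < s2 + length2 and s2 < s1 + length1


def inconsistent(p, q):
    """Two fragment pairs conflict if they share a residue in either protein."""
    return overlaps(p[0], q[0], p[2], q[2]) or overlaps(p[1], q[1], p[2], q[2])


def max_weight_independent_set(pairs):
    """Exhaustive search over subsets; only feasible for a handful of pairs."""
    best, best_weight = (), 0.0
    for k in range(1, len(pairs) + 1):
        for subset in combinations(range(len(pairs)), k):
            if any(inconsistent(pairs[i], pairs[j])
                   for i, j in combinations(subset, 2)):
                continue  # at least one conflicting pair: not independent
            weight = sum(pairs[i][3] for i in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return best, best_weight


chosen, total = max_weight_independent_set(PAIRS)
```

In this toy input the selected pairs are the first and third, which appear in opposite order in the two sequences. No ordering constraint is imposed, so the selection is sequence-order independent.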

Figures

**Figure 1**
**Circular permutation example**. A cartoon illustration of three protein structures whose domains are similarly arranged in space but appear in different orders in their primary sequences. The locations of domains A, B, and C in the primary sequences are shown in a layout below each structure. Their orderings are related by circular permutation [2].
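As a minimal sketch of what "related by circular permutation" means for domain orderings, the check below tests whether one ordering is a rotation of another. The function name and the example orderings are hypothetical; the caption does not list the actual domain orders of the three structures.

```python
def is_circular_permutation(order_a, order_b):
    """True if order_b is a rotation of order_a (same domains, cyclically shifted)."""
    a, b = list(order_a), list(order_b)
    if len(a) != len(b):
        return False
    doubled = a + a  # every rotation of a appears as a window of a + a
    n = len(b)
    return any(doubled[i:i + n] == b for i in range(max(len(a), 1)))
```

For example, `is_circular_permutation("ABC", "CAB")` holds, while the ordering "ACB" cannot be reached from "ABC" by rotation alone and would count as a non-cyclic permutation.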

**Figure 2**
**Nucleoplasmin-core and auxin binding protein 1**. A new circular permutation discovered between nucleoplasmin-core (1k5j, chain E, top panel), and the fragment of residues 37–127 of auxin binding protein 1 (1lrh, chain A, bottom panel). a) These two proteins superimpose well spatially, with an RMSD value of 1.36 Å for an alignment length of 68 residues and a significant p-value of 2.7 × 10^-5 after Bonferroni correction. b) These proteins are related by a circular permutation. The short loop connecting strand 4 and strand 5 of nucleoplasmin-core (in rectangle, top) becomes disconnected in auxin binding protein 1. The N- and C-termini of nucleoplasmin-core (in ellipse, top) become connected in auxin binding protein 1 (in ellipse, bottom). For visualization, residues in the N-to-C direction before the cut in the nucleoplasmin-core protein are colored red, and residues after the cut are colored blue. c) The topology diagram of these two proteins. In the original structure of nucleoplasmin-core, the electron density of the loop connecting strand 4 and strand 5 is missing.

**Figure 3**
**Aspartate racemase and type II 3-dehydrogenate dehydralase**. A new circular permutation discovered between aspartate racemase (1iu9, chain A, top) and type II 3-dehydrogenate dehydralase (1h0r, chain A, bottom). a) These two proteins superimpose well spatially, with an RMSD of 1.49 Å over 59 residues and a significant p-value of 4.7 × 10^-4. b) These proteins are related by a circular permutation. The loop connecting helix 1 with strand 1 in aspartate racemase (in rectangle, top) becomes disconnected in type II 3-dehydrogenate dehydralase (in rectangle, bottom), but the N- and C-termini of aspartate racemase (in ellipse, top) become connected in dehydrogenate dehydralase (in ellipse, bottom) with an insertion (shown in green). For visualization, residues of aspartate racemase in the N-to-C direction before the cut in the dehydrogenate dehydralase are colored red, and residues after the cut are colored blue. c) The topology diagram of these two proteins. Here an ellipse represents a helix and a block arrow represents a strand.

**Figure 4**
**Macrophage migration inhibition factor and C-terminal domain of arginine repressor**. A new circular permutation discovered between the macrophage migration inhibition factor (MIF, PDB ID 1uiz, chain A, top) and the C-terminal domain of arginine repressor (AR, 1xxa, chain C, bottom). a) These two proteins superimpose well spatially, with an RMSD of 1.74 Å for an alignment length of 24 residues and a p-value of 1.3 × 10^-2. b) These proteins are related by a circular permutation. The loop connecting helix 1 with strand 2 of MIF (in rectangle, top) becomes disconnected in arginine repressor, and the N- and C-termini of MIF (in ellipse, top) become connected in arginine repressor (in ellipse, bottom). The disconnection of helix 1 from strand 2 of MIF removes some spatial constraints, allowing strand 1' in AR to swap places with strand 4'. c) The topology diagram of these two proteins. d) The artificial topology diagram for arginine repressor, where strand 2' and strand 4' are spatially swapped back. The diagram for AR in (c) has the same topology as the diagram in (d).

**Figure 5**
**A non-cyclic permutation**. A novel non-cyclic permutation discovered between AML1/Core Binding Factor (AML1/CBF, PDB ID 1e50, chain F, top) and riboflavin synthase (PDB ID 1pkv, chain A, bottom). a) These two proteins superimpose well spatially, with an RMSD of 1.23 Å and an alignment length of 42 residues, with a significant p-value of 2.8 × 10^-4 after Bonferroni correction. Aligned residues are colored blue. b) These proteins are related by multiple permutations. The steps to transform the topology of AML1/CBF (top) to riboflavin synthase (bottom) are as follows: c) Remove the loops connecting strand 1 to helix 2, strand 4 to strand 5, and strand 5 to helix 6; d) Connect the C-terminal end of strand 4 to the original N-terminus; e) Connect the C-terminal end of strand 5 to the N-terminal end of helix 2; f) Connect the original C-terminus to the N-terminal end of strand 5. The N-terminal end of strand 6 becomes the new N-terminus and the C-terminal end of strand 1 becomes the new C-terminus. We now have the topology diagram of riboflavin synthase.

**Figure 6**
**Implementation example with vertex sweep**. An illustration of the first iteration of our algorithmic approaches for *BSSI*_{Λ, σ}: a) The cartoon representation of circularly permuted proteins S_a and S_b; b) The problem represented as a graph where each node χ_i ∈ Λ represents an aligned fragment pair and each edge represents two inconsistent pairs; c) An illustration of how sweep lines (dashed) can identify inconsistent aligned pairs as required to generate the interval clique inequalities. A rectangle is an ordered fragment pair (e.g., the solid green rectangle is the pair χ_5 = (λ^a_{5,3}, λ^b_{1,3})).
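The sweep-line idea in panel (c) can be sketched as follows, under the assumption that each fragment pair is reduced to the inclusive residue interval it covers on one protein. Stopping a sweep line at every interval start collects the fragment pairs that share a residue there; each maximal group is mutually inconsistent and would give one clique constraint in an independent-set formulation. The function name and example intervals are hypothetical, and this is not the authors' implementation.

```python
def sweep_cliques(intervals):
    """Return maximal groups of interval indices that share a common position.

    intervals: list of (start, end) with inclusive integer end points, one per
    fragment pair, giving the residues it covers on one protein.
    """
    cliques = []
    # A maximal group of mutually overlapping intervals always contains the
    # start of one of its members, so stopping at each start is sufficient.
    for position in sorted({start for start, _ in intervals}):
        active = frozenset(i for i, (start, end) in enumerate(intervals)
                           if start <= position <= end)
        if len(active) > 1:
            cliques.append(active)
    # Drop groups that are strictly contained in a larger group.
    maximal = [c for c in set(cliques) if not any(c < other for other in cliques)]
    return sorted(sorted(c) for c in maximal)


# Hypothetical residue ranges on S_a for four fragment pairs.
example = [(0, 5), (2, 8), (4, 6), (7, 9)]
groups = sweep_cliques(example)
```

For the example, the groups are pairs {0, 1, 2} and {1, 3}; at most one member of each group could be kept in a consistent alignment. The same sweep would be repeated along the second protein.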

**Figure 7**
**Secondary Structure cRMSD distributions**. The cRMSD distributions of a) helices of length 4 b) helices of length 5 c) helices of length 6 d) helices of length 7 e) strands of length 4 f) strands of length 5 g) strands of length 6 and h) strands of length 7.

**Figure 8**
**Similarity Score versus length**. a) Linear fit of the raw similarity score σ(X) (equation 8) as a function of the geometric mean √(N_a · N_b) of the lengths of the two aligned proteins (N_a and N_b are the numbers of residues in the two protein structures S_a and S_b). The linear regression line (grey line) has a slope of 0.314. b) Linear fit of the normalized similarity score σ̃(X) (equation 9) as a function of the geometric mean of the lengths of the two aligned proteins. The linear regression line (grey line) has a slope of -0.0004.

**Figure 9**
**Similarity Score Distribution**. The distribution of the normalized similarity scores obtained by aligning 200,000 pairs of proteins randomly selected from PDBSELECT 25% [19]. The distribution can be fit to an Extreme Value Distribution, with parameters α = 14.98 and β = 3.89.
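A rough sketch of how a fitted extreme value distribution could yield a p-value is given below. It assumes a Gumbel (type I) form with α as the location and β as the scale, and that larger normalized scores indicate better alignments. This record does not state the parametrization or the tail direction, so both are assumptions. The Bonferroni step mentioned in the captions of Figures 2 and 5 is shown in its standard form.

```python
import math

ALPHA = 14.98  # from the Figure 9 caption; assumed here to be the location
BETA = 3.89    # from the Figure 9 caption; assumed here to be the scale


def gumbel_upper_tail(score, alpha=ALPHA, beta=BETA):
    """P(S >= score) under a Gumbel distribution with the assumed parameters."""
    z = (score - alpha) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy in the far tail.
    return -math.expm1(-math.exp(-z))


def bonferroni(p_value, n_tests):
    """Bonferroni correction: scale by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)
```

A score equal to the assumed location gives a tail probability of 1 - 1/e, and the probability shrinks as the score increases.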

References

1. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structure. J Mol Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159.
2. Binkowski TA, DasGupta B, Liang J. Order independent structural alignment of circularly permuted proteins. Conf Proc IEEE Eng Med Biol Soc. 2004;4:2781–2784.
3. Lindqvist Y, Schneider G. Circular permutations of natural protein sequences: structural evidence. Curr Opin Struct Biol. 1997;7:422–427. doi: 10.1016/S0959-440X(97)80061-9.
4. Ponting CP, Russell RB. Swaposins: circular permutations within genes encoding saposin homologues. Trends Biochem Sci. 1995;20:179–180. doi: 10.1016/S0968-0004(00)89003-9.
5. Jeltsch A. Circular permutations in the molecular evolution of DNA methyltransferase. J Mol Evol. 1999;49:161–164. doi: 10.1007/PL00006529.
