J Mol Biochem. 2012;1(2):76-85.
Epub 2012 Jun 16.

Enhanced fold recognition using efficient short fragment clustering

Evgeny Krissinel. J Mol Biochem. 2012.

Abstract

The main structure aligner in the CCP4 Software Suite, SSM (Secondary Structure Matching), has limited applicability at intermediate stages of the structure solution process, when secondary structure cannot be reliably computed because the model is incomplete or its main chain is fragmented. In this study, we describe a new algorithm for the alignment and comparison of protein structures in CCP4, designed to overcome SSM's limitations while retaining its quality and speed. The new algorithm, named GESAMT (General Efficient Structural Alignment of Macromolecular Targets), builds on the established idea of deriving global structural similarity from a promising set of locally similar short fragments, but uses several technical solutions that make it considerably faster. A comparative sensitivity and selectivity analysis revealed an unexpected, significant improvement in the fold recognition properties of the new algorithm, which also makes it useful for applications in structural bioinformatics. The new tool is included in the CCP4 Software Suite starting from version 6.3.
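The core idea of deriving global similarity from locally similar short fragments can be illustrated with a minimal Python sketch. It is not GESAMT's own fragment scoring, which this record does not describe. As an assumption made purely for illustration, fragments are compared by the RMS difference of their internal Cα-Cα distances. This measure needs no superposition, but it also cannot tell a fragment from its mirror image. The function names, the fragment length `frag_len=5` and the 1.0 Å `threshold` are hypothetical.

```python
import math
from itertools import combinations


def internal_distances(frag):
    """All pairwise Cα-Cα distances inside one fragment (invariant to rotation and translation)."""
    return [math.dist(a, b) for a, b in combinations(frag, 2)]


def fragment_rms(frag_a, frag_b):
    """RMS difference between the internal distance lists of two equal-length fragments."""
    da, db = internal_distances(frag_a), internal_distances(frag_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def similar_fragments(chain_a, chain_b, frag_len=5, threshold=1.0):
    """Return (i, j) start positions of fragment pairs whose geometry differs by less than threshold (Å)."""
    hits = []
    for i in range(len(chain_a) - frag_len + 1):
        frag_a = chain_a[i:i + frag_len]
        for j in range(len(chain_b) - frag_len + 1):
            if fragment_rms(frag_a, chain_b[j:j + frag_len]) < threshold:
                hits.append((i, j))
    return hits


# An idealised helix-like trace, the same trace rotated 90° about z and shifted,
# and a fully extended trace for contrast.
helix = [(2.3 * math.cos(math.radians(100 * k)),
          2.3 * math.sin(math.radians(100 * k)),
          1.5 * k) for k in range(12)]
moved = [(-y + 10.0, x, z) for x, y, z in helix]
line = [(3.8 * k, 0.0, 0.0) for k in range(12)]

print(len(similar_fragments(helix, moved)))  # 64: every fragment of an ideal helix is congruent to every other
print(similar_fragments(helix, line))        # []
```

Because the rigid-body move preserves all internal distances, the rotated copy matches fully. The extended trace differs sharply at longer separations and produces no hits.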

Conflict of interest statement

The author declares no conflicts of interest.

Figures

Figure 1. Schematic of the structure alignment process in GESAMT.
The left part of Figure 1 represents the fragment similarity matrix for chains A and B. Each short section in the matrix represents an SFS. SFSs with similar transformation matrices are collected into clusters, which, after further refinement, yield the common superposition matrix T0.
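The clustering step in Figure 1 can be sketched in Python with NumPy under several stated assumptions, none of which are confirmed by this record. Each SFS is taken to be a pair of equal-length fragments starting at residues (i, j). Its transformation is a least-squares Kabsch superposition. Transformations are grouped by greedy leader clustering with hypothetical thresholds of 20° and 3 Å. The largest cluster is then refitted as one global rotation and translation, standing in for T0. All function names are illustrative.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rotation R and translation t such that R @ p + t ≈ q for matched rows of P and Q."""
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc


def rotation_angle(R1, R2):
    """Angle (degrees) of the relative rotation R1^T R2."""
    c = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def cluster_transforms(transforms, max_angle=20.0, max_shift=3.0):
    """Greedy leader clustering: a transform joins the first cluster whose leader is close enough.

    Comparing translations directly is origin-dependent; this is a simplification.
    """
    clusters = []  # each cluster is a list of indices into transforms
    for k, (R, t) in enumerate(transforms):
        for members in clusters:
            R0, t0 = transforms[members[0]]
            if rotation_angle(R0, R) < max_angle and np.linalg.norm(t - t0) < max_shift:
                members.append(k)
                break
        else:
            clusters.append([k])
    return clusters


def superpose_from_fragments(A, B, sfs, frag_len=5):
    """Superpose each fragment pair, keep the largest cluster, refit one (R, t) over its residue pairs."""
    transforms = [kabsch(A[i:i + frag_len], B[j:j + frag_len]) for i, j in sfs]
    best = max(cluster_transforms(transforms), key=len)
    pairs = set()
    for idx in best:
        i, j = sfs[idx]
        pairs.update((i + k, j + k) for k in range(frag_len))
    ia, ib = zip(*sorted(pairs))
    return kabsch(A[list(ia)], B[list(ib)])


# Stand-in coordinates: B is A rotated 30° about z and translated.
rng = np.random.default_rng(0)
A = rng.normal(scale=5.0, size=(20, 3))
theta = np.radians(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
B = A @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = superpose_from_fragments(A, B, [(i, i) for i in range(0, 16, 3)])
print(np.allclose(R, R_true))  # True
```

Conflicting residue pairs, where one residue is matched to two partners within a cluster, are not resolved here. A real aligner would need to handle them during refinement.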
Figure 2. Discrimination properties of selected structure alignment algorithms.
FATCAT-Rigid, FATCAT-Flexible (Ye & Godzik 2003), SSM (Krissinel & Henrick 2004) and GESAMT (present study) are compared. N(S) gives the probability of obtaining a score higher than S for dissimilar structures, and P(S) the probability of obtaining a score lower than S for similar structures. For FATCAT, S corresponds to the “raw” scores contained in the FATCAT benchmark set; for SSM and GESAMT, the Q-score was used. Curves of different colors correspond to similarity detected at various levels of the SCOP hierarchy, as indicated in the figure.
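The definitions of N(S) and P(S) above translate directly into empirical fractions over benchmark scores, as in the Python sketch below. The `crossover` helper is a hypothetical addition. It picks the score at which the two curves are closest, which is one common way to choose a discrimination threshold. This record does not say how the optimal discrimination scores of Table 2 were actually derived. The scores in the example are made up.

```python
def n_of_s(dissimilar_scores, s):
    """N(S): fraction of structurally dissimilar pairs that score higher than s."""
    return sum(x > s for x in dissimilar_scores) / len(dissimilar_scores)


def p_of_s(similar_scores, s):
    """P(S): fraction of structurally similar pairs that score lower than s."""
    return sum(x < s for x in similar_scores) / len(similar_scores)


def crossover(similar_scores, dissimilar_scores, grid):
    """Hypothetical helper: first grid value where N(S) and P(S) are closest."""
    return min(grid, key=lambda s: abs(n_of_s(dissimilar_scores, s) - p_of_s(similar_scores, s)))


similar = [0.6, 0.7, 0.8, 0.9]        # made-up scores for illustration only
dissimilar = [0.1, 0.2, 0.3, 0.65]
grid = [k / 100 for k in range(101)]
print(n_of_s(dissimilar, 0.5), p_of_s(similar, 0.5))  # 0.25 0.0
print(crossover(similar, dissimilar, grid))           # 0.61
```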
Figure 3. Coverage vs. Error plots (Brenner et al. 1998) for selected structure alignment algorithms.
FATCAT-Rigid (green lines), FATCAT-Flexible (magenta lines), SSM (blue lines) and GESAMT in Normal (red lines) and High (black lines) modes are compared. The optimal discrimination scores from Table 2 were used as similarity thresholds. Different plots correspond to similarity detection at various levels of the SCOP hierarchy, as indicated in the figure.
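A coverage-versus-error curve is obtained by sweeping a score threshold downward. Coverage is the fraction of true relationships found, and errors are the false positives accepted along the way. The Python sketch below shows this under simplifying assumptions. It counts raw false positives rather than the errors per query used by Brenner et al. (1998), and it does not treat tied scores specially. The function names and data are illustrative.

```python
def coverage_vs_errors(scored_pairs):
    """Sweep the threshold from the highest score down.

    scored_pairs: list of (score, is_true_relationship).
    Returns one (coverage, false_positives) point per pair passed.
    """
    total_true = sum(1 for _, is_true in scored_pairs if is_true)
    tp = fp = 0
    curve = []
    for _, is_true in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
        curve.append((tp / total_true, fp))
    return curve


def point_at_threshold(scored_pairs, threshold):
    """Coverage and error count when every pair scoring >= threshold is called similar."""
    total_true = sum(1 for _, is_true in scored_pairs if is_true)
    tp = sum(1 for s, is_true in scored_pairs if is_true and s >= threshold)
    fp = sum(1 for s, is_true in scored_pairs if not is_true and s >= threshold)
    return tp / total_true, fp


pairs = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False)]  # made-up data
print(coverage_vs_errors(pairs)[-1])      # (1.0, 2)
print(point_at_threshold(pairs, 0.65))    # (0.6666666666666666, 1)
```

`point_at_threshold` corresponds to evaluating a single fixed threshold, such as a per-method discrimination score.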
Figure 4. Comparison of Q-scores produced by SSM and GESAMT in Normal (left panel) and High (right panel) modes.
Each dot represents a pair of alignments produced by SSM and GESAMT for the same protein pair from the benchmark set.
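The Q-score used in Figures 2 and 4 was introduced with SSM (Krissinel & Henrick 2004). It combines the number of aligned residues N_align, the RMSD of the aligned Cα atoms, and the chain lengths N1 and N2 as Q = N_align² / ((1 + (RMSD/R0)²) · N1 · N2), with R0 = 3 Å. The following is a direct Python transcription. The function and parameter names are illustrative.

```python
def q_score(n_align, rmsd, n1, n2, r0=3.0):
    """Q-score: n_align^2 / ((1 + (rmsd / r0)^2) * n1 * n2), with rmsd and r0 in Å.

    Equals 1.0 only for a zero-RMSD alignment covering every residue of both chains.
    """
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


print(q_score(100, 0.0, 100, 100))   # 1.0
print(q_score(50, 3.0, 100, 100))    # 0.125
```

The score is penalized both by alignment quality (RMSD) and by alignment coverage: aligning half of each chain at RMSD = R0 already reduces Q to 0.125.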
Figure 5. Comparison of computation times used by SSM and GESAMT in Normal (left panel) and High (right panel) modes for individual alignments.
Each dot represents a pair of alignments produced by SSM and GESAMT for the same protein pair from the benchmark set. Total computation times: SSM, 4,746 s; GESAMT in Normal mode, 1,107 s; GESAMT in High mode, 12,167 s.
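Ratios between these totals summarize the speed trade-off. On this benchmark, Normal mode was about 4.3 times faster than SSM. High mode took about 2.6 times longer than SSM and about 11 times longer than Normal mode. The arithmetic in Python:

```python
# Total computation times (seconds) quoted in the Figure 5 caption
total_s = {"SSM": 4746, "GESAMT Normal": 1107, "GESAMT High": 12167}

normal_speedup = total_s["SSM"] / total_s["GESAMT Normal"]          # Normal mode vs SSM
high_cost = total_s["GESAMT High"] / total_s["SSM"]                 # High mode vs SSM
high_vs_normal = total_s["GESAMT High"] / total_s["GESAMT Normal"]  # High vs Normal mode

print(f"{normal_speedup:.2f} {high_cost:.2f} {high_vs_normal:.2f}")  # 4.29 2.56 10.99
```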
Figure 6. Correlation between computation time and the product of chain lengths for SSM (red dots) and GESAMT (black dots).
Each dot represents a pair of alignments produced by SSM and GESAMT for the same protein pair from the benchmark set.
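This record does not say which statistic, if any, was computed for Figure 6. The Python sketch below shows two ways to quantify such a plot. The first is the Pearson correlation between N1·N2 and time. The second is a least-squares scaling exponent k, which assumes time ∝ (N1·N2)^k and fits a line on log-log axes. The function names and the synthetic records are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def scaling_exponent(records):
    """Least-squares slope k of log(time) against log(n1 * n2), i.e. time ∝ (n1 * n2) ** k."""
    lx = [math.log(n1 * n2) for n1, n2, _ in records]
    ly = [math.log(t) for _, _, t in records]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den


# Made-up (n1, n2, seconds) records whose time is exactly proportional to n1 * n2
records = [(n1, n2, 2e-6 * n1 * n2) for n1, n2 in [(100, 150), (200, 250), (300, 120), (450, 400)]]
sizes = [n1 * n2 for n1, n2, _ in records]
times = [t for _, _, t in records]
print(round(pearson(sizes, times), 6), round(scaling_exponent(records), 6))  # 1.0 1.0
```

An exponent near 1 would indicate run time roughly proportional to the number of residue pairs. Real benchmark data would scatter around the fitted line rather than lie on it.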

Similar articles

  • CCP4 Cloud for structure determination and project management in macromolecular crystallography.
    Krissinel E, Lebedev AA, Uski V, Ballard CB, Keegan RM, Kovalevskiy O, Nicholls RA, Pannu NS, Skubák P, Berrisford J, Fando M, Lohkamp B, Wojdyr M, Simpkin AJ, Thomas JMH, Oliver C, Vonrhein C, Chojnowski G, Basle A, Purkiss A, Isupov MN, McNicholas S, Lowe E, Triviño J, Cowtan K, Agirre J, Rigden DJ, Uson I, Lamzin V, Tews I, Bricogne G, Leslie AGW, Brown DG. Acta Crystallogr D Struct Biol. 2022 Sep 1;78(Pt 9):1079-1089. doi: 10.1107/S2059798322007987. Epub 2022 Aug 30. PMID: 36048148. Free PMC article.
  • The CCP4 suite: integrative software for macromolecular crystallography.
    Agirre J, Atanasova M, Bagdonas H, Ballard CB, Baslé A, Beilsten-Edmands J, Borges RJ, Brown DG, Burgos-Mármol JJ, Berrisford JM, Bond PS, Caballero I, Catapano L, Chojnowski G, Cook AG, Cowtan KD, Croll TI, Debreczeni JÉ, Devenish NE, Dodson EJ, Drevon TR, Emsley P, Evans G, Evans PR, Fando M, Foadi J, Fuentes-Montero L, Garman EF, Gerstel M, Gildea RJ, Hatti K, Hekkelman ML, Heuser P, Hoh SW, Hough MA, Jenkins HT, Jiménez E, Joosten RP, Keegan RM, Keep N, Krissinel EB, Kolenko P, Kovalevskiy O, Lamzin VS, Lawson DM, Lebedev AA, Leslie AGW, Lohkamp B, Long F, Malý M, McCoy AJ, McNicholas SJ, Medina A, Millán C, Murray JW, Murshudov GN, Nicholls RA, Noble MEM, Oeffner R, Pannu NS, Parkhurst JM, Pearce N, Pereira J, Perrakis A, Powell HR, Read RJ, Rigden DJ, Rochira W, Sammito M, Sánchez Rodríguez F, Sheldrick GM, Shelley KL, Simkovic F, Simpkin AJ, Skubak P, Sobolev E, Steiner RA, Stevenson K, Tews I, Thomas JMH, Thorn A, Valls JT, Uski V, Usón I, Vagin A, Velankar S, Vollmar M, Walden H, Waterman D, Wilson KS, Winn MD, Winter G, Wojdyr M, Yamashita K. Acta Crystallogr D Struct Biol. 2023 Jun 1;79(Pt 6):449-461. doi: 10.1107/S2059798323003595. Epub 2023 May 30. PMID: 37259835. Free PMC article.
  • UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data.
    Adamczak R, Meller J. BMC Bioinformatics. 2016 Dec 28;17(1):546. doi: 10.1186/s12859-016-1381-2. PMID: 28031034. Free PMC article.
  • Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
    Crider K, Williams J, Qi YP, Gutman J, Yeung L, Mai C, Finkelstain J, Mehta S, Pons-Duran C, Menéndez C, Moraleda C, Rogers L, Daniels K, Green P. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557. Free PMC article.
  • Ongoing developments in CCP4 for high-throughput structure determination.
    Winn MD, Ashton AW, Briggs PJ, Ballard CC, Patel P. Acta Crystallogr D Biol Crystallogr. 2002 Nov;58(Pt 11):1929-36. doi: 10.1107/s0907444902016116. Epub 2002 Oct 21. PMID: 12393924. Review.

References

    1. Brenner SE, Chothia C, Hubbard TJP. Assessing sequence comparison methods with reliable structurally-identified distant evolutionary relationships. Proc Natl Acad Sci. 1998;95:6073–6078.
    2. Diamond R. On the multiple simultaneous superposition of molecular structures by rigid body transformations. Protein Sci. 1992;1:1279–1287.
    3. Friedberg I, Harder T, Kolodny R, Sitbon E, Li Z, Godzik A. Using an alignment of fragmented strings for comparing protein structures. Bioinformatics. 2007;23:219–224.
    4. Gerstein M, Levitt M. Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. Proceedings of the Fourth International Conference on Intelligent Systems in Molecular Biology; Menlo Park, CA: AAAI Press; 1996. pp. 59–67.
    5. Guerra C, Istrail S. Mathematical methods for protein structure analysis and design: Advanced Lectures. Berlin: Springer Verlag; 2000.
