BMC Bioinformatics. 2010 Feb 4;11:75. doi: 10.1186/1471-2105-11-75.

Mining protein loops using a structural alphabet and statistical exceptionality

Leslie Regad¹, Juliette Martin, Gregory Nuel, Anne-Claude Camproux

Affiliation

¹ MTi, Inserm UMR-S 973, Université Paris Diderot - Paris 7, Paris, F-75205 Cedex 13, France. leslie.regad@univ-paris-diderot.fr

PMID: 20132552
PMCID: PMC2833150
DOI: 10.1186/1471-2105-11-75

Abstract

Background: Protein loops account for 50% of the residues in available three-dimensional protein structures. These regions are often involved in protein function, for example in binding sites and catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures (helices and strands) have been widely studied, whereas loops are hard to analyze because they are highly variable in both sequence and structure. Because of data sparsity, long loops have rarely been studied systematically.

Results: We developed a simple and accurate method for describing and analyzing the structures of short and long loops using structural motifs, without restriction on loop length. The method is based on the structural alphabet HMM-SA, which simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments from a bank of 93000 protein loops and grouped them according to their structural-letter sequence, called a structural word. This approach permits a systematic analysis of loops of all sizes because we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent loop words (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but, interestingly, two thirds are shared by short (fewer than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complemented our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of the recurrent structural words are over-represented, and they cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints.

Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. The method relies on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has been used to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
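
The recurrence filter and the coverage figure reported in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy loop strings and the low demonstration threshold are assumptions made for illustration. The abstract says "observed more than 30 times" while the figure captions write Wset_≥30, so the exact boundary of the threshold is also an assumption here and is left as a parameter.

```python
from collections import Counter

# A word is four structural letters; each letter is a four-residue fragment
# and consecutive letters overlap by three residues, so a word spans 7 residues.
WORD_LEN = 4


def words(loop):
    """Return every overlapping four-letter word of a structural-letter string."""
    return [loop[i:i + WORD_LEN] for i in range(len(loop) - WORD_LEN + 1)]


def recurrent_words(loops, min_count=30):
    """Count words over all loops; keep those seen more than min_count times."""
    counts = Counter(w for loop in loops for w in words(loop))
    return {w: n for w, n in counts.items() if n > min_count}


def coverage(loops, selected):
    """Fraction of structural-letter positions lying inside a selected word."""
    covered = total = 0
    for loop in loops:
        mask = [False] * len(loop)
        for i, w in enumerate(words(loop)):
            if w in selected:
                mask[i:i + WORD_LEN] = [True] * WORD_LEN
        covered += sum(mask)
        total += len(loop)
    return covered / total if total else 0.0


# Toy data (invented, not taken from the loop bank used in the paper).
example_loops = ["FFFFGDZI", "GDZIFFFF", "UOGI"]
example_recurrent = recurrent_words(example_loops, min_count=1)
example_coverage = coverage(example_loops, set(example_recurrent))
```

Coverage is counted here in structural-letter positions, which is one plausible reading of "loop lengths"; counting in residues would require mapping each letter back to its four residues.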


Figures

**Figure 1**
**Loop-word extraction from chain B of protein 1GPW**. a) 3D structure of the protein, b) the 27 structural letters of HMM-SA, c) structure simplification as a succession of structural letters, d) extraction of simplified loops, e) extraction of overlapping words of four structural letters, illustrated with the structural words FFFF and GDZI.

**Figure 2**
**Correspondence analysis between the eight loop types (defined according to the length and the flanking regions of the loops) and the structural words in Wset_≥30**. αα_s, αβ_s, βα_s, ββ_s correspond to the four short-loop types according to the flanking regions, and αα_l, αβ_l, βα_l, ββ_l correspond to the four long-loop types according to the flanking regions. αα: loops linking two α-helices, αβ: loops linking an α-helix and a β-strand, βα: loops linking a β-strand and an α-helix, ββ: loops linking two β-strands. The first two axes account for 36% + 26% = 62% of the variance. a) Plot of the eight loop types, b) Plot of Wset_≥30 words colored according to their statistical exceptionality: red = OR_w, gray = NS_w, blue = UR_w.

**Figure 3**
**Structural variability of the three statistical word types of Wset_≥30**. a) Intra-word structural variability: distribution of the RMSd_w. The vertical line corresponds to a threshold of 0.6 Å. b) Inter-word structural variability: Sammon's map computed from the RMSd_dev for a sample of 890 words of Wset_≥30. All the points are subjected to the same projection and plotted on distinct plots.

**Figure 4**
**Sequential specificity of the three statistical word types of Wset_≥30**. a) Intra-word analysis: distribution of the Z_max. The vertical line corresponds to a threshold of 10. b) Inter-word analysis: Sammon's map computed from the Euclidean distance between Z-scores. All the points are subjected to the same projection and plotted on distinct plots.

**Figure 5**
**Recurrent words found both in long and short loops**. A long loop of 18 structural letters (central figure) extracted from the protein with PDB code 3SIL contains four words (UOGI, KHBB, IFFR, RPBQ) of Wset_≥30. The protein is colored in gray and the loop in blue, except for the four words: UOGI in magenta, KHBB in cyan, IFFR in red and RPBQ in green. These four words are also seen in short loops in other structures. For each word, we indicate the structural-letter pattern, the loop length in brackets and the PDB code of the protein structure. Structures are displayed with PyMOL [78].

**Figure 6**
**Functional residues of sialidase 3SIL**. Catalytic and binding residues annotated in Swiss-Prot are highlighted in pink and cyan. The inhibitor (found in structure 1DIL) is highlighted in red. The long loop revealed by the structural word analysis is highlighted in yellow.
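
Panels d) and e) of Figure 1 (loop extraction, then extraction of overlapping four-letter words) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the per-position loop flags (`is_loop`) are a hypothetical input standing in for whatever loop definition is applied to the simplified chain, and the toy letter string is invented rather than taken from 1GPW.

```python
WORD_LEN = 4  # a structural word is four consecutive structural letters


def extract_loops(letters, is_loop):
    """Split a structural-letter string into maximal runs flagged as loop.

    letters: structural-letter string of one chain (panel c of Figure 1).
    is_loop: hypothetical per-letter booleans, True where the letter is in a loop.
    """
    loops, current = [], []
    for letter, flag in zip(letters, is_loop):
        if flag:
            current.append(letter)
        elif current:
            # A regular-structure position closes the loop being built.
            loops.append("".join(current))
            current = []
    if current:
        loops.append("".join(current))
    return loops


def extract_words(loop):
    """Return (offset, word) pairs for every overlapping four-letter word."""
    return [(i, loop[i:i + WORD_LEN]) for i in range(len(loop) - WORD_LEN + 1)]


# Toy chain: three regular positions, an eight-letter loop, three regular positions.
toy_letters = "AAAFFFFGDZIAAA"
toy_flags = [False] * 3 + [True] * 8 + [False] * 3
toy_loops = extract_loops(toy_letters, toy_flags)
toy_words = extract_words(toy_loops[0])
```

Keeping the offset alongside each word makes it possible to map a word back to its position in the loop, for example to color it on the 3D structure as in Figure 5.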


Cited by

Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.
Regad L, Martin J, Camproux AC. BMC Bioinformatics. 2011 Jun 20;12:247. doi: 10.1186/1471-2105-12-247. PMID: 21689388.
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops.
Regad L, Saladin A, Maupetit J, Geneix C, Camproux AC. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W203-9. doi: 10.1093/nar/gkr410. Epub 2011 Jun 10. PMID: 21665924.
Analysis of the HIV-2 protease's adaptation to various ligands: characterization of backbone asymmetry using a structural alphabet.
Triki D, Cano Contreras ME, Flatters D, Visseaux B, Descamps D, Camproux AC, Regad L. Sci Rep. 2018 Jan 15;8(1):710. doi: 10.1038/s41598-017-18941-3. PMID: 29335428.
Accounting for large amplitude protein deformation during in silico macromolecular docking.
Bastard K, Saladin A, Prévost C. Int J Mol Sci. 2011 Feb 22;12(2):1316-33. doi: 10.3390/ijms12021316. PMID: 21541061.
Considerations of Protein Subpockets in Fragment-Based Drug Design.
Bartolowits M, Davisson VJ. Chem Biol Drug Des. 2016 Jan;87(1):5-20. doi: 10.1111/cbdd.12631. Epub 2015 Aug 31. PMID: 26307335.


References

1. Fetrow JS. Omega loops: nonregular secondary structures significant in protein function and stability. FASEB J. 1995;9:708–717.
2. Johnson LN, Lowe ED, Noble ME, Owen DJ. The Eleventh Datta Lecture. The structural basis for substrate recognition and control by protein kinases. FEBS Lett. 1998;430:1–11. doi: 10.1016/S0014-5793(98)00606-1.
3. Bernstein LS, Ramineni S, Hague C, Cladman W, Chidiac P, Levey AI, Hepler JR. RGS2 binds directly and selectively to the M1 muscarinic acetylcholine receptor third intracellular loop to modulate Gq/11alpha signaling. J Biol Chem. 2004;279:21248–21256. doi: 10.1074/jbc.M312407200.
4. Kiss C, Fisher H, Pesavento E, Dai M, Valero R, Ovecka M, Nolan R, Phipps ML, Velappan N, Chasteen L, Martinez JS, Waldo GS, Pavlik P, Bradbury AR. Antibody binding loop insertions as diversity elements. Nucl Acids Res. 2006;34:132–146. doi: 10.1093/nar/gkl681.
5. Saraste M, Sibbald PR, Wittinghofer A. The P-loop: a common motif in ATP- and GTP-binding proteins. Trends Biochem Sci. 1990;15:430–434. doi: 10.1016/0968-0004(90)90281-F.
