Comparative Study
BMC Struct Biol. 2008 May 31;8:28.
doi: 10.1186/1472-6807-8-28.

CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations

Sankaran Sandhya et al. BMC Struct Biol. 2008;8:28.

Abstract

Background: Distantly related proteins adopt and retain similar structural scaffolds despite length variations that can be as large as two-fold in some protein superfamilies. In this paper, we describe an analysis of the indel regions that accommodate length variations among related proteins. We have developed an algorithm, CUSP, that examines multi-membered PASS2 superfamily alignments to identify indel regions in an automated manner. We have further used the method to characterize the length, structural type and biochemical features of indels in related protein domains.

Results: CUSP examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure. On a non-redundant dataset of 353 domain superfamily alignments from PASS2, we find that 'length-deviant' protein superfamilies show > 30% length variation from their average domain length. Of the additional lengths that occur in indels, 60% are short structures (< 5 residues), while 6% of indels are > 15 residues long. Structural types in indels also show class-specific trends.
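The two summary statistics in the Results can be illustrated with a short Python sketch. It assumes that a superfamily counts as 'length-deviant' when any member's length deviates from the superfamily mean by more than 30% of that mean (the abstract does not spell out the exact formula), and it bins indel lengths at the < 5 and > 15 residue cut-offs quoted above. The function names and toy numbers are hypothetical, not taken from the study.

```python
from statistics import mean

DEVIANT_FRACTION = 0.30  # '> 30% length variation' quoted in the Results


def is_length_deviant(domain_lengths):
    """Assumed criterion: some member's length deviates from the superfamily
    mean by more than 30% of that mean."""
    avg = mean(domain_lengths)
    return max(abs(n - avg) for n in domain_lengths) / avg > DEVIANT_FRACTION


def indel_length_bins(indel_lengths):
    """Fraction of indels that are short (< 5), medium (5-15) or long (> 15)."""
    if not indel_lengths:
        raise ValueError("no indels to bin")
    total = len(indel_lengths)
    short = sum(1 for n in indel_lengths if n < 5)
    long_ = sum(1 for n in indel_lengths if n > 15)
    return {
        "short": short / total,
        "medium": (total - short - long_) / total,
        "long": long_ / total,
    }


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    print(is_length_deviant([100, 110, 150]))
    print(is_length_deviant([100, 105, 200]))
    print(indel_length_bins([2, 3, 4, 7, 20]))
```

For the toy lengths [100, 110, 150], the largest deviation is 30 residues from a mean of 120 (25%), so the superfamily is not flagged, whereas [100, 105, 200] deviates by about 48% and is.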

Conclusion: The extent of length variation differs across superfamilies, and indels show class-specific trends in preferred lengths and structural types. Such indels of different lengths, even within a single protein domain superfamily, could have structural and functional consequences that drive their selection, underlining their importance in similarity detection and computational modelling. The availability of systematic algorithms such as CUSP should enable decision making in a domain superfamily-specific manner.

Figures

Figure 1
Schema of the CUSP algorithm and the scoring scheme used to identify structurally conserved and unconserved blocks [SSB and USB]. Steps 1–4 illustrate the processing of structure-based alignments of an example domain superfamily. Scores that capture structural-type exchanges (represented as X1 and X2 exchanges for each pairwise comparison) are first assigned to each alignment position. Consecutive high-scoring positions are then merged into structurally conserved blocks, distinguishing them from indels. Each block is assigned an average score, which is used to annotate the alignment and separate indel regions (USB) from 'core' regions (SSB). In the example, highly conserved structural blocks (H, E and C), identified by high block scores (> 4.5), are shown in maroon, dark blue and dark green, respectively. Blocks with 'medium' conservation are also indicated (red (helix), cyan (strand) and light green (coil)). The remaining regions are treated as USB.
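The block-merging logic described in this caption can be sketched in Python as follows. This is not the published CUSP code: the exchange scores, the gap penalty and the 'medium' cut-off of 3.0 are hypothetical placeholders, and only the > 4.5 threshold for highly conserved blocks comes from the caption. Each aligned member is given as a string of structural states (H helix, E strand, C coil, '-' gap), and at least two members are assumed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical exchange scores between structural states; NOT the published
# CUSP values. Keys are unordered pairs of H (helix), E (strand), C (coil).
EXCHANGE = {
    ("H", "H"): 6, ("E", "E"): 6, ("C", "C"): 5,
    ("H", "E"): 0, ("H", "C"): 2, ("E", "C"): 2,
}
GAP_SCORE = -2   # assumed penalty when either member has a gap
HIGH = 4.5       # block-score cut-off for high conservation (from the caption)
MEDIUM = 3.0     # hypothetical cut-off for 'medium' conservation


def pair_score(a, b):
    """Score one pairwise structural-state exchange at an aligned position."""
    if a == "-" or b == "-":
        return GAP_SCORE
    return EXCHANGE[(a, b)] if (a, b) in EXCHANGE else EXCHANGE[(b, a)]


def column_scores(alignment):
    """Average pairwise exchange score at every alignment column."""
    scores = []
    for column in zip(*alignment):
        pairs = list(combinations(column, 2))
        scores.append(sum(pair_score(a, b) for a, b in pairs) / len(pairs))
    return scores


def find_blocks(alignment, min_score=MEDIUM):
    """Merge consecutive columns scoring >= min_score into conserved blocks.

    Returns (blocks, labels): each block is (start, end, average score,
    conservation level, dominant state); labels marks every column as
    'SSB-high', 'SSB-medium' or 'USB'.
    """
    scores = column_scores(alignment)
    labels = ["USB"] * len(scores)
    blocks, start = [], None
    # A trailing -inf sentinel closes a block that runs to the last column.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= min_score and start is None:
            start = i
        elif s < min_score and start is not None:
            avg = sum(scores[start:i]) / (i - start)
            level = "SSB-high" if avg > HIGH else "SSB-medium"
            states = Counter(c for row in alignment for c in row[start:i] if c != "-")
            blocks.append((start, i - 1, round(avg, 2), level, states.most_common(1)[0][0]))
            labels[start:i] = [level] * (i - start)
            start = None
    return blocks, labels


if __name__ == "__main__":
    toy = ["HHHHCCEEEE", "HHHH--EEEE", "HHHHCCCEEE"]  # toy alignment, not real data
    blocks, labels = find_blocks(toy)
    print(blocks)
    print(labels)
```

On the toy alignment, the helix run at columns 0–3 and the strand run at columns 6–9 are merged into high-scoring blocks, while the gapped coil columns 4–5 are left as USB.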
Figure 2
a) Distribution of length variation (described by the mean standard deviation) in members of the 353 domain superfamilies of the Alpha, Beta, Alpha/Beta (AorB) and Alpha+Beta (AplusB) classes. b) Class-specific distribution of the extent of length variation (expressed as a ratio of mean domain size) across all superfamily members.
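A per-superfamily measure like the one summarised in this figure could be computed along the following lines. The sketch assumes the population standard deviation of member domain lengths, reports it both in residues and as a ratio of the mean domain size, and then averages each quantity over the superfamilies of a class; whether the original analysis used the population or the sample standard deviation is not stated here, and the toy input is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def length_variation(domain_lengths):
    """Population SD of member lengths (assumption) and its ratio to the mean."""
    sd = pstdev(domain_lengths)
    return sd, sd / mean(domain_lengths)


def summarise_by_class(superfamilies):
    """superfamilies: iterable of (class_name, [member domain lengths]).

    Returns {class_name: (mean SD in residues, mean SD/mean-length ratio)}.
    """
    per_class = defaultdict(list)
    for class_name, lengths in superfamilies:
        per_class[class_name].append(length_variation(lengths))
    return {
        name: (mean(sd for sd, _ in values), mean(ratio for _, ratio in values))
        for name, values in per_class.items()
    }


if __name__ == "__main__":
    toy = [("Alpha", [100, 120]), ("Alpha", [50, 50, 50]), ("Beta", [80, 120])]
    print(summarise_by_class(toy))
```

Expressing the spread as a ratio of the mean keeps small and large superfamilies comparable: a 20-residue spread matters more for a 100-residue domain than for a 500-residue one.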
Figure 3
a) Class-specific distribution of the type of structure observed in indel regions. b) Class-specific distribution of indel lengths. c) Distribution of indel lengths for each structural type (α-helix, β-strand, coil) in indel regions.
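To tabulate indels by structural type and length, one possible approach is sketched below. It assumes a per-column USB mask (for example, produced by a block-detection step like the one outlined in Figure 1) and one member's aligned secondary-structure string. Gap columns are skipped so that residues flanking a gap stay contiguous, and each USB run is split wherever the structural state changes. The splitting rule, the function names and the 5–15 residue middle bin are assumptions for illustration.

```python
from collections import Counter
from itertools import groupby


def indel_segments(ss_row, usb_mask):
    """Split one member's residues in USB columns into single-state runs.

    ss_row: aligned states for one member (H, E, C, or '-' for a gap).
    usb_mask: one boolean per column, True where the column is unconserved.
    Gap columns are dropped first, so residues flanking a gap stay contiguous.
    """
    residues = [(state, usb) for state, usb in zip(ss_row, usb_mask) if state != "-"]
    return [(state, sum(1 for _ in run))
            for (state, usb), run in groupby(residues) if usb]


def length_bin(n):
    """Assumed bins matching the cut-offs quoted in the abstract."""
    return "<5" if n < 5 else ("5-15" if n <= 15 else ">15")


def tally_indels(segments):
    """Count indel segments by (structural state, length bin)."""
    return Counter((state, length_bin(n)) for state, n in segments)


if __name__ == "__main__":
    row = "HHHCCHHHHH--EEE"                          # toy member, not real data
    mask = [c == "U" for c in "SSS" + "U" * 9 + "SSS"]  # U = unconserved column
    segments = indel_segments(row, mask)
    print(segments)
    print(tally_indels(segments))
```

For the toy row, the unconserved stretch yields a 2-residue coil run and a 5-residue helix run; the two gap columns contribute nothing.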
Figure 4
Length adjustments in length-deviant superfamilies from the four major classes. Panels I–IV depict 'dwarf' and 'giant' representative members (left and right, respectively) of a length-deviant superfamily from the alpha, beta, alpha/beta and alpha+beta classes. Representative members are labelled with PDB id and domain length. CUSP-reported structurally conserved regions (SSB), whose lengths and structural types are retained across all members of the domain superfamily (in brown), are distinguished from unconserved regions or indels (USB, in green). (a) 'Giant' members of the Cytochrome C superfamily are 56% more likely to accommodate extra length as coils and short helices. (b) Viral proteins from the β-class have acquired additional strands and coils in indel regions. Up to two-fold length variations appear as additional coils and helices in the (c) Actin-like ATPase and (d) Lysozyme-like domain superfamilies.
Figure 5
a) Structurally conserved regions identified by CUSP in the globin fold (in pink) on independently derived alignments from PASS2 and CE (left and right, respectively). b) Dwarf and giant domains in the Ferritin superfamily ([1dvba1 (1–147)] and [1mtyd- (15–526)], left and right, respectively) share a conserved core of 4 helices (in brown) surrounding a central Fe atom. Additional lengths in methane monooxygenase hydroxylase, the giant domain (in green), participate in domain interactions.
Figure 6
Giant and dwarf domains of the SH3-domain-like superfamily ((a) [1i1ja(1–106)] and (b) [1gcqa-(158–213)]) show additional structures near the ligand-binding site. (c) Structural superposition of the domain superfamily members shows appreciable conservation of the core structures (in yellow). (d) A Structview representation of the alignment of different members of the superfamily shows a well-conserved core of β-strands, with indels acquiring secondary structure in the giant domain.