BMC Struct Biol. 2010 Aug 4;10:24.
doi: 10.1186/1472-6807-10-24.

Systematic analysis of short internal indels and their impact on protein folding

RyangGuk Kim et al. BMC Struct Biol.

Abstract

Background: Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered a major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, particularly those involving core secondary structures, on protein folding. The long-term goal of our study is to accurately predict the structures of protein AS isoforms. As a first step towards this goal, we performed a systematic analysis of the structural changes caused by short internal indels by mining highly homologous proteins in the Protein Data Bank (PDB).

Results: We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in a disordered state and the majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in amino acid frequencies or secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve their structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2 Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs have high RMSDs because of differences in the relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.
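
To make the notion of a "short internal indel" concrete, the following is a minimal Python sketch of how gap runs of 2-40 columns could be picked out of a gapped pairwise alignment. It is an illustration only, not the authors' pipeline: the function name `find_internal_indels`, the tuple layout of the result, and the rule that "internal" means the gapped sequence has residues on both sides of the run are all assumptions made here. The toy sequences are invented.

```python
def find_internal_indels(aligned_a, aligned_b, min_len=2, max_len=40):
    """Return (start, end, gapped_in, segment) for each internal gap run.

    aligned_a / aligned_b: two rows of a pairwise alignment, '-' for gaps.
    start/end are alignment columns (end exclusive); gapped_in is "a" or "b";
    segment is the residues of the other sequence that face the gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_a)
    indels = []
    for gapped, other, label in ((aligned_a, aligned_b, "a"),
                                 (aligned_b, aligned_a, "b")):
        i = 0
        while i < n:
            if gapped[i] != "-":
                i += 1
                continue
            j = i
            while j < n and gapped[j] == "-":
                j += 1
            # Assumed definition of "internal": residues exist on both sides,
            # so leading and trailing gaps are skipped.
            internal = (any(c != "-" for c in gapped[:i])
                        and any(c != "-" for c in gapped[j:]))
            if internal and min_len <= j - i <= max_len:
                indels.append((i, j, label, other[i:j].replace("-", "")))
            i = j
    return indels


# Toy alignment (made-up sequences): an 8-residue segment missing from "a".
row_a = "MKTAYIAKQR--------LGLIEVQ"
row_b = "MKTAYIAKQRQISFVKSHLGLIEVQ"
print(find_internal_indels(row_a, row_b))
# [(10, 18, 'a', 'QISFVKSH')]
```

Terminal gaps and single-residue gaps are ignored by this sketch, which matches the 2-40 residue internal range stated in the abstract but says nothing about how the original study handled redundancy or disordered fragments.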

Conclusions: The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guidance for modeling protein AS isoform structures and homologous proteins with indels by placing the indels at the right locations, since the accuracy of sequence alignments dictates model quality in homology modeling.

Figures

Figure 1
Flowchart for indel identification and structural analysis.
Figure 2
An example of removing redundant protein chains in each cluster.
Figure 3
An example of a false indel sequence derived from 1XJIA-1C8SA. Dashed red line represents the disordered fragment. The Cα distance between the two residues (F and N) that flank the disordered fragment in 1C8SA is 8.54 Å.
Figure 4
Comparisons of amino acid compositions, secondary structure types and relative solvent accessibilities of indel residues in "all indels", "naturally occurring indels" and reference datasets. Relative frequencies of 20 amino acids, frequencies of secondary structure types (helix, strand, coil, and disordered), and relative solvent accessibilities (buried: ≤7%, intermediate: >7% and ≤37%, exposed: >37%) are shown in A, B and C respectively. The one-letter code for amino acids is used. "Background" data for amino acid frequencies, secondary structure types and solvent accessibilities are calculated from a dataset of 4731 non-redundant protein structures (See Methods). "Natural" represents an indel dataset without engineered indels. "All" indel dataset includes both engineered and natural indel sequences.
Figure 5
Structural comparisons of protein pairs with indel sequences using FAST [42]. (A) Distribution of global RMSDs; (B) relationship between global RMSDs and indel lengths.
Figure 6
Global and local structure alignments. (A) Structural alignment between 1Y64B and 1UX4A; (B) and (C) N-terminal and C-terminal structural alignments, respectively; and (D) structural alignment between 1Y64B and 1UX5A. Green: 1Y64B. Red: 1UX4A in A, B, and C and 1UX5A in D.
Figure 7
Structural comparisons of proteins with indels adopting α-helix and/or β-strand conformations. (A, B): 1RJ7A-1RJ8A; (C, D): 5PGMA-3PGMA; (E, F): 1EKXA-2ATCA. (A, C, E): whole structure alignments; (B, D, F): highlights of alignments in the indel region; (G): part of the sequence alignment between 1EKXA and 2ATCA involving the indel sequence MAEVDILY; (H): part of the structural alignment between 1EKXA and 2ATCA involving the indel sequence. Green: long protein; Red: short protein; Blue: indel sequence.
Figure 8
A snapshot of the SCINDEL web server. Sequence and structure alignments between 1GSAA and 1GLVA with an indel sequence of 13 amino acids.
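
Two numeric details in the captions above lend themselves to a short illustration: the relative solvent accessibility bins of Figure 4 (buried: ≤7%, intermediate: >7% and ≤37%, exposed: >37%) and the Cα distance of Figure 3. The Python sketch below encodes the bins exactly as stated and computes a Cα-Cα distance from coordinates. The chain-break test is an assumption added here, not something the caption states: it relies on the fact that consecutive Cα atoms are normally about 3.8 Å apart, and the 4.2 Å cutoff is a hypothetical tolerance chosen for illustration. All function names are hypothetical.

```python
import math


def classify_rsa(rsa_percent):
    """Bin a relative solvent accessibility (in %) using the Figure 4 cutoffs."""
    if rsa_percent <= 7:
        return "buried"
    if rsa_percent <= 37:
        return "intermediate"
    return "exposed"


def ca_distance(p, q):
    """Euclidean distance in Å between two Cα coordinates given as (x, y, z)."""
    return math.dist(p, q)


def looks_like_chain_break(p, q, max_adjacent=4.2):
    """Guess whether residues are missing between two flanking Cα atoms.

    max_adjacent is an assumed tolerance above the typical ~3.8 Å spacing of
    sequence-adjacent Cα atoms; it is not a value taken from the paper.
    """
    return ca_distance(p, q) > max_adjacent


# Synthetic coordinates placed 8.54 Å apart, the distance quoted in Figure 3.
print(classify_rsa(5.0), classify_rsa(20.0), classify_rsa(60.0))
print(looks_like_chain_break((0.0, 0.0, 0.0), (8.54, 0.0, 0.0)))
```

Under these assumptions, a flanking distance of 8.54 Å is too long for two residues that are adjacent in sequence, which is consistent with the caption's description of a disordered fragment lying between them rather than a true deletion.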

References

    1. Pennisi E. Why do humans have so few genes? Science (New York, NY) 2005;309(5731):80.
    2. Xing Y, Lee C. Alternative splicing and RNA selection pressure--evolutionary consequences for eukaryotic genomes. Nature reviews. 2006;7(7):499–509. doi: 10.1038/nrg1896.
    3. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–476. doi: 10.1038/nature07509.
    4. Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PI, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, Lopez G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Storling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramirez F, Schlicker A, Denoeud F, Jones P, Kerrien S. The implications of alternative splicing in the ENCODE protein complement. Proceedings of the National Academy of Sciences of the United States of America. 2007;104(13):5495–5500. doi: 10.1073/pnas.0700800104.
    5. UniProt-Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic acids research. 2009. pp. D169–174.
