Comparative Study

BMC Res Notes. 2014 Sep 18;7:654.

Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures

Authors

Masanari Matsuoka, Masatake Sugita, Takeshi Kikuchi¹

Affiliation

¹ Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga, Japan. tkikuchi@sk.ritsumei.ac.jp.

PMID: 25231773
PMCID: PMC4180342
DOI: 10.1186/1756-0500-7-654

Abstract

Background: This study investigates proteins that share high sequence identity yet exhibit drastically different 3D structures. Recently, artificial proteins related to the sequences of the human serum albumin-binding GA domain and the IgG-binding GB domain have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but adopt different 3D structures, namely a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to standard bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem.

Results: Ordinary analyses such as BLAST searches and conservation analyses were not sufficient on their own to distinguish which structure a given sequence would adopt. However, combining them with the analysis based on inter-residue average distance statistics and our sequence tendency analysis allowed us to infer which parts of the sequence play an important role in structure formation.

Conclusions: The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
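The 98% identity quoted in the Background is a position-by-position comparison of two aligned sequences. The following is a minimal sketch of that calculation, assuming the sequences are already aligned, equal in length, and gap-free. The function names and the 10-residue toy sequences are hypothetical illustrations and are not the GA98/GB98 sequences from the paper.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Assumes both sequences are already aligned, equal in length,
    and contain no gap characters.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """1-based positions at which the two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Hypothetical toy sequences (NOT taken from the paper).
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKY"
identity = percent_identity(toy_a, toy_b)   # 9 of 10 positions match
diffs = differing_positions(toy_a, toy_b)   # only the last position differs
```

A high identity value says nothing about where the differences lie, which is why the list of differing positions is returned separately: for pairs such as those studied here, the few mismatched sites are the natural candidates for closer inspection.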


Figures

**Figure 1**
**Ribbon representations of the 3D structures of 2LHC (a), 2LHG (b), 2LHD (c) and 2LHE (d) with their amino acid sequences.** A dark gray segment denotes an α helix and a light gray segment denotes a β strand. The symbols **"a"** and **"b"** have the same meanings as in Figure 11. The residues that differ between the sequences are highlighted and shown as sticks in the structures.

**Figure 2**
**Plots of F(μ) values for 2LHC and 2LHD with error bars.** An arrow with a numeral marks the location of a peak in a plot, and the numeral gives the residue position at that peak. A black numeral, an underlined black numeral and an outlined numeral indicate a peak observed in both 2LHC (GA98-1) and 2LHD (GB98-1), a peak observed only in 2LHC, and a peak observed only in 2LHD, respectively.

**Figure 3**
**Plots of F(μ) values for 2LHG and 2LHE with error bars.** An arrow with a numeral marks the location of a peak in a plot, and the numeral gives the residue position at that peak. A black numeral, an underlined black numeral and an outlined numeral indicate a peak observed in both 2LHG (GA98-2) and 2LHE (GB98-2), a peak observed only in 2LHG, and a peak observed only in 2LHE, respectively.

**Figure 4**
**Visualization of hydrophobic packings. (a)** The hydrophobic packing formed by residues near the peaks of the F-value plot for 2LHC (GA98-1). The packing residues are 16-A, 20-L, 30-F, 33-I, 42-V and 45-L. **(b)** The hydrophobic contacts formed by residues near the peaks of the F-value plot for 2LHD (GB98-1). The pairwise contacts are formed by 16-Ala and 30-Phe, 20-Leu and 26-Ala, and 34-Ala and 43-Trp. 35-Asn, which forms a contact with 43-Trp in Gō model simulations, is shown in light gray. **(c)** The hydrophobic packing formed by residues near the peaks of the F-value plot for 2LHG (GA98-2). The packing residues are 16-A, 20-L, 25-I, 33-I, 42-V and 45-L. **(d)** The hydrophobic contacts formed by residues near the peaks of the F-value plot for 2LHE (GB98-2). The pairwise contacts are formed by 16-Ala and 30-Phe, 20-Leu and 25-Ile, and 34-Ala and 43-Trp.

**Figure 5**
**Sequence tendencies. (a)** The solid and dashed lines correspond to [PDB:2LHC] (GA98-1) and [PDB:2LHD] (GB98-1), respectively. **(b)** The solid and dashed lines correspond to [PDB:2LHG] (GA98-2) and [PDB:2LHE] (GB98-2), respectively. The x-axis denotes the residue number and the y-axis denotes the relative similarity to 2FS1/1PGA. A large positive value means the local sequence has high similarity to [PDB:2FS1] (GA), and a large negative value means it has high similarity to [PDB:1PGA] (GB). The bold numbers are the residue numbers at peaks or valleys of the plot. The calculation of the sequence tendencies is described in the Materials and Methods section.

**Figure 6**
**Distribution of conserved hydrophobic residues with local sequence tendencies.** The solid and dashed lines correspond to the sequence tendencies of GA98-1 and GB98-1, respectively. The squares above the sequence tendency plot denote the conserved hydrophobic residues of 2FS1 and its homologues (shown in Figure 8), while the triangles below the plot denote those of 1PGA and its homologues (shown in Figure 9).

**Figure 7**
**Multiple alignment of 2LHC, 2LHD, 2LHG and 2LHE.** All sites other than those marked by arrows are perfectly conserved.

**Figure 8**
**Multiple alignment of sequences of 2FS1 and related proteins hit by BLAST.** A site marked by "*" is perfectly conserved and that marked by "+" is 85% conserved.

**Figure 9**
**Multiple alignment of sequences of 1PGA and related proteins hit by BLAST.** A site marked by "*" is perfectly conserved and that marked by "+" is 85% conserved.
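The "*" and "+" markers used in Figures 8 and 9 can be derived column by column from a multiple alignment. The sketch below assumes that "85% conserved" means the most common residue in a column is shared by at least 85% of the sequences; the paper's exact criterion is not stated here, so that reading, the function name `conservation_markers`, and the toy alignment are all hypothetical.

```python
from collections import Counter


def conservation_markers(alignment: list[str], threshold: float = 0.85) -> str:
    """Return one marker per alignment column.

    '*' : every sequence has the same residue (perfectly conserved)
    '+' : the most common residue reaches `threshold` but not 100%
    ' ' : anything else, including columns dominated by gaps ('-')
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    markers = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / len(column)
        if residue == "-":
            markers.append(" ")   # a gap-dominated column is not conserved
        elif count == len(column):
            markers.append("*")
        elif fraction >= threshold:
            markers.append("+")
        else:
            markers.append(" ")
    return "".join(markers)


# Hypothetical 7-sequence toy alignment (NOT the BLAST hits from the paper).
toy_alignment = [
    "AKLG",
    "AKLG",
    "AKLG",
    "AKLG",
    "AKIG",
    "ARIG",
    "AKVG",
]
marker_line = conservation_markers(toy_alignment)
```

In the toy alignment, column 1 and column 4 are identical in all seven sequences, column 2 has one substitution (6/7, about 86%), and column 3 is variable, so the marker line distinguishes all three cases.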

**Figure 10**
**Contact map constructed from the conformation ensemble simulated by the present Gō model at around the transition state of folding. (a)** 2LHC, **(b)** 2LHD, **(c)** 2LHG, and **(d)** 2LHE. A darker spot indicates a higher occurrence of conformations with the contact corresponding to that spot. The ranges on the right side of the figure, for example "80-100", are the numbers of conformations sampled near the transition state during a simulation. A black bar denotes the location of an α helix, and a black arrow indicates the location of a β strand.

**Figure 11**
**Ribbon representations of the 3D structures of 2FS1 (a) and 1PGA (b) with their amino acid sequences.** A dark gray segment denotes an α helix and a light gray segment denotes a β strand. A residue with the symbol "a" takes an α-helix conformation and one with "b" takes a β-strand conformation. The definition of the secondary structures in the PDB is used in this study. All identical residues between the two sequences are highlighted.

**Figure 12**
**Cα-bead model of a protein used in this study.** The bond length is fixed at 3.8 Å, and the bond angle and dihedral angle are denoted by θ and ϕ, respectively.
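The geometric quantities named in the Figure 12 caption can be computed directly from consecutive Cα coordinates. This is a minimal sketch under stated assumptions: the function names are hypothetical, the four bead coordinates are toy values chosen so that every virtual bond is 3.8 Å long, and the dihedral sign follows the common convention in which a clockwise rotation viewed along the central bond is positive. It illustrates the definitions of θ and ϕ only and is not the authors' simulation code.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def bond_angle(p1: Vec, p2: Vec, p3: Vec) -> float:
    """Bond angle theta in degrees at bead p2, formed by beads p1-p2-p3."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral_angle(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Dihedral angle phi in degrees, range (-180, 180], about the p2-p3 bond."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Hypothetical toy coordinates in angstroms; each virtual bond is 3.8 long.
beads = [(3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 3.8), (0.0, 3.8, 3.8)]
bond_lengths = [norm(sub(b, a)) for a, b in zip(beads, beads[1:])]
theta = bond_angle(*beads[:3])
phi = dihedral_angle(*beads)
```

With a fixed bond length, a chain of N beads is fully described by N-2 bond angles and N-3 dihedral angles, which is what makes this coarse-grained representation compact.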


Cited by

Conserved structural topologies in RNase A-like and trypsin-like serine proteases: a sequence-based folding analysis.
Kabir KMA, Takahashi T, Kikuchi T. BMC Mol Cell Biol. 2025 May 28;26(1):16. doi: 10.1186/s12860-025-00542-y. PMID: 40437407. Free PMC article.

Analyses of the folding sites of irregular β-trefoil fold proteins through sequence-based techniques and Gō-model simulations.
Kimura R, Aumpuchin P, Hamaue S, Shimomura T, Kikuchi T. BMC Mol Cell Biol. 2020 Apr 15;21(1):28. doi: 10.1186/s12860-020-00271-4. PMID: 32295515. Free PMC article.
