Review
Protein Sci. 2015 Jul;24(7):1075-86.
doi: 10.1002/pro.2689. Epub 2015 Jun 16.

ChSeq: A database of chameleon sequences

Wenlin Li et al. Protein Sci. 2015 Jul.

Abstract

Chameleon sequences (ChSeqs) are strings of identical amino acid sequence that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by one ChSeq challenge sequence-based secondary structure predictors. With the increasing number of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. Compared with previous studies, the set of unrelated ChSeqs found represents a roughly 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors to our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.

Keywords: ChSeq; biological function; chameleon sequence; conformational change; secondary structure; secondary structure prediction; sequence profile; structural plasticity.
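
As an illustration of the idea in the abstract, the following is a minimal sketch of how identical sequence strings with different conformations could be found. It is not the authors' pipeline. It assumes each input record is a hypothetical (identifier, sequence, secondary-structure string) triple using a reduced alphabet of H (helix), E (strand), and C (other), and it assumes a fragment counts as helical or stranded only when every residue is in that state. The function names and the 6 to 10 residue window defaults are chosen for illustration, with the window matching the length range stated above.

```python
from collections import defaultdict


def fragment_state(ss):
    """Return 'H' or 'E' if the fragment is uniformly helix or strand, else None."""
    states = set(ss)
    if states == {"H"}:
        return "H"
    if states == {"E"}:
        return "E"
    return None


def find_chameleons(records, min_len=6, max_len=10):
    """Find identical sequence strings seen as both helix and strand.

    records: iterable of (identifier, sequence, secondary_structure) triples
    (an assumed input format; the two strings must have equal length).
    Returns {fragment: {'H': [ids], 'E': [ids]}}.
    """
    # fragment -> state -> identifiers in which that state was observed
    seen = defaultdict(lambda: defaultdict(set))
    for rec_id, seq, ss in records:
        if len(seq) != len(ss):
            raise ValueError(f"length mismatch in record {rec_id!r}")
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                state = fragment_state(ss[i:i + k])
                if state is not None:
                    seen[seq[i:i + k]][state].add(rec_id)
    # Keep only fragments observed in both conformations.
    return {
        frag: {state: sorted(ids) for state, ids in states.items()}
        for frag, states in seen.items()
        if len(states) == 2
    }
```

A real survey would additionally need to separate homologous from unrelated proteins and to decide how to treat overlapping fragments, neither of which this sketch attempts.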

Figures

Figure 1
Chameleon sequences (ChSeqs) and their distributions in homologous and unrelated proteins. A ChSeq adopting different conformations. The pdb codes are 2Q0Y (left) and 3S30 (right), respectively. ChSeqs are colored magenta in both the structure and sequence.
Figure 2
Conformational changes in Type I fusion protein of respiratory syncytial virus. (a) ChSeqs (colored magenta) between residues 185–194 and 176–181 (pdb: 4jhw, Chain F) form a β-hairpin in the prefusion complex (pdb: 4jhw) illustrated in rainbow as monomeric (left panel) and trimeric (right panel). (b) The ChSeqs form helical conformations in the postfusion complex (pdb: 3rki, Chain A) illustrated as above. (c) The sequence and the corresponding secondary structures of the ChSeq segments in prefusion (Line 2: 4jhw) and postfusion (Line 3: 3rki) complexes.
Figure 3
ChSeqs in proteins of different lengths. The region of identical sequences is shown in the alignment and colored rainbow in the structures. ChSeqs are colored magenta.
Figure 4
Example of a 10-residue ChSeq in unrelated proteins. (a) ChSeqs (magenta) in the structures 4JB9 (left) and 1VL6 (right). (b) Close-ups of red box regions of panel (a) with some backbone hydrogen bonds (dashed yellow lines) shown. (c) Sequence, observed secondary structure, and psiS- and psiP-predicted secondary structure are shown along with weblogo pictures visualizing the sequence profiles in each protein family.
Figure 5
Amino acid composition of ChSeqs. Amino acid frequencies in ChSeqs (blue) are compared with the frequencies seen in proteins from the Swiss-Prot database (green).
Figure 6
ChSeqs are buried to a similar degree as residues in strands and helices. Histogram of the RSA distribution of residues in “stringent” ChSeqs (red), in a set of 1000 random proteins (blue), and in a set of “random” β-strands and α-helices (green).
Figure 7
Histograms of prediction P-values (PPVs) for ChSeqs with (a) incorrect psiS predictions and (b) correct psiS predictions. Green lines represent the PPVs for controls computed from a random sequence from the family.
Figure 8
Histograms of PPVs for ChSeqs with helical (red) and stranded (blue) conformations. All studied ChSeqs (a) are further divided into those with correct psiS predictions (b) and incorrect predictions (c).
Figure 9
Nonhomologous ChSeq in homologous proteins. The ChSeq (purple) is highlighted in the two ribbon diagrams, and the BLAST alignment is shown.
Figure 10
An example web interface. This shows a ChSeq that occurs in unrelated proteins (accessible at http://prodata.swmed.edu/wenlin/pdb_survey2/index.cgi/new_dssp/middle-match/RVYGAQNEMC/).
