Proteins. 2016 Sep;84 Suppl 1(Suppl 1):20-33.
doi: 10.1002/prot.24982. Epub 2016 Jan 27.

CASP 11 target classification

Lisa N Kinch et al. Proteins. 2016 Sep.

Abstract

Protein target structures for the Critical Assessment of Structure Prediction round 11 (CASP11) and CASP ROLL were split into domains and classified into categories suitable for assessment of template-based modeling (TBM) and free modeling (FM) based on their evolutionary relatedness to existing structures classified by the Evolutionary Classification of Protein Domains (ECOD) database. First, target structures were divided into domain-based evaluation units. Target splits were based on the domain organization of available templates as well as the performance of servers on whole targets compared to split target domains. Second, evaluation units were classified into TBM and FM categories using a combination of measures that evaluate prediction quality and template detectability. Generally, target domains with sequence-related templates and good server prediction performance were classified as TBM, whereas targets without sequence-identifiable templates and low server performance were classified as FM. As in previous CASP experiments, the boundaries for classification were blurred due to the presence of significant insertions and deteriorations in the targets with respect to homologous templates, as well as the presence of templates with partial coverage of new folds. The FM category included 45 target domains, which represents an unprecedented number of difficult CASP targets provided for modeling. Proteins 2016; 84(Suppl 1):20-33. © 2016 Wiley Periodicals, Inc.

Keywords: CASP11; classification; fold space; free modeling; protein structure; sequence homologs; structure analogs; structure prediction; template-based modeling.

Figures

Figure 1. CASP11 Target domain splits
The procedure for splitting targets into domains for evaluation is illustrated using examples. A) Target T0759 was split into an N-terminal (blue) and a C-terminal (salmon/red) domain, based on the presence of separate hydrophobic cores. The domains are sequence-detected repeating units, with the C-terminal core of the repeat (red) being elaborated by additional secondary structures (salmon). B) The closest homologous template to the T0759 core N-terminal domain (1lm5, blue) differs from the closest template analog to the T0759 C-terminal elaborated domain duplication (3cwx, red). C) The Grishin plot performance comparison for T0759 suggests splitting the target into two evaluation units, based on the increased scores of the split domains. D) Target T0786 was split into an N-terminal (blue) domain and a C-terminal (red) domain based on an internal fold duplication. E) The closest template has the same domain duplication (2q4h, blue and red), arranged similarly to that of target T0786. F) The Grishin plot slope close to 1 suggests the split is not necessary for target T0786 evaluation.
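The comparison in panels C and F can be sketched numerically. The sketch below assumes, beyond what the caption states, that each point on the plot pairs a server model's whole-target GDT_TS with the length-weighted sum of its per-domain GDT_TS, and that the slope comes from a least-squares line forced through the origin. All scores, domain lengths, and function names are hypothetical.

```python
def weighted_domain_score(domain_scores, domain_lengths):
    """Length-weighted mean of per-domain GDT_TS scores (assumed definition)."""
    total_length = sum(domain_lengths)
    return sum(s * n for s, n in zip(domain_scores, domain_lengths)) / total_length


def zero_intercept_slope(xs, ys):
    """Least-squares slope of y = k * x, with the line forced through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical server models for a two-domain target (90 and 110 residues).
lengths = [90, 110]
whole_target_gdt = [30.0, 42.0, 55.0]
per_domain_gdt = [[48.0, 41.0], [60.0, 58.0], [70.0, 66.0]]

# One point per model: x = whole-target score, y = weighted split-domain score.
split_gdt = [weighted_domain_score(scores, lengths) for scores in per_domain_gdt]
slope = zero_intercept_slope(whole_target_gdt, split_gdt)
print(f"slope = {slope:.2f}")
```

Under these assumptions, a slope well above 1 (as in this toy data) would argue for evaluating the domains separately, while a slope near 1 would suggest the split adds little.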
Figure 2. Difficult domain splits in obligate oligomers
A) The N-terminal (blue) and C-terminal (red) domains in target T0820 form an obligate dimer with a second chain (white) through a C-terminal domain swap. The phage tail proteins form obligate trimers through alternating integrated β-strands and meandering β-strands B) in target T0799, with three alternating trimerization domains colored in blue, green and yellow, followed by a chaperone domain in red; and C) in target T0775, with six defined alternating domains in rainbow.
Figure 3. Difficult evolutionary target classifications
Similar structural elements between target and template are colored in rainbow, with insertions in gray. A) The target T0824-D1 retains a similar placement of the active site (black sidechains) and an unusual structural feature (magenta), yet has a significant deterioration of the fold present in B) the closest template (1g8t), with the active site marked by a black sphere. C) The target T0832 retains a similar placement of the active site (black sidechains), yet has a significant deterioration of the fold present in D) the closest template (1dmw). E) A relatively well-predicted new fold of the swapped domain in target T0820-D2 (rainbow) can be modeled over a significant portion of the fold by F) a partially detected domain template (2f23).
Figure 4. Evolutionary classification of CASP11 targets using ECOD
A) Targets are distributed into the ECOD hierarchy, with 42 classified at the family level with closely related structures, 50 classified at the H-group level with more distantly related structure homologs, 28 classified at the X-group level with structures having similar topology but questionable homology, and 6 targets classified as new folds. B) The distribution of targets into ECOD architectures shows relatively equal distribution among the traditionally categorized classes of all-α (highlighted pink), all-β (highlighted lavender), α/β (highlighted light blue) and α+β (highlighted light green). C) Some ECOD architectures are overrepresented in CASP11 targets (blue bars) and some are underrepresented (red bars) compared to all ECOD-classified PDB structures.
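As a quick check on the panel A counts, the short sketch below totals the four ECOD levels and reports each as a share of the whole. It assumes the four levels are mutually exclusive and that each classified target is counted once; the variable names are illustrative only.

```python
# Counts taken from the Figure 4A caption.
ecod_level_counts = {
    "family": 42,
    "H-group": 50,
    "X-group": 28,
    "new fold": 6,
}

total = sum(ecod_level_counts.values())
for level, count in ecod_level_counts.items():
    share = 100 * count / total
    print(f"{level:>8}: {count:3d} ({share:.1f}%)")
print(f"   total: {total}")
```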
Figure 5. CASP11 target domain score distributions
Five scores reflecting prediction quality (average GDT_TS score of server models, average GDT_TS score of first server models above random, and number of first server models above random) and template distance (LGA_S to chosen template and HHpred probability to homologous template) were combined as Z-score sums. A) A distribution of Z-score sum frequencies highlights the distinction (around 2.25) between confidently assigned FM (red bars) and TBM domains (green bars), with unknown domains distributed in the middle (yellow bars). B) A scatter plot of the Z-score sum vs. the template LGA_S is colored as above and highlights the final categorization into FM (empty triangles) and TBM (filled squares). An automatically defined categorization boundary using SVM with a linear kernel (dashed line) differs slightly from that defined using logistic regression (solid line). Target domains that blur the boundaries of categorization are labeled. C) A scatter plot of CASP ROLL targets overlapping with CASP11 FM (open markers) and targets unique to CASP ROLL (filled markers) illustrates categorization into FM (red triangles) and TBM (green squares) based on a Z-score combination of measures (top first model GDT_TS, LGA_S to template, and HHpred probability).
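The Z-score sum described above can be illustrated with a minimal sketch. It assumes each of the five scores is standardized across all domains using the population standard deviation and then summed with equal weight, and that every score is oriented so that higher values indicate an easier (more TBM-like) domain. The caption does not spell out these details, and the domain names and score values below are invented for illustration.

```python
from statistics import mean, pstdev


def zscore_sums(rows):
    """Sum of per-column Z-scores for each row (one row of scores per domain)."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    sums = []
    for row in rows:
        total = 0.0
        for value, mu, sigma in zip(row, mus, sigmas):
            if sigma > 0:  # a constant column carries no information; skip it
                total += (value - mu) / sigma
        sums.append(total)
    return sums


# Hypothetical domains. Column order: mean server GDT_TS, mean GDT_TS of
# above-random first models, count of above-random first models,
# LGA_S to template, HHpred probability.
domains = {
    "easy": [80.0, 85.0, 40, 90.0, 100.0],
    "medium": [50.0, 55.0, 25, 60.0, 80.0],
    "hard": [20.0, 25.0, 5, 30.0, 10.0],
}

sums = dict(zip(domains, zscore_sums(list(domains.values()))))
for name, z in sorted(sums.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>6}: {z:+.2f}")
```

In this toy ranking, domains with high sums would be candidates for TBM and those with low sums for FM; the actual boundary in the figure was drawn with additional measures and manual judgment.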
Figure 6. New Folds
New folds are depicted in cartoon and colored in rainbow from the N-terminus to the C-terminus: A) complex alpha T0777-D1 and B) T0827-D2, C) alpha obligate multimer T0820-D1, D) alpha bundle T0826-D1, E) few SSEs T0793-D2, and F) α+β two layer T0855-D1.
Figure 7. Oligomeric interactions
A) The hexameric complex contains three T0787 (salmon cartoon) and three T0788 (slate cartoon) protein subunits, formed by a trimer of B) T0787/T0788 heterodimers. C) T0797 (salmon cartoon) and T0798 (slate cartoon) form a dimer of heterodimers in the crystal unit. D) Forming the functional T0797 leucine zipper requires considering crystal contacts. E) T0840 (green cartoon) forms a one-to-one complex with T0841 (cyan cartoon). F) T0825 has two identical chains (cyan and green cartoon) that adopt alternate conformations (magenta) to dimerize into a complete β-propeller.

