SCOPe: improvements to the structural classification of proteins - extended database to facilitate variant interpretation and machine learning

John-Marc Chandonia et al. Nucleic Acids Res. 2022 Jan 7;50(D1):D553-D559.
doi: 10.1093/nar/gkab1054.

Abstract

The Structural Classification of Proteins-extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants that we mapped to structures classified in SCOPe. To improve the utility of SCOPe for automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).
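The abstract does not say how SCOPe detects cloning, expression and purification sequences. As a rough illustration of the general idea only, the sketch below strips a known tag from either terminus of a chain sequence by exact prefix/suffix matching. The tag strings are assumptions (two His-tag constructs often seen in structures expressed from pET vectors), and `trim_terminal_tags` is a hypothetical helper, not SCOPe's algorithm.

```python
# Hypothetical example tags: common pET-vector His-tag constructs.
# These are NOT SCOPe's tag list, and this is NOT SCOPe's trimming algorithm.
N_TERMINAL_TAGS = ["MGSSHHHHHHSSGLVPRGSH", "MHHHHHH"]
C_TERMINAL_TAGS = ["LEHHHHHH", "HHHHHH"]


def trim_terminal_tags(seq: str) -> str:
    """Strip at most one known N-terminal and one known C-terminal tag."""
    # Try longer tags first so the most specific match wins.
    for tag in sorted(N_TERMINAL_TAGS, key=len, reverse=True):
        if seq.startswith(tag):
            seq = seq[len(tag):]
            break
    for tag in sorted(C_TERMINAL_TAGS, key=len, reverse=True):
        if seq.endswith(tag):
            seq = seq[: -len(tag)]
            break
    return seq


# Usage: a made-up construct with tags at both ends.
print(trim_terminal_tags("MGSSHHHHHHSSGLVPRGSHMKTAYIAKQLEHHHHHH"))  # MKTAYIAKQ
```

Exact matching like this misses tags that differ by a residue or two, and a longer tag list raises the risk of clipping native residues. Comparing each chain against its reference (for example UniProt) sequence is likely a more robust way to decide which terminal residues are non-native.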


Figures

Figure 1.
Variant search results. The variant search result page displays an interactive viewer showing the structural context of the variant and relevant evolutionary context from the SCOPe hierarchy, including members of the same family or superfamily as the impacted protein domain. In the example shown, the user searched for a missense variant on chromosome 2, which affects the coding sequence of the ZAP-70 protein. The variant viewer displays the most relevant human ZAP-70 structure classified in SCOPe. Note that in this structure, the amino acid residue affected by the variant is located in a structurally uncharacterized loop in the protein, so the nearest residues in the structure are highlighted instead. Several additional ZAP-70 structures are also shown, allowing users to visualize the impact of the variant in different structural contexts.
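When the affected residue lies in an unresolved loop, the caption notes that the nearest residues in the structure are highlighted. A minimal sketch of one way to choose them is shown below. It assumes that the variant has already been mapped to an integer residue position and that the structure's resolved residues are available as a sorted list in the same numbering. The function `nearest_resolved` and its inputs are hypothetical and are not taken from the SCOPe viewer.

```python
from bisect import bisect_left


def nearest_resolved(observed: list[int], position: int) -> list[int]:
    """Return the resolved residue(s) nearest to `position`.

    `observed` is a sorted list of residue positions that have coordinates
    in the structure (hypothetical input). If `position` is itself resolved,
    it is returned alone; otherwise the closest resolved residue on each side
    of the gap is returned.
    """
    if not observed:
        return []
    i = bisect_left(observed, position)  # first index with observed[i] >= position
    if i < len(observed) and observed[i] == position:
        return [position]
    flanking = []
    if i > 0:
        flanking.append(observed[i - 1])  # last resolved residue before the gap
    if i < len(observed):
        flanking.append(observed[i])      # first resolved residue after the gap
    return flanking


# Hypothetical chain in which residues 101-120 form a disordered loop.
observed = list(range(1, 101)) + list(range(121, 201))
print(nearest_resolved(observed, 110))  # [100, 121]
print(nearest_resolved(observed, 50))   # [50]
```

This version returns the residues that flank the gap in sequence. A viewer could instead choose spatially nearby residues, but with the variant residue missing from the model, sequence flanks are the simplest well-defined choice.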
Figure 2.
Examples of structurally heterogeneous families. (A) The three domains belong to family b.11.1.1: Crystallins/Ca-binding development proteins. Most entries in this family are divided into two 8-beta-strand domains, but a3 remains undivided; in this example, we label a3 as having multiple alternative domain divisions. (B) The three domains belong to family d.26.1.1: FKBP immunophilin/proline isomerase. Shown in b1 is the common family domain; shown in b2 is the common family domain followed by an N-terminal all-alpha subdomain; shown in b3 is the common family domain with additional alpha-helices at both termini. (C) The three domains belong to family b.1.2.1: Fibronectin type III. Shown in c1 is the family-specific domain with 7 beta strands in 2 sheets; shown in c2 is a fragment that is missing more than one third of the beta strands present in the common fold; shown in c3 is a domain missing one beta strand from the common fold. (D) The two domains belong to family g.24.1.1: TNF receptor-like, which is defined by a specific pattern of disulfide bonds. Disulfide bonds are shown as red sticks, and the number of these bonds differs between domains. Domains with more disulfide bonds (such as d2) are labeled as having additional elements; likewise, domains with fewer disulfide bonds are labeled as missing some elements. (E) The three domains belong to family d.58.1.5: Ferredoxin domains from multidomain proteins. As the family name suggests, domains in this family may contain different numbers of (sub)domains. (F) The two domains belong to family a.4.1.1: Homeodomain. Shown in f1 is the common domain, which has three alpha helices arranged into a triangle-like structure; shown in f2 is a domain with secondary structures and sequence similar to f1 but folded differently. (G) The three domains belong to different families under the same fold a.137: Non-globular all-alpha subunits of globular proteins, which SCOP(e) curators have deemed ‘not a true fold’ in their comments. We include this label as a category here to make it accessible to automated methods. (H) The three domains belong to family a.298.1.1: TAL (transcription activator-like) effector. Shown in h1 is a single repeat unit; h2 and h3 contain different numbers of this same repeat unit.
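Figure 2 identifies families and folds by their SCOP(e) concise classification strings (sccs), written class.fold.superfamily.family (for example b.11.1.1, whose fold is b.11). The sketch below parses such strings. It also shows how machine-parseable heterogeneity labels could be used to keep only typical members of a family, for instance before a multiple sequence alignment is built for a classifier. The label names in `EXCLUDED_LABELS`, the `Domain` record, and the domain IDs are hypothetical stand-ins for the categories in panels (A) to (G). SCOPe's actual annotation keywords and file format may differ.

```python
from dataclasses import dataclass

# Hypothetical labels for the heterogeneity categories in Figure 2, panels (A)-(G).
# These are NOT SCOPe's actual annotation keywords.
EXCLUDED_LABELS = frozenset({
    "multiple_domain_divisions",  # (A)
    "additional_elements",        # (B), (D)
    "missing_elements",           # (C), (D)
    "variable_subdomain_count",   # (E)
    "different_fold",             # (F)
    "not_a_true_fold",            # (G)
})


@dataclass(frozen=True)
class Sccs:
    """A SCOP(e) concise classification string such as 'b.11.1.1'."""
    class_letter: str
    fold: int
    superfamily: int
    family: int

    @classmethod
    def parse(cls, text: str) -> "Sccs":
        parts = text.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected class.fold.superfamily.family, got {text!r}")
        return cls(parts[0], int(parts[1]), int(parts[2]), int(parts[3]))

    def fold_id(self) -> str:
        """Return the fold-level prefix, e.g. 'b.11' for 'b.11.1.1'."""
        return f"{self.class_letter}.{self.fold}"


@dataclass(frozen=True)
class Domain:
    domain_id: str                  # hypothetical identifier
    sccs: Sccs
    labels: frozenset = frozenset()  # hypothetical heterogeneity labels


def typical_members(domains, family, excluded=EXCLUDED_LABELS):
    """IDs of domains in `family` that carry none of the excluded labels."""
    return [d.domain_id for d in domains
            if d.sccs == family and not (d.labels & excluded)]


# Example: hypothetical domains, one of them a fragment (compare panel C).
fn3 = Sccs.parse("b.1.2.1")
domains = [
    Domain("dom1", fn3),
    Domain("dom2", fn3, frozenset({"missing_elements"})),
    Domain("dom3", Sccs.parse("a.4.1.1")),
]
print(typical_members(domains, fn3))  # ['dom1']
```

The repeat-unit case in panel (H) is deliberately left out of the exclusion set. Those domains are regular rather than aberrant, so a downstream method might prefer to split them into their repeat units instead of discarding them.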

