Structure. 2018 Nov 6;26(11):1546-1554.e2.
doi: 10.1016/j.str.2018.08.011. Epub 2018 Oct 4.

Automatic Inference of Sequence from Low-Resolution Crystallographic Data

Ziv Ben-Aharon et al. Structure. 2018.

Abstract

At resolutions worse than 3.5 Å, the electron density is weak or nonexistent at the locations of the side chains. Consequently, the assignment of the protein sequences to their correct positions along the backbone is a difficult problem. In this work, we propose a fully automated computational approach to assign sequence at low resolution. It is based on our surprising observation that standard reciprocal-space indicators, such as the initial unrefined R value, are sensitive enough to detect an erroneous sequence assignment of even a single backbone position. Our approach correctly determines the amino acid type for 15%, 13%, and 9% of the backbone positions in crystallographic datasets with resolutions of 4.0 Å, 4.5 Å, and 5.0 Å, respectively. We implement these findings in an application for threading a sequence onto a backbone structure. For the three resolution ranges, the application threads 83%, 81%, and 64% of the sequences exactly as in the deposited PDB structures.

Keywords: automatic threading; low-resolution crystallography; model building; reciprocal-space indicators.
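
As an illustration of the reciprocal-space indicator mentioned in the abstract, the following minimal sketch computes the conventional crystallographic R value, R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, from two lists of structure-factor amplitudes. The function name `r_value` and the simple linear scale factor k are assumptions made for this sketch; the abstract does not describe how the authors compute or scale their indicator.

```python
from typing import Sequence


def r_value(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R value between observed and calculated amplitudes.

    Hypothetical helper for illustration only. A single linear scale
    factor k = sum|Fobs| / sum|Fcalc| is applied to the calculated
    amplitudes (an assumption of this sketch, not the authors' method).
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("amplitude lists must be non-empty and of equal length")

    obs = [abs(f) for f in f_obs]
    calc = [abs(f) for f in f_calc]

    sum_obs = sum(obs)
    sum_calc = sum(calc)
    if sum_obs == 0 or sum_calc == 0:
        raise ValueError("amplitudes must not all be zero")

    k = sum_obs / sum_calc  # put the calculated amplitudes on the observed scale
    return sum(abs(o - k * c) for o, c in zip(obs, calc)) / sum_obs
```

A lower R value indicates better agreement between a model and the measured data, so comparing R values of models that differ at a single backbone position is one way to picture the sensitivity the abstract reports.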

Figures

Figure 1.
Electron density maps at 4.0 Å resolution reveal very little information on side-chain identity, even for a well-ordered helical backbone. Our application can identify the correct side chain based on a scoring scheme that relies on correlation to the structure factors in reciprocal space. (A) A backbone region from PDB entry 4LTP with the side chain of Tyr 162 shown in green and its rotamer as predicted by SCWRL4 in orange. Out of the twenty amino acids, our application assigns the highest score in that position to a tyrosine. (B) A similar example from PDB entry 3T51 shows the side chain of Met 696. For this position, our application also correctly assigns the highest score to a methionine. (C) When a lysine is assigned incorrectly at the same position as in B, the score is poor in spite of the general structural similarity between the side chains of lysine and methionine.
Figure 2.
Histograms of reciprocal-space correlation coefficient values (CC-Values) for all backbone positions in the 3.8–4.0 Å dataset. For a given amino acid type, Match (black bars) shows values for positions where the modeled amino acid was the same as that reported in the PDB entry, and Mismatch (white bars) shows all other positions. Higher CC-Values indicate a better fit between a model and the crystallographic structure factors. The histograms for tryptophan, methionine, and serine are shown as typical examples of strong, moderate, and negligible CC-Value signal, respectively.
Figure 3.
An example of successful automatic threading of the sequence of RBX1 Ubiquitin-Ligase onto chain D of PDB entry 4A0C (resolution 3.8 Å). The gaps (‘-’) in the STRUCTURE rows indicate regions of the sequence that are unstructured in the crystallographic structure. Note that the alignment is successful at the edges of the unstructured regions.
Figure 4.
The threading scores of the chains in the 3.8–4.0 Å resolution range (circles, 331 chains) are proportional to the chain lengths. Full red circles mark chains for which fewer than 85% of the residues were threaded as in the deposited PDB structures. These chains are either very short or show particularly weak CC-Value signals. The line has a slope of 0.25 Score Units per position, and 72% of the chains are above it. All red circles except one are below this line. The line exemplifies our definition of the Threading Quality Index as Threading Score / Chain Length.
Figure 5.
Similarity (in percent of the chain length) between the result of our threading algorithm and the threading reported in the PDB structures, plotted for each chain as a function of the Threading Quality Index. When the Threading Quality Index is above 0.25 (dashed vertical line), the fit to the PDB threading is higher than 85% in all cases except one, and in most cases it is 100%.
Figure 6.
Workflow that finds the sequence assignment to a backbone with N structural positions.
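
The Threading Quality Index defined in the Figure 4 caption (Threading Score / Chain Length) and the 0.25 cutoff shown in Figure 5 can be sketched as follows. The function names and the strict "greater than" comparison are assumptions made for this illustration; only the ratio and the 0.25 value come from the captions.

```python
# Cutoff taken from the Figure 4 and Figure 5 captions (Score Units per position).
TQI_THRESHOLD = 0.25


def threading_quality_index(threading_score: float, chain_length: int) -> float:
    """Threading Quality Index = Threading Score / Chain Length."""
    if chain_length <= 0:
        raise ValueError("chain_length must be a positive number of positions")
    return threading_score / chain_length


def passes_tqi_cutoff(threading_score: float, chain_length: int,
                      threshold: float = TQI_THRESHOLD) -> bool:
    """Hypothetical helper: True when the index is strictly above the cutoff."""
    return threading_quality_index(threading_score, chain_length) > threshold
```

For instance, a chain of 100 positions with a threading score of 50 has an index of 0.5 and passes the cutoff, whereas a score of 20 gives 0.2 and does not. Per the Figure 5 caption, chains above the cutoff matched the PDB threading by more than 85% in all cases except one.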
