Comparative Study
Proc Natl Acad Sci U S A. 2002 Dec 10;99(25):16041-6.
doi: 10.1073/pnas.252626399. Epub 2002 Dec 2.

The directional atomic solvation energy: an atom-based potential for the assignment of protein sequences to known folds

Parag Mallick et al. Proc Natl Acad Sci U S A.

Abstract

The Directional Atomic Solvation EnergY (DASEY) is an atom-based description of the environment of an amino acid position within a known 3D protein structure. DASEY was developed to align and score a probe amino acid sequence against a library of template protein structures for fold assignment. It is computed by summing the atomic solvation parameters of the atoms falling within a tetrahedral sector, or petal, that extends 16 Å along each of the four bond axes of each α-carbon atom of the protein. Unlike some previous descriptors of protein environments, such as buried area, DASEY discriminates between pairs of structurally equivalent positions and random pairs in protein structures that share a fold but belong to different superfamilies. Furthermore, DASEY values show characteristic patterns of residue replacement, an essential feature of a successful fold assignment method. In a fold-assignment benchmark in which probe sequences are matched to template structures of the same fold but a different superfamily, DASEY covers 56% of sequences at 90% accuracy, an improvement of more than 200% over a previous method.


Figures

Fig 1.
Fold assignment relies on scoring the compatibility of a probe sequence (Top Left) with a known 3D structure (Middle Right). Also available for scoring are the sequence and secondary structure of the protein of known structure (Top Right) and features predicted from the probe sequence (Top Left). The compatibility can be scored by any of the types of function listed in the Left Middle box; our method scores it with DASEY. The DASEY (Inset, Bottom Right) describes the hydrophobicity of the environment of a Cα position in a known protein structure. Each of the four dimensions of the DASEY is calculated by summing the Atomic Solvation Parameters (ASPs) of the atoms contained within a tetrahedral sector, or petal, that extends 16 Å along a bond direction from the α-carbon atom of the position. Each ASP's contribution is down-weighted according to the atom's distance from the bond axis. The four tetrahedral petals are shown in the Inset, with two atoms drawn in the petal along the Cα→N direction: the N atom has a negative (polar) ASP, and the C atom has a positive (apolar) ASP.
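The petal construction described in this caption can be sketched in Python. This is a minimal illustration that rests on several assumptions the text does not state:

- An atom is assigned to the petal whose bond axis it is most closely aligned with.
- The 16 Å limit is applied to the distance measured along that axis.
- The off-axis down-weighting is a Gaussian with a hypothetical width `SIGMA`.
- The `ASP` values are placeholders, not the authors' parameters.

The function name `dasey` is also hypothetical.

```python
import numpy as np

# Illustrative placeholder ASPs (hypothetical; not the authors' parameter table).
# Sign convention follows the figure: apolar atoms positive, polar atoms negative.
ASP = {"C": 16.0, "S": 21.0, "N": -6.0, "O": -6.0}

PETAL_LENGTH = 16.0  # Å, petal extent along each bond axis (from the text)
SIGMA = 4.0          # Å, hypothetical width of the off-axis down-weighting


def dasey(ca, bond_dirs, coords, elements):
    """Return a 4-vector DASEY for one Cα position (a sketch, not the published code).

    ca        -- (3,) coordinates of the Cα atom
    bond_dirs -- (4, 3) vectors pointing from Cα along its four bonds
    coords    -- (n, 3) coordinates of the surrounding atoms
    elements  -- n element symbols, used to look up each atom's ASP
    """
    axes = np.asarray(bond_dirs, float)
    axes = axes / np.linalg.norm(axes, axis=1, keepdims=True)
    rel = np.asarray(coords, float) - np.asarray(ca, float)  # Cα -> atom vectors
    proj = rel @ axes.T                                       # (n, 4) projections
    # Assumed petal membership: each atom belongs to the petal whose axis it
    # is most closely aligned with, giving a tetrahedral partition of space.
    petal = np.argmax(proj, axis=1)
    along = proj[np.arange(len(rel)), petal]                  # distance along that axis
    dist2 = np.sum(rel ** 2, axis=1)
    perp = np.sqrt(np.maximum(dist2 - along ** 2, 0.0))       # distance from the axis
    inside = (along > 0.0) & (along <= PETAL_LENGTH)
    # Hypothetical Gaussian down-weighting by distance from the bond axis.
    weight = np.exp(-perp ** 2 / (2.0 * SIGMA ** 2))
    asp = np.array([ASP[e] for e in elements], dtype=float)
    result = np.zeros(4)
    # Accumulate weighted ASPs into their petals (add.at handles repeated indices).
    np.add.at(result, petal[inside], (asp * weight)[inside])
    return result


# Toy example: ideal tetrahedral bond directions, treating axis 0 as the Cα→N bond.
ca = np.zeros(3)
tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], float)
unit = tetra / np.sqrt(3.0)
atoms = np.array([3.0 * unit[0], 5.0 * unit[1], 20.0 * unit[2]])
print(dasey(ca, tetra, atoms, ["N", "C", "C"]))  # approximately [-6, 16, 0, 0]
```

In the toy call, the polar N atom lowers the first component and the apolar C atom raises the second. The atom 20 Å out falls beyond the petal and is ignored.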
Fig 2.
DASEY is used to align and score a probe sequence with a known template structure. This is done with PDFs of the form $P(\mathrm{res}_i, \mathrm{ss}_i, \mathrm{DASEY}_i, \mathrm{res}_j, \mathrm{ss}_j)$ (Eq. 5), which give the likelihood of aligning residue type (res) j of the probe to structure position i of the template. The PDF depends on DASEY, res, and secondary structure type (ss).

Step 2 shows an example DAPS aligned residue pair:

- Position i of structure I is occupied by a valine in a sheet (E), with a DASEY of [30, 56, 25, 48] and a PHD (35)-predicted secondary structure of sheet (E).
- It is aligned to position j of structure J, an arginine in a coil (C), with a DASEY of [32, 53, 20, 41] and a PHD-predicted secondary structure of helix (H).

A DASEY of [32, 53, 20, 41] means that the weighted sum of atomic solvation parameters for atoms within the Cα→N petal is 32, and so forth.

Each aligned residue pair contributes its two DASEY vectors to two different bins. The vector [30, 56, 25, 48] from position i goes into a bin denoted "VE RH": a valine in a strand aligned to an arginine predicted to be in a helix. The vector [32, 53, 20, 41] from position j goes into a second bin, denoted "RC VE": an arginine in a coil aligned to a valine predicted to be in a strand.

Appropriately binned DASEY vectors from the training set of aligned known structures give the density plots shown on the left in Step 4. There are 3,600 bins (20 × 3 × 20 × 3), one for each combination of residue, predicted secondary structure, aligned residue, and aligned secondary structure. The distribution of DASEY values in each bin is modeled as a mixture of multivariate normal PDFs, as shown by the equation on the right in Step 4. These density functions are the dominant terms of the emission score. They describe the likelihood of aligning a residue and predicted secondary structure from a probe sequence of unknown structure with a residue, secondary structure, and DASEY from a template structure.
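The binning and mixture-density scoring in this caption can be sketched as follows. This is an illustrative sketch, not the authors' code:

- The helper names `bin_pair`, `gaussian_pdf`, and `mixture_pdf` are hypothetical.
- The H/E/C secondary structure alphabet is taken from the example.
- In practice, the mixture weights, means, and covariances would be fitted to the binned training vectors (for instance, by expectation maximization), which is not shown.

```python
import itertools

import numpy as np

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
SS_STATES = "HEC"                  # helix, strand (sheet), coil

# One bin per (residue, ss, aligned residue, aligned ss) combination.
BINS = list(itertools.product(RESIDUES, SS_STATES, RESIDUES, SS_STATES))


def bin_pair(res_i, ss_i, pred_ss_i, dasey_i, res_j, ss_j, pred_ss_j, dasey_j):
    """Route both DASEY vectors of one aligned pair to their bins.

    Each position's vector is keyed by its own residue and observed secondary
    structure plus the aligned residue and that residue's predicted secondary
    structure, mirroring the "VE RH" / "RC VE" example in the caption.
    """
    return [((res_i, ss_i, res_j, pred_ss_j), list(dasey_i)),
            ((res_j, ss_j, res_i, pred_ss_i), list(dasey_j))]


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    x, mean, cov = (np.asarray(a, float) for a in (x, mean, cov))
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)  # Mahalanobis term without an explicit inverse
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def mixture_pdf(x, weights, means, covs):
    """Gaussian-mixture density: sum over k of w_k * N(x; mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


pairs = bin_pair("V", "E", "E", [30, 56, 25, 48], "R", "C", "H", [32, 53, 20, 41])
print(len(BINS))                  # 3600
print([key for key, _ in pairs])  # [('V', 'E', 'R', 'H'), ('R', 'C', 'V', 'E')]
```

Counting bins as the Cartesian product of 20 residue types and 3 secondary structure states on each side reproduces the 3,600 bins mentioned in the caption.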
Fig 3.
The four dimensions of the DASEY are distributed differently for different probe-to-template position alignments. They discriminate structurally equivalent pairs of positions from random pairs in the test set of 864,758 pairs of aligned residues from similarly folded proteins of different superfamilies.

A and B show, for one of the four petals, how well DASEY discriminates among different residues at different structural positions in this test set. In fact, all four petals contribute to the discrimination and together constitute the environment of a position. The five curves show the densities P for different residue-secondary structure combinations; note the clearly different distributions.

C–F test how well each environmental descriptor is conserved. The x axis gives the value of the descriptor at one residue position of a protein. The y axis gives its value at the structurally equivalent position of a similarly folded protein from a different superfamily. A well-conserved descriptor yields a tight line along the diagonal. The DASEY dimensions are better conserved than either Fraction Polar or Area Buried of ref. , which are poorly conserved.
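Panels C–F judge conservation visually, by how tightly the points cluster on the diagonal. One hedged way to put a number on that is sketched below. The Pearson correlation and the RMS distance from the line y = x are summary statistics chosen here for illustration; they are not measures reported in the caption. `conservation` is a hypothetical helper name.

```python
import numpy as np


def conservation(desc_a, desc_b):
    """Summarize how tightly paired descriptor values cluster on the diagonal y = x.

    desc_a[k], desc_b[k]: descriptor values at the k-th pair of structurally
    equivalent positions (one protein on the x axis, the other on the y axis).
    Returns (Pearson r, RMS perpendicular distance from the diagonal).
    """
    a = np.asarray(desc_a, float)
    b = np.asarray(desc_b, float)
    r = float(np.corrcoef(a, b)[0, 1])
    # The perpendicular distance of the point (a, b) from y = x is |a - b| / sqrt(2).
    rms_off_diagonal = float(np.sqrt(np.mean((a - b) ** 2) / 2.0))
    return r, rms_off_diagonal


# Toy values: a perfectly conserved descriptor versus an anti-correlated one.
print(conservation([1, 2, 3, 4], [1, 2, 3, 4]))
print(conservation([1, 2, 3, 4], [4, 3, 2, 1]))
```

A well-conserved descriptor gives r near 1 and a small off-diagonal spread. A poorly conserved one gives a lower r and a larger spread.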
Fig 4.
BENCHMARK 1 assesses the accuracy and coverage of three fold-assignment methods: DASEY, SDP (6), and SDP+. For the 1,000 protein sequences of the probe set and the 3,914 structures of the template library, the fraction of correct fold assignments is given. The coverage, or probability of detection, is the method's sensitivity or true positive fraction. The probability of false alarm is the method's false positive (error) fraction, 1 − accuracy. These two quantities are plotted on a Receiver Operating Characteristic plot (48). On such a plot, perfect performance is a horizontal line along the top of the graph, representing 100% coverage at 0% error. At low error rates, the DASEY method has nearly double the coverage of the SDP methods. The vertical bars mark the thresholds listed in Table 1.
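The coverage and error axes of this plot can be traced out by sweeping a score threshold. The sketch below rests on these assumptions:

- Each probe is assigned the fold of its best-scoring template only when that score clears the threshold.
- Coverage counts correct assignments over all probes.
- Error counts incorrect assignments over assignments made.

These definitions, the helper names, and the toy scores are illustrative assumptions, not the benchmark's actual procedure.

```python
import numpy as np


def coverage_and_error(scores, correct, threshold):
    """Coverage and error when a fold is assigned only if the top score >= threshold.

    scores  -- best template score for each probe sequence
    correct -- whether that best-scoring template has the probe's fold
    Coverage: correct assignments / all probes (true positive fraction).
    Error:    incorrect assignments / assignments made (1 - accuracy).
    """
    scores = np.asarray(scores, float)
    correct = np.asarray(correct, bool)
    assigned = scores >= threshold
    n_assigned = int(assigned.sum())
    coverage = int((assigned & correct).sum()) / len(scores)
    error = int((assigned & ~correct).sum()) / n_assigned if n_assigned else 0.0
    return coverage, error


def roc_points(scores, correct):
    """One (coverage, error) point per distinct threshold, strictest first."""
    return [coverage_and_error(scores, correct, t)
            for t in sorted(set(scores), reverse=True)]


def best_coverage(scores, correct, max_error=0.10):
    """Largest coverage reachable with error <= max_error (90% accuracy by default)."""
    ok = [cov for cov, err in roc_points(scores, correct) if err <= max_error]
    return max(ok) if ok else 0.0


scores = [0.9, 0.8, 0.7, 0.6]
correct = [True, True, False, True]
print(roc_points(scores, correct))
print(best_coverage(scores, correct))  # 0.5
```

With a 10% error ceiling, `best_coverage` returns the largest coverage achievable at 90% accuracy. This is the same kind of operating point the abstract quotes for DASEY (56% coverage at 90% accuracy).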

References

    1. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A. & Wheeler, D. L. (2002) Nucleic Acids Res. 30, 17-20.
    2. Berman, H. M., Battistuz, T., Bhat, T. N., Bluhm, W. F., Bourne, P. E., Burkhardt, K., Feng, Z., Gilliland, G. L., Iype, L., Jain, S., et al. (2002) Acta Crystallogr. D 58, 899-907.
    3. Bowie, J. U., Lüthy, R. & Eisenberg, D. (1991) Science 253, 164-170.
    4. Bryant, S. H. & Lawrence, C. E. (1993) Proteins 16, 92-112.
    5. Defay, T. R. & Cohen, F. E. (1996) J. Mol. Biol. 262, 314-323.
