BMC Bioinformatics. 2008 Nov 25;9:492.
doi: 10.1186/1471-2105-9-492.

Structural descriptor database: a new tool for sequence-based functional site prediction

Juliana S Bernardes et al. BMC Bioinformatics.

Abstract

Background: The Structural Descriptor Database (SDDB) is a web-based tool that predicts protein function and the positions of functional sites from the structural properties of related protein families. Structural alignments and functional residues of a set of known proteins (the training set) are used to build special Hidden Markov Models (HMMs) called HMM descriptors. SDDB uses precomputed, stored HMM descriptors to predict active sites, binding residues, and protein function. The database integrates biologically relevant data filtered from several databases, including PDB, PDBSUM, CSA and SCOP. It accepts queries in FASTA format and predicts functional residue positions, protein-ligand interactions, and protein function, based on the SCOP database.

Results: We assessed SDDB performance on several data sets. The Trypsin-like serine protease data set assessed how well SDDB predicts functional sites when curated data is available. The SCOP family data set was used to analyze SDDB performance with training data extracted from PDBSUM (binding sites) and from CSA (active sites). The ATP-binding experiment was used to compare our approach with the most recent method. In all evaluations, SDDB yielded significant improvements.

Conclusion: SDDB performed better when reliable training data was available. SDDB was better at predicting active sites than binding sites because the former are more conserved than the latter. Nevertheless, our prediction method obtained results with precision above 70%.
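
The evaluation above is reported as precision, and the Figure 1 caption notes that each point of the precision-recall curves corresponds to a different e-value cutoff. The following is a minimal sketch of that kind of calculation, not the authors' evaluation code. It assumes each predicted residue position carries the e-value of the descriptor hit that produced it; the function names, the example positions and the e-values are all hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(predicted: Set[int], true_sites: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted residue positions against known sites."""
    true_positives = len(predicted & true_sites)
    # Assumption: with no predictions at all, precision is reported as 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_sites) if true_sites else 0.0
    return precision, recall


def pr_curve(
    hits: List[Tuple[int, float]],
    true_sites: Set[int],
    cutoffs: List[float],
) -> List[Tuple[float, float, float]]:
    """One (cutoff, precision, recall) point per e-value cutoff.

    `hits` holds (residue_position, e_value) pairs; a position counts as
    predicted when its e-value is at or below the cutoff.
    """
    points = []
    for cutoff in cutoffs:
        predicted = {pos for pos, e_value in hits if e_value <= cutoff}
        precision, recall = precision_recall(predicted, true_sites)
        points.append((cutoff, precision, recall))
    return points


# Illustrative data only: positions and e-values are made up for this sketch.
example_hits = [(57, 1e-10), (102, 1e-8), (195, 1e-5), (40, 1e-2), (150, 0.5)]
example_true_sites = {57, 102, 195}

for cutoff, precision, recall in pr_curve(example_hits, example_true_sites, [1e-9, 1e-4, 1.0]):
    print(f"cutoff={cutoff:g} precision={precision:.2f} recall={recall:.2f}")
```

With this made-up data, the strictest cutoff gives precision 1.0 at recall 1/3, the middle cutoff gives 1.0 and 1.0, and the loosest cutoff drops precision to 0.6 while recall stays at 1.0. This is the usual trade-off a precision-recall curve makes visible.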


Figures

Figure 1
Comparison of Trypsin-like serine protease and SCOP descriptors through precision-recall curves. SDDB performance for all data sets, as measured by precision-recall curves. Each point in the graph corresponds to a different e-value cutoff. A – Active site prediction. B – Binding site prediction.
Figure 2
LIGPLOT scheme for ligand-binding residues. The scheme shows the interaction between the E. coli DNA ligase protein [40] and the ATP (adenosine-5'-triphosphate) ligand. The binding site residues of the DNA ligase are shown in red.
Figure 3
Creating HMM descriptors from SCOP families. Each SCOP family was segmented into structural groups, and HMM descriptors were created from these groups. A – Building HMM descriptors for a hypothetical family, f1. First, Af1 is built by aligning all proteins of the SGf1 structural group. Next, HMMf1 is built from both the Af1 alignment and the functional site positions of Af1, called Bf1. Finally, f1 is divided into groups of proteins that interact with the same ligand, and the HMMf1Li are built in the same way, where L is the number of ligands and 1 ≤ i ≤ N. B – To build the HMM classifiers, the consensus-sequencef1 and each protein in SGf1 are aligned, producing AcsPi, where 1 ≤ i ≤ Q. Each HMMf1pi classifier is built from AcsPi.
Figure 4
Mapping functional residues to HMM states. Each column in the globin alignment maps to either a match or an insert state, including the columns that represent functional sites. The columns labeled As represent active site positions, whereas the Bs columns represent binding site positions. Mi, Ii and Di represent match, insert and delete states in the HMM architecture, respectively. In this illustration, As1 mapped to the M8 state, and the Bs1, Bs2 and Bs3 columns mapped to the M2, M13 and I15 states, respectively.
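
The mapping described in the Figure 4 caption can be illustrated with a short sketch. It is not the SDDB implementation: it assumes the common profile-HMM heuristic that a column becomes a match state when at least half of the sequences have a residue there, and that any other column is assigned to the insert state following the most recent match state. The function names, the toy alignment and the site labels are hypothetical.

```python
from typing import Dict, List


def column_states(alignment: List[str], min_occupancy: float = 0.5) -> List[str]:
    """Label each alignment column with a match (M) or insert (I) state.

    Assumption: a column is a match column when the fraction of sequences
    with a residue (not a gap) is at least `min_occupancy`.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    states = []
    match_index = 0  # index of the most recent match state
    for col in range(n_cols):
        residues = sum(1 for seq in alignment if seq[col] not in "-.")
        if residues / n_seqs >= min_occupancy:
            match_index += 1
            states.append(f"M{match_index}")
        else:
            # Insert columns share the index of the preceding match state.
            states.append(f"I{match_index}")
    return states


def map_sites(alignment: List[str], site_columns: Dict[str, int]) -> Dict[str, str]:
    """Map labeled functional-site columns (0-based) to HMM state names."""
    states = column_states(alignment)
    return {label: states[col] for label, col in site_columns.items()}


# Toy alignment, made up for illustration; '-' marks a gap.
toy_alignment = [
    "VLSPA-DK",
    "VHLTPEEK",
    "VLS-A-DK",
    "-LSEG-EW",
]
# Hypothetical site annotations: one active site column, two binding site columns.
toy_sites = {"As1": 1, "Bs1": 5, "Bs2": 6}

print(map_sites(toy_alignment, toy_sites))
```

In this toy case the sixth column has a residue in only one of four sequences, so Bs1 lands on insert state I5, while As1 and Bs2 land on match states M2 and M6. This mirrors the caption's point that functional columns may map to either kind of state.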

References

    1. Chandonia J, Brenner S. The impact of structural genomics: expectations and outcomes. Science. 2006;311:347–351.
    2. Bateman A, Valencia A. Structural genomics meets computational biology. Bioinformatics. 2006;22:2319.
    3. Kim S, Shin D, Choi I, Gahmen U, Chen S, Kim R. Structure-based functional inference in structural genomics. J Struct Funct Genomics. 2003;4:129–135.
    4. Watson J, Laskowski R, Thornton J. Predicting protein function from sequence and structural data. Current Opinion in Structural Biology. 2005;15:275–284.
    5. Baker E, Arcus V, Lott J. Protein structure prediction and analysis as a tool for functional genomics. Applied Bioinformatics. 2003;2:S3–10.
