Bioinformation. 2011 Mar 2;6(1):10-4.
doi: 10.6026/97320630006010.

Discovering amino acid patterns on binding sites in protein complexes


Huang-Cheng Kuo et al. Bioinformation. 2011.

Abstract

Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover association relationships among AAs on binding sites. Such knowledge is helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes that exhibit protein-protein recognition. Association rule mining is used to discover spatially adjacent amino acids on a binding site of a protein complex. When mining, instead of treating all AAs of the binding sites as one transaction, we spatially partition the AAs of the binding sites in a protein complex, and the AAs in each partition are treated as a transaction. For the partitioning, AAs on a binding site are projected from three dimensions to two and then, with the help of a circular grid, placed into grid cells. The circular grid has ten rings: a central ring, a second ring with 6 sectors, a third ring with 12 sectors, and later rings with four sectors added for each ring in order. As for the radius of each ring, we examined the complexes and found that 10 Å is a suitable value; it can be set by the user. After placing the recognition complexes on the circular grid, we obtain mining records (i.e. transactions) from the sectors, with each sector regarded as one record. Finally, we use association rules to mine these records for frequent AA patterns. If the support of an AA pattern is larger than the predetermined minimum support (i.e. threshold), it is called a frequent pattern. These discovered patterns offer biologists a novel point of view that can improve the prediction accuracy of protein-protein recognition. In our experiments, we found that arginine (arg) appears most frequently on the binding sites of the two proteins in the recognition protein complexes, while cysteine (cys) appears least frequently.
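The partition-and-mine procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes residues have already been projected to 2D coordinates centred on the binding site, that the 10 Å value is the width of each ring, and that each ring after the third has four more sectors than the previous one (one reading of the description). The function names `sector_counts`, `grid_cell`, `build_transactions` and `frequent_patterns` are hypothetical, and a brute-force subset count stands in for a full association rule miner.

```python
import math
from collections import Counter
from itertools import combinations

def sector_counts(n_rings=10):
    # Assumption: 1 central cell, then 6 and 12 sectors, then +4 per later ring.
    counts = [1, 6, 12]
    while len(counts) < n_rings:
        counts.append(counts[-1] + 4)
    return counts[:n_rings]

def grid_cell(x, y, ring_width=10.0, n_rings=10):
    """Return (ring, sector) for a projected 2D point, or None if outside the grid."""
    ring = int(math.hypot(x, y) // ring_width)
    if ring >= n_rings:
        return None
    n_sectors = sector_counts(n_rings)[ring]
    angle = math.atan2(y, x) % (2 * math.pi)  # angle in [0, 2*pi)
    # Clamp guards against floating-point rounding at the 2*pi boundary.
    sector = min(int(angle / (2 * math.pi) * n_sectors), n_sectors - 1)
    return ring, sector

def build_transactions(residues, ring_width=10.0, n_rings=10):
    """Group residue names by grid cell; each non-empty cell is one transaction."""
    cells = {}
    for name, x, y in residues:
        cell = grid_cell(x, y, ring_width, n_rings)
        if cell is not None:
            cells.setdefault(cell, set()).add(name)
    return list(cells.values())

def frequent_patterns(transactions, min_support, max_size=3):
    """Brute-force support counting for AA patterns of up to max_size residues."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for pattern in combinations(sorted(t), k):
                counts[pattern] += 1
    n = len(transactions)
    # The usual convention keeps patterns whose support meets the threshold.
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

Calling `build_transactions` on a list of `(name, x, y)` tuples and passing the result to `frequent_patterns` yields a dictionary from AA patterns to their support. Both `ring_width` and `min_support` are left as parameters because the abstract describes them as user-set values.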
In addition, if we further distinguish concave from convex binding-site shapes, we discover that the patterns {arg, glu, asp} and {arg, ser, asp} on the concave side of a binding site in one protein more frequently (i.e. with higher probability) make contact with {lys} or {arg} on the convex side of a binding site in another protein, with a confidence of at least 78%. On the other hand, {val, gly, lys} on the convex surface of a binding site is more frequently in contact with {asp} on the concave site of another protein, with a confidence of over 81%. Applying data mining in biology can reveal facts that might otherwise be ignored or not easily discovered by the naked eye. Furthermore, we can discover more relationships among AAs on binding sites by appropriately rotating the residues on the binding sites from a three-dimensional to a two-dimensional perspective. We designed a circular grid to hold the data, which total 463 records consisting of AAs, and then used association rules to mine these records for relationships. The proposed method provides insight into the characteristics of binding sites for recognition complexes.
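The confidence figures quoted above follow the standard association rule definition: the confidence of X → Y is the fraction of records containing X that also contain Y. A minimal sketch of that calculation follows, assuming each record pairs the AAs on the concave side with the AAs on the convex side of the same contact. This record layout and the function name `rule_confidence` are assumptions for illustration, and the sample records are made up rather than taken from the paper's 463 records.

```python
def rule_confidence(records, antecedent, consequent):
    """Confidence of (antecedent on concave side) -> (consequent on convex side).

    records: iterable of (concave_set, convex_set) pairs (hypothetical layout).
    Returns None when no record contains the antecedent.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    # Keep the convex side of every record whose concave side contains the antecedent.
    matching = [convex for concave, convex in records if antecedent <= concave]
    if not matching:
        return None
    hits = sum(1 for convex in matching if consequent <= convex)
    return hits / len(matching)

# Made-up toy records, for illustration only.
toy_records = [
    ({"arg", "glu", "asp"}, {"lys"}),
    ({"arg", "glu", "asp", "ser"}, {"lys", "gly"}),
    ({"arg", "glu", "asp"}, {"val"}),
    ({"ser"}, {"lys"}),
]
confidence = rule_confidence(toy_records, {"arg", "glu", "asp"}, {"lys"})
```

On the toy records, three concave sides contain {arg, glu, asp} and two of the matching convex sides contain {lys}, so the rule's confidence is 2/3. The reverse direction (convex pattern to concave AA) can be computed by swapping the two sets in each record.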

Keywords: Association rules; Binding sites; Data mining; Protein complexes; Protein-protein recognition.


Figures

Figure 1
The rotation illustration.
Figure 2
The illustration of protein complex 1BKD. The left picture shows the result of the above steps.
Figure 3
The illustration of itemset generation.
Figure 4
The illustration of the convex and concave shapes of binding sites.
Figure 5
The relations between {arg} on the convex side in a protein and AA patterns on the concave side in another protein.
Figure 6
The relations between {arg} on the concave side in a protein and AA patterns on the convex side in another protein.
Figure 7
Frequent patterns consist of AAs on the convex side and AA patterns on the concave side. For example, in {val, ile} → {gly}, {gly} is on the convex side of a binding site in one protein, and {val, ile} is on the convex side of a binding site in another protein.
Figure 8
Frequent patterns consist of AAs on the concave side of a binding site in one protein and AA patterns on the convex side of a binding site in another protein.
