Bioinformation. 2009 Nov 17;4(5):197-205.
doi: 10.6026/97320630004197.

A decision tree model for the prediction of homodimer folding mechanism

Abishek Suresh et al. Bioinformation.

Abstract

The formation of protein homodimer complexes for molecular catalysis and regulation is of considerable interest. The folding mechanism is known for 47 homodimer structures: 2S (two-state), 3SMI (three-state with monomer intermediate) or 3SDI (three-state with dimer intermediate). Our dataset of these forty-seven homodimers consists of twenty-eight 2S, twelve 3SMI and seven 3SDI proteins. The dataset is characterized using monomer length (ML), interface area (B/2) and interface/total (I/T) residue ratio. We find that 2S homodimers are often small with a large I/T ratio, whereas 3SDI homodimers are frequently large with a small I/T ratio; 3SMI homodimers show a mixture of these features. We therefore used these parameters to develop a decision tree model. In cross-validation, the model produced positive predictive values (PPV) of 72% for 2S, 58% for 3SMI and 57% for 3SDI. Thus, the method can be applied to assign a folding mechanism to homodimers.

Keywords: decision tree; folding; homodimer; mechanism; prediction.
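
The positive predictive values quoted in the abstract follow the usual definition: of all homodimers the model assigns to a class, the fraction that truly belong to it. A minimal Python sketch of that calculation is shown here; the function name `ppv_per_class` and the example labels are assumptions made for illustration and are not the paper's data or code.

```python
from collections import Counter

def ppv_per_class(actual, predicted):
    """Return {class: PPV}, where PPV = correct predictions of a class
    divided by all predictions of that class."""
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for a, p in zip(actual, predicted) if a == p)
    return {cls: correct_counts[cls] / n for cls, n in predicted_counts.items()}

# Hypothetical labels for illustration only (not the 47-protein dataset).
actual    = ["2S", "2S", "2S",   "3SMI", "3SMI", "3SDI"]
predicted = ["2S", "2S", "3SMI", "3SMI", "2S",   "3SDI"]

ppv = ppv_per_class(actual, predicted)
print(ppv)
```

In this toy example three proteins are predicted as 2S and two of them are truly 2S, so the PPV for 2S is 2/3.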

Figures

Figure 1
Distribution of 2S, 3SMI and 3SDI homodimers for monomer length (ML), interface area (B/2) and I/T ratio is shown.
  1. The minimum and maximum limits of ML for 2S, 3SMI and 3SDI homodimers in the dataset are illustrated. The X-axis represents monomer length. The overlap regions are shown horizontally. 2S proteins range from 45 to 271, 3SMI from 72 to 381 and 3SDI from 90 to 835.

  2. The minimum and maximum limits of interface area (B/2) for 2S, 3SMI and 3SDI homodimers in the dataset are illustrated. The X-axis represents interface area. The overlap regions are shown horizontally. 2S proteins range from 156 to 2507, 3SMI from 309 to 2332 and 3SDI from 1351 to 2317.

  3. The minimum and maximum limits of I/T ratio for 2S, 3SMI and 3SDI homodimers in the dataset are illustrated. The overlap regions are shown horizontally. 2S proteins range from 6% to 80%, 3SMI from 9% to 44% and 3SDI from 5% to 50%. Note that no Y-axis variable is defined in this case; a Y-axis is shown only for convenience of visualization.
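
The overlap between the three ranges can be made concrete with a short Python sketch. The minimum and maximum values are copied from the Figure 1 caption; the dictionary layout and the helper name `compatible_mechanisms` are assumptions for illustration. The sketch only reports which mechanisms have an observed range containing a query value. It is not the prediction model itself.

```python
# Observed (min, max) per mechanism, copied from the Figure 1 caption.
RANGES = {
    "ML":  {"2S": (45, 271),   "3SMI": (72, 381),   "3SDI": (90, 835)},
    "B/2": {"2S": (156, 2507), "3SMI": (309, 2332), "3SDI": (1351, 2317)},
    "I/T": {"2S": (6, 80),     "3SMI": (9, 44),     "3SDI": (5, 50)},
}

def compatible_mechanisms(values):
    """Return the mechanisms whose observed range contains every supplied value.

    `values` maps a parameter name ("ML", "B/2" or "I/T") to a number.
    """
    mechanisms = ["2S", "3SMI", "3SDI"]
    return [
        m for m in mechanisms
        if all(RANGES[name][m][0] <= v <= RANGES[name][m][1]
               for name, v in values.items())
    ]

print(compatible_mechanisms({"ML": 60}))    # below the 3SMI and 3SDI minima
print(compatible_mechanisms({"ML": 500}))   # above the 2S and 3SMI maxima
print(compatible_mechanisms({"ML": 200, "B/2": 1400, "I/T": 30}))  # inside all ranges
```

The last query falls inside every range, which illustrates why range limits alone cannot separate the three mechanisms and why threshold conditions were derived from the cumulative frequencies.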
Figure 2
Percent cumulative frequency of 2S, 3SMI and 3SDI for ML, I/T and B/2 is given.
  1. The distribution of the cumulative frequency of ML for 2S, 3SMI and 3SDI homodimers in the dataset is presented. About 90% of 2S, 60% of 3SMI and 15% of 3SDI are covered when ML ≦ 250. Hence, ML ≦ 250 was selected as a decision condition in the development of the model.

  2. The distribution of the cumulative frequency of I/T ratio for 2S, 3SMI and 3SDI homodimers in the dataset is presented. About 30% of 2S and 90% of 3SMI and 3SDI are covered when I/T ≦ 25%. Hence, I/T ≦ 25% was selected as a decision condition in the development of the model.

  3. The distribution of the cumulative frequency of interface area for 2S, 3SMI and 3SDI homodimers in the dataset is presented. About 50% of 2S, 70% of 3SMI and 30% of 3SDI are covered when B/2 ≦ 1500. Hence, B/2 ≦ 1500 was selected as a decision condition in the development of the model.
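
The percent cumulative frequency at a threshold is simply the share of proteins in a class whose value is at or below that threshold. A minimal Python sketch follows; the function name `percent_covered` and the monomer lengths are hypothetical values chosen for illustration, since the per-protein data are not listed here.

```python
def percent_covered(values, threshold):
    """Percentage of values that are less than or equal to the threshold."""
    if not values:
        raise ValueError("values must be non-empty")
    return 100.0 * sum(v <= threshold for v in values) / len(values)

# Hypothetical monomer lengths per class (NOT the paper's dataset).
ml_by_class = {
    "2S":   [60, 110, 150, 240, 260],
    "3SDI": [120, 300, 450, 835],
}

# Coverage at the ML <= 250 condition named in the caption.
coverage = {cls: percent_covered(lengths, 250) for cls, lengths in ml_by_class.items()}
print(coverage)
```

A threshold is useful as a decision condition when the coverage differs strongly between classes, as the caption reports for ML ≦ 250 (about 90% of 2S against about 15% of 3SDI).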

Figure 3
A flowchart describing the decision tree model is given. The decision tree model tests each predictor variable against its defined condition in sequence, following the corresponding branch at each step until it reaches a node that assigns the target variable (the predicted folding mechanism).
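
The sequential checking described in this caption can be sketched in Python as a walk over a nested structure. The three thresholds (ML ≦ 250, I/T ≦ 25%, B/2 ≦ 1500) come from the Figure 2 caption, but the flowchart itself is not reproduced in this text, so the order of the tests and the class at each leaf below are hypothetical placeholders and should not be read as the published tree. The names `TREE` and `predict` are likewise assumptions.

```python
# Thresholds are taken from the Figure 2 caption.
# ASSUMPTION: the branch order and leaf labels are illustrative only;
# they are not the tree published in Figure 3.
TREE = {
    "test": ("ML", 250),
    "yes": {
        "test": ("I/T", 25),
        "yes": "3SMI",
        "no": "2S",
    },
    "no": {
        "test": ("B/2", 1500),
        "yes": "3SMI",
        "no": "3SDI",
    },
}

def predict(node, features):
    """Walk the tree: at each internal node test feature <= threshold,
    follow the matching branch, and stop when a leaf label is reached."""
    while isinstance(node, dict):
        name, threshold = node["test"]
        node = node["yes"] if features[name] <= threshold else node["no"]
    return node

# Hypothetical homodimer: 120 residues, I/T of 40%, interface area of 900.
print(predict(TREE, {"ML": 120, "I/T": 40, "B/2": 900}))
```

Keeping the tree as data rather than nested `if` statements means the placeholder structure can be swapped for the real flowchart without changing `predict`.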
