Plant Physiol. 2009 Sep;151(1):34-46.
doi: 10.1104/pp.109.141317. Epub 2009 Jul 10.

Computational identification of potential molecular interactions in Arabidopsis

Mingzhi Lin et al. Plant Physiol. 2009 Sep.

Abstract

Knowledge of the protein interaction network is useful to assist molecular mechanism studies. Several major repositories have been established to collect and organize reported protein interactions. Many interactions have been reported in several model organisms, yet a very limited number of plant interactions can thus far be found in these major databases. Computational identification of potential plant interactions, therefore, is desired to facilitate relevant research. In this work, we constructed a support vector machine model to predict potential Arabidopsis (Arabidopsis thaliana) protein interactions based on a variety of indirect evidence. In a 100-iteration bootstrap evaluation, the confidence of our predicted interactions was estimated to be 48.67%, and these interactions were expected to cover 29.02% of the entire interactome. The sensitivity of our model was validated with an independent evaluation data set consisting of newly reported interactions that did not overlap with the examples used in model training and testing. Results showed that our model successfully recognized 28.91% of the new interactions, similar to its expected sensitivity (29.02%). Applying this model to all possible Arabidopsis protein pairs resulted in 224,206 potential interactions, which is the largest and most accurate set of predicted Arabidopsis interactions at present. In order to facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource, with detailed annotations and more specific per interaction confidence measurements. This database and related documents are freely accessible at http://www.cls.zju.edu.cn/pair/.
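The bootstrap evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it resamples a fixed, hypothetical list of (true label, predicted label) pairs instead of retraining a support vector machine in each iteration, and it reads "confidence" as precision (the fraction of predicted interactions that are real), which is an assumption. The function names and the toy data are illustrative only.

```python
import random


def precision_and_sensitivity(pairs):
    """Return (precision, sensitivity) for (true_label, predicted_label) pairs."""
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


def bootstrap_evaluate(pairs, iterations=100, seed=0):
    """Average precision and sensitivity over bootstrap resamples of `pairs`."""
    rng = random.Random(seed)
    n = len(pairs)
    precisions, sensitivities = [], []
    for _ in range(iterations):
        # Draw n pairs with replacement from the original evaluation set.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        p, s = precision_and_sensitivity(sample)
        precisions.append(p)
        sensitivities.append(s)
    return sum(precisions) / iterations, sum(sensitivities) / iterations


# Hypothetical toy data, not taken from the paper:
# 30 true positives, 70 false negatives, 30 false positives, 870 true negatives.
toy = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 30 + [(0, 0)] * 870
mean_precision, mean_sensitivity = bootstrap_evaluate(toy)
```

Averaging over resamples gives an estimate that is less sensitive to the particular composition of one test set; the spread of the per-iteration values could also be reported as an uncertainty.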

Figures

Figure 1.
Protein distribution in our interaction examples and in the entire Arabidopsis proteome. Blue bars show the fraction of interacting proteins in each category, and red bars show the fraction of all Arabidopsis proteins in each category. A, Protein distribution based on the GO Slim Biological Process categories. The Pearson's correlation coefficient between two distributions is 0.95 (P = 3.17 × 10⁻⁷). B, Protein distribution based on the GO Slim Molecular Function categories. The correlation coefficient is 0.52 (P = 2.8 × 10⁻²). C, Protein distribution based on the GO Slim Cellular Component categories. The correlation coefficient is 0.68 (P = 2.9 × 10⁻³). ER, Endoplasmic reticulum. D, Protein distribution based on the Pfam protein family classifications. The correlation coefficient is 0.75 (P < 1 × 10⁻¹⁰). In this diagram, only the largest 150 families are shown for clarity. [See online article for color version of this figure.]
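The comparison in Figure 1 rests on the Pearson correlation between two per-category fraction vectors. The following sketch shows that calculation in plain Python; the function name and the example fractions are hypothetical and are not values from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical fractions of proteins per category (illustrative only).
interacting = [0.30, 0.25, 0.20, 0.15, 0.10]
proteome = [0.28, 0.22, 0.24, 0.14, 0.12]
r = pearson(interacting, proteome)
```

A coefficient near 1 would indicate that the interaction examples sample the categories in roughly the same proportions as the whole proteome.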
Figure 2.
Comparison of the feature value distribution in interaction examples and the feature value distribution in noninteraction examples. For each feature, its value range is divided into 20 equal bins. Blue curves and red curves show the fraction of interaction examples and the fraction of noninteraction examples that fall into each bin. Green curves represent the likelihood ratio (LR). A complete set of these analysis diagrams can be found in Supplemental Text S1. The horizontal axis represents the feature value (in A–C) or the 1 – feature value (in D). The left vertical axis represents the fraction of protein pairs. The right vertical axis represents the LR. A, Protein colocalization feature. B, Domain interaction feature 9 (Random Decision Forest Framework). C, Gene coexpression feature. D, Shared annotation feature 3 (cellular component). [See online article for color version of this figure.]
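The binning behind Figure 2 can be sketched as follows. The sketch assumes that LR stands for the likelihood ratio and that it is computed per bin as the fraction of interaction examples divided by the fraction of noninteraction examples; the caption does not spell out either point. The function name, the default value range of 0 to 1, and the toy feature values are hypothetical.

```python
def binned_likelihood_ratio(pos_values, neg_values, n_bins=20, lo=0.0, hi=1.0):
    """Bin two samples into equal-width bins and return per-bin fractions and LR.

    Returns (pos_fractions, neg_fractions, lr). An LR entry is None where the
    noninteraction fraction is zero, because the ratio is undefined there.
    """
    if not pos_values or not neg_values:
        raise ValueError("both samples must be non-empty")

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / (hi - lo) * n_bins)
            # Clamp so that v == hi (or out-of-range values) land in an edge bin.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    pos = fractions(pos_values)
    neg = fractions(neg_values)
    lr = [p / q if q > 0 else None for p, q in zip(pos, neg)]
    return pos, neg, lr


# Hypothetical feature values for interacting and noninteracting pairs.
pos_fr, neg_fr, lr = binned_likelihood_ratio(
    [0.97, 0.97, 0.97, 0.02],
    [0.02, 0.02, 0.02, 0.97],
)
```

Under this reading, a bin with LR above 1 holds relatively more interaction examples than noninteraction examples, so a feature whose LR rises or falls steadily across bins is informative for the classifier.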
Figure 3.
The meiotic recombination-related proteins in our predicted interactome. Eleven seed proteins reviewed by Wijeratne and Ma (2007) and their first neighbors were extracted from our predicted interactions. Interactions between them can be grouped into seven clusters, with seed proteins involved in different recombination steps distributed in nonoverlapping clusters. Triangles represent the seed proteins and circles represent their neighbors. Proteins (nodes) are colored according to their GO molecular function annotations. Edges representing experimentally confirmed interactions are colored red. Those representing interactions homologous to known interactions in other organisms are colored blue. Other predicted interactions are colored gray. [See online article for color version of this figure.]
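The extraction step in Figure 3 (seed proteins plus their first neighbors) can be sketched on a generic edge list. The caption does not say how the seven clusters were defined, so grouping by connected components is an assumption made here for illustration. The protein identifiers and function names are hypothetical.

```python
from collections import defaultdict


def seed_subnetwork(edges, seeds):
    """Return (nodes, edges) restricted to the seeds and their first neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = set(seeds)
    for seed in seeds:
        nodes |= adjacency.get(seed, set())
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges


def connected_components(nodes, edges):
    """Group nodes into connected components, returned as sorted lists."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency.get(node, set()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical predicted interactions; S1 and S2 play the role of seed proteins.
edges = [("S1", "A"), ("A", "B"), ("S2", "C"), ("C", "D"), ("B", "D")]
nodes, sub_edges = seed_subnetwork(edges, ["S1", "S2"])
clusters = connected_components(nodes, sub_edges)
```

In this toy network the second-degree neighbors B and D are dropped, so the link between the two seed neighborhoods disappears and each seed ends up in its own cluster.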
