Nat Protoc. 2011 Aug 25;6(9):1429-42.
doi: 10.1038/nprot.2011.372.

Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

Sohyun Hwang et al. Nat Protoc. 2011.

Abstract

AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests.

Figures

Figure 1
Overview of the method of using AraNet to discover gene function. The AraNet web tool offers two search paths for identifying new gene functions: ‘Find new members of a pathway’ and ‘Infer function from network neighbors’. If you submit a set of query genes to the ‘Find new members of a pathway’ search, you can retrieve three gene sets: connected query genes, disconnected query genes and the top 200 candidate genes that connect to the query genes. Gene set enrichment analyses using the connected query genes and the top 200 candidate genes can provide biological insight from enriched GO, PO and protein domain terms. The top 200 candidate genes can be tested directly to identify new gene functions. If you submit query gene(s) to the ‘Infer function from network neighbors’ search, you can obtain candidate GO biological process terms for each query gene. The disconnected query genes from the ‘Find new members of a pathway’ search are an alternative source of query genes for this search. Predicted GO biological process terms for each query gene can be tested to discover new gene functions. Genes with newly discovered functions can then be added to the query genes for the next round of the ‘Find new members of a pathway’ search. Such iterative searching can improve the enrichment of GO, PO or protein domain terms in subsequent analysis.
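The three gene sets returned by the ‘Find new members of a pathway’ search can be illustrated with a minimal Python sketch. It is not AraNet's implementation: the edge format, the function name `find_new_members` and the gene labels are hypothetical, and candidates are scored by a plain sum of edge scores to query genes, an assumption standing in for the weighted-sum method of Box 2.

```python
from collections import defaultdict


def find_new_members(edges, query_genes, top_n=200):
    """Return (connected query genes, disconnected query genes, ranked candidates).

    edges: iterable of (gene_a, gene_b, score) tuples (hypothetical format).
    Scores are summed over edges to query genes; this plain sum is an
    assumption, not the weighted-sum method referenced as Box 2.
    """
    query = set(query_genes)
    score_to_query = defaultdict(float)
    for a, b, score in edges:
        if a == b:
            continue  # ignore self-loops
        if b in query:
            score_to_query[a] += score
        if a in query:
            score_to_query[b] += score

    # A query gene is "connected" if it links to at least one other query gene.
    connected = sorted(query & score_to_query.keys())
    disconnected = sorted(query - score_to_query.keys())

    # Candidates are nonquery genes, ranked by total score to query genes.
    candidates = sorted(
        ((g, s) for g, s in score_to_query.items() if g not in query),
        key=lambda gs: (-gs[1], gs[0]),
    )[:top_n]
    return connected, disconnected, candidates


# Invented toy network: Q1..Q3 are query genes, C1..C3 are other genes.
toy_edges = [
    ("Q1", "Q2", 2.0),
    ("Q1", "C1", 1.5),
    ("Q2", "C1", 1.0),
    ("Q2", "C2", 0.5),
    ("C2", "C3", 4.0),
]
connected, disconnected, candidates = find_new_members(toy_edges, ["Q1", "Q2", "Q3"])
print(connected, disconnected, candidates)
```

In this toy network, Q3 has no link to another query gene, so it lands in the disconnected set and would be a natural input for the ‘Infer function from network neighbors’ search.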
Figure 2
A report from a ‘Find new members of a pathway’ search showing an analysis of query genes (e.g., a set of genes involved in cold acclimation) connected to one another in AraNet. (a) The list of connected query genes gives, for each gene: its rank, based on the total connection score to other query genes (Box 2); locus ID; gene symbol; the AraNet data types (evidence) supporting connections between the gene and all other query genes (Table 1); the fraction of connected query genes out of the total valid query genes; all other query genes connected to the gene; and three Gene Ontology (GO) annotations (biological process, cellular component and molecular function). The locus ID links to the annotation page at the TAIR database. (b) The next round of the search, using only connected query genes, can be run by clicking the ‘Submit’ button at the bottom of the screen.
Figure 3
An example of ROC analysis of the predictive power of query genes for the ‘Find new members of a pathway’ search in AraNet. (a) A resultant ROC curve summarizing, by AUC score and P value, the predictive power of AraNet for ‘Arabidopsis cold acclimation’ with the 20 query genes. (b) A toy example network, in which query genes are represented by red nodes and nonquery genes by gray nodes. The link thickness reflects the log-scaled likelihood of two genes sharing a biological function. (c) The resultant curve by ROC analysis of the toy example network. The x axis and the y axis represent the false-positive rate and true-positive rate, respectively. The area under the curve (AUC) score is 0.75. Each network gene (including query genes) is scored for its likelihood of sharing the function of the query genes by integrating all of its network connections to query genes with the weighted-sum method (Box 2). As expected from the high AUC score, the majority of query genes are highly ranked (e.g., the top four candidates are all query genes).
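To make the AUC score concrete, the following Python sketch computes a rank-based AUC from per-gene scores. It is an illustration under stated assumptions: the function name `roc_auc` is hypothetical, the scores are invented rather than taken from the toy network in panel b, and the scores are supplied directly instead of being derived with the weighted-sum method of Box 2.

```python
def roc_auc(scores, positives):
    """Rank-based AUC: the probability that a randomly chosen positive
    (query) gene scores higher than a randomly chosen negative gene,
    with ties counted as one half.

    scores: dict mapping gene -> score; positives: set of query genes.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented scores for eight genes; A, B, C and H play the role of query genes.
toy_scores = {"A": 9, "B": 8, "C": 7, "D": 6, "E": 5, "F": 4, "G": 3, "H": 2}
print(roc_auc(toy_scores, {"A", "B", "C", "H"}))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking in which every query gene outscores every nonquery gene.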
Figure 4
An example list of candidate pathway genes from a ‘Find new members of a pathway’ search. AraNet analysis returns a table of rank-ordered new candidates for the pathway of the query genes (e.g., a set of genes involved in cold acclimation). AraNet lists only the top 200 candidate genes in the HTML table and provides a list of all the candidate genes as a text file. The list gives, for each candidate: its rank, based on the total connection score to query genes (Box 2); locus ID; paralogs; gene symbol; the AraNet data types (evidence) supporting connections between the gene and all query genes (Table 1); the fraction of connected query genes out of the total valid query genes; all query genes connected to the gene; and three GO annotations.
Figure 5
An example of a network layout view page in a new web browser window. (a) AraNet analysis provides downloadable network edge information files for additional network visualization and HTML pages that contain a network view generated by Cytoscape Web (http://cytoscapeweb.cytoscape.org/). (b) A partial view of the network of genes known to have roles in cold acclimation and their connected genes in AraNet. A blue node represents a cold acclimation query gene, and a green node represents a connected candidate gene. You can zoom in and out on the image by clicking the plus and minus buttons, respectively. If you click the hand icon, you can move the view window to other parts of the network. (c) If you click a node, the lower panel provides detailed information, including the total connection score to query genes. (d) If you click an edge, the lower panel provides detailed information, including the AraNet edge score and the supporting evidence with the corresponding log-likelihood scores before weighted-sum integration.
Figure 6
The ‘Infer function from network neighbors’ search. (a) For this search option, you can filter search results by the evidence codes supporting GO annotations. The default GO evidence types are limited to experimental data (IDA, IMP, IGI, IPI, IEP) and literature (TAS). You can select additional, less reliable evidence codes to obtain more prediction results. (b) Example reports of predictions of new functions. For each query gene, the report provides candidate GO biological process terms in the prediction table. The table contains five information fields: rank; total score to the neighbors annotated by the candidate GO term; evidence supporting the AraNet connections to the neighbors annotated by the candidate GO term (Table 1); the predicted GO term; and the GO term-supporting genes connected to this query gene.
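The neighbor-based prediction described in this caption can be sketched in Python as follows. This is an illustration only, not AraNet's code: the function name `infer_go_terms`, the input formats, and the gene and term labels (`GO:A`, `GO:B`) are hypothetical placeholders, and summing edge scores per term is an assumed reading of ‘total score to the neighbors annotated by the candidate GO term’.

```python
from collections import defaultdict

# Default evidence codes named in the caption: experimental data plus TAS.
DEFAULT_EVIDENCE = frozenset({"IDA", "IMP", "IGI", "IPI", "IEP", "TAS"})


def infer_go_terms(query_gene, edges, annotations, allowed_evidence=DEFAULT_EVIDENCE):
    """Rank candidate GO terms for one query gene from its network neighbors.

    edges: iterable of (gene_a, gene_b, score) tuples (hypothetical format).
    annotations: dict mapping gene -> list of (go_term, evidence_code).
    Returns a list of (rank, go_term, total_score, supporting_genes).
    """
    term_score = defaultdict(float)
    term_support = defaultdict(set)
    for a, b, score in edges:
        if a == query_gene:
            neighbor = b
        elif b == query_gene:
            neighbor = a
        else:
            continue
        if neighbor == query_gene:
            continue  # ignore self-loops
        # Keep only annotations backed by an allowed evidence code.
        terms = {t for t, code in annotations.get(neighbor, []) if code in allowed_evidence}
        for term in terms:
            term_score[term] += score
            term_support[term].add(neighbor)
    ranked = sorted(term_score.items(), key=lambda ts: (-ts[1], ts[0]))
    return [
        (rank, term, score, sorted(term_support[term]))
        for rank, (term, score) in enumerate(ranked, start=1)
    ]


# Invented example: gene "q" has three annotated neighbors.
toy_edges = [("q", "n1", 2.0), ("q", "n2", 1.0), ("n3", "q", 0.5)]
toy_annotations = {
    "n1": [("GO:A", "IDA")],
    "n2": [("GO:A", "IMP"), ("GO:B", "IEA")],
    "n3": [("GO:B", "TAS")],
}
for row in infer_go_terms("q", toy_edges, toy_annotations):
    print(row)
```

Passing a larger `allowed_evidence` set (for example, one that also includes the electronically inferred code IEA) admits more annotations and therefore more, but less reliable, predictions.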
Figure 7
A report from a ‘Find new members of a pathway’ search. The report shows an analysis of disconnected query genes in AraNet (e.g., a set of genes involved in cold acclimation). The list of disconnected query genes contains information such as locus ID, gene symbol and three GO annotations. You may directly submit the disconnected genes to an ‘Infer function from network neighbors’ search by clicking the ‘Submit’ button.
Figure 8
Functional gene set enrichment analysis (e.g., for a set of 20 query genes involved in cold acclimation). (a) Optional function enrichment analysis is available. Using the listed valid query genes, the top 200 new candidate genes, or the combined gene set, AraNet reports the enriched GO, PO and protein domain terms. (b) A GO, PO and InterPro protein domain enrichment analysis report. This enrichment analysis tool provides three analysis results for each reference gene set: GO, PO and protein domain. The report table shows the following nine fields of information: rank based on the adjusted P value; GO, PO or protein domain ID; a brief description of the ID; hypergeometric P value of the ID; hypergeometric P value adjusted for false discovery rate; the number of total Arabidopsis genes; the number of query genes; the number of genes annotated with the ID; and the number of genes common to both the query genes and the genes annotated with the ID.
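The hypergeometric P value in the report can be computed from the last four fields of the table, as the Python sketch below shows. The function names are hypothetical, and the Benjamini-Hochberg procedure is used for the false discovery rate adjustment as an assumption, because the caption does not name the correction method AraNet applies.

```python
from math import comb


def hypergeom_pvalue(total, annotated, query, overlap):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    total: number of Arabidopsis genes in the background.
    annotated: number of genes annotated with the term.
    query: number of query genes.
    overlap: number of query genes annotated with the term.
    """
    upper = min(annotated, query)
    tail = sum(
        comb(annotated, i) * comb(total - annotated, query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total, query)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: 10 background genes, 4 annotated, 3 query genes, 2 shared.
p = hypergeom_pvalue(total=10, annotated=4, query=3, overlap=2)
print(p, benjamini_hochberg([p, 0.5, 0.9]))
```

Terms would then be ranked by the adjusted P value, which matches the first field of the report table.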
