PLoS One. 2010 Aug 12;5(8):e12139.
doi: 10.1371/journal.pone.0012139.

A genome-wide gene function prediction resource for Drosophila melanogaster

Han Yan et al. PLoS One.

Abstract

Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of the Function-Specific Classifier model.
A. To train the model, features describing relationships between each given candidate gene and reference genes for a given function Ki were derived from large-scale biological datasets. B. Results were evaluated by 10-fold cross-validation.
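
The 10-fold cross-validation mentioned in panel B can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation; the function name `k_fold_splits`, the gene identifiers, and the random seed are hypothetical.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    Hypothetical helper: items are shuffled once, dealt into k folds,
    and each fold is held out exactly once as the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal items round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Illustrative gene identifiers (assumed, not from the paper).
genes = [f"gene{i}" for i in range(25)]
splits = list(k_fold_splits(genes, k=10))
for train, test in splits:
    # A classifier would be trained on `train` and scored on `test` here.
    assert not set(train) & set(test)
```

Every gene appears in exactly one held-out fold, so each prediction is scored by a model that never saw that gene during training.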
Figure 2. Performance of the GO term prediction model.
Receiver Operating Characteristic curves (A) and Precision-Recall curves (B) for the overall performance and contribution of each feature in GO term (biological process, BP) prediction. Precision-Recall curves for the GO term prediction model for GO terms with various degrees of specificity, i.e., those that have been annotated with 2–25 genes (C), 25–50 genes (D), 50–100 genes (E), and 100–500 genes (F).
Figure 3. Performance of the KEGG pathway prediction model.
Receiver Operating Characteristic curves (A) and Precision-Recall curves (B) for the overall performance and contribution of each feature in the KEGG pathway prediction. Precision-Recall curves for the performance of the model in predicting metabolism only (C), signaling pathway only (D), basic functions (E), and all non-metabolism functions (F).
Figure 4. Assessing prediction quality against RNAi screening results.
Precision-Recall curves (A) and the curves for precision vs. confidence score threshold (B) for the quality of GO term prediction measured by DRSC RNAi screening results; Precision-Recall curves (C) and the curves for precision vs. confidence score threshold (D) for the quality of KEGG pathway membership prediction measured by DRSC RNAi screening results.
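
The precision and recall values plotted against a confidence score threshold can be illustrated with a small Python sketch. It assumes each prediction carries a numeric confidence score and a true/false label from an independent validation such as an RNAi screen; the function name, scores, and labels below are invented for illustration and are not data from the paper.

```python
def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) for predictions scoring >= threshold.

    Hypothetical helper: `labels` holds True where the independent
    evidence supports the prediction. Returns None for a ratio whose
    denominator is zero.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Made-up confidence scores and validation labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]

# Sweeping the threshold traces out one point per threshold on each curve.
curve = [(t, *precision_recall_at(scores, labels, t)) for t in (0.5, 0.7, 0.95)]
```

Raising the threshold keeps only the most confident predictions, which typically raises precision at the cost of recall.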
Figure 5. Comparison between GO term/KEGG pathway prediction and DRSC RNAi screening hits.
A–B, GO/KEGG predictions matched with RNAi screen results, compared to randomized RNAi screen data. C–E, individual pathway/function predictions matched with RNAi screen results, compared to randomized RNAi screen data. For comparison, we show the performance of a supervised machine-learning model trained with the same algorithm and datasets, except that it aggregates all GO terms/KEGG pathways during training, as has traditionally been done.
