Genome Biol. 2008;9 Suppl 1(Suppl 1):S7.
doi: 10.1186/gb-2008-9-s1-s7. Epub 2008 Jun 27.

Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

Weidong Tian et al. Genome Biol. 2008.

Abstract

Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships.

Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships.

Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions.
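
As a rough illustration of the two ideas named in the abstract, the following minimal Python sketch scores one unannotated gene both ways and blends the results. Everything in it is an assumption made for illustration: the gene names, the scores, the edge weights, and the linear blend are hypothetical, and the abstract does not state how Funckenstein actually combines its component classifiers.

```python
# Toy illustration only: all names and numbers below are hypothetical.

# Which genes are already annotated with the function of interest (1 = yes).
annotated = {"geneA": 1, "geneB": 1, "geneC": 0, "geneD": 0}

# Guilt-by-profiling: a per-gene score derived from the gene's own
# characteristics (here simply given, standing in for a trained classifier).
profile_score = {"geneA": 0.9, "geneB": 0.6, "geneC": 0.2, "geneD": 0.1, "geneX": 0.4}

# Guilt-by-association: a functional linkage graph, gene -> {neighbor: weight}.
linkage = {"geneX": {"geneA": 0.8, "geneB": 0.5, "geneC": 0.1}}


def association_score(gene):
    """Weighted fraction of a gene's linked neighbors that carry the function."""
    neighbors = linkage.get(gene, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    hits = sum(w for n, w in neighbors.items() if annotated.get(n, 0) == 1)
    return hits / total


def combined_score(gene, weight=0.5):
    """Linear blend of the two scores (an assumed rule, not the paper's)."""
    return weight * profile_score.get(gene, 0.0) + (1 - weight) * association_score(gene)


if __name__ == "__main__":
    print(round(association_score("geneX"), 3))
    print(round(combined_score("geneX"), 3))
```

In this toy case geneX has a modest profile score but strongly linked annotated neighbors, so the blended score is higher than the profile score alone, which is the kind of complementarity the abstract describes.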

Figures

Figure 1
Performance of MIPS function prediction using a previously established benchmark dataset. Results are shown for five different methods: Funckenstein (light red); MRF-NB [26] (black); guilt-by-profiling by the RF method alone (dark red); guilt-by-profiling by the PDT method alone (violet); guilt-by-association by the FL method alone (blue). (a) True positive rate versus false positive rate at different score thresholds. (b) Precision versus recall at prediction score thresholds. MIPS, Munich Information Center for Protein Sequences; MRF, Markov random field; NB, naïve Bayes; PDT, probabilistic decision tree; RF, random forest.
Figure 2
Performance of GO term prediction using either the RF guilt-by-profiling (RF; brown) or FL guilt-by-association classifiers (FL). Three types of FL classifiers were compared: FL1 (green), which used only gene characteristics used in the RF classifier that have been recoded as gene pair characteristics; FL2 (red), which used only 'intrinsic' gene-gene relationships; and FL3 (blue), which used both intrinsic and recoded gene-gene characteristics. (a-l) Plots are organized according to GO branch and GO term specificity. FL, functional linkage; GO, Gene Ontology; RF, random forest.
Figure 3
Cross-validation results for Funckenstein and the RF guilt-by-profiling component classifier (RF) alone (brown). Three versions of Funckenstein were compared, each integrating RF with one of three variants of the FL guilt-by-association classifier (FL): FL1, FL using only relationships derived from shared gene characteristics (green); FL2, FL using only direct gene-gene relationships (red); and FL3, FL using all types of relationship (blue). (a-l) Plots are organized according to GO branch and GO term specificity. FL, functional linkage; GO, Gene Ontology; RF, random forest.
Figure 4
Prediction scores for 'verified', 'uncharacterized', and 'dubious' genes. For 'verified' (red) or 'uncharacterized' (blue) genes, the log ratio of the number of predictions within each score interval (relative to the number for 'dubious' genes) is shown.
Figure 5
The average frequency of each type of gene characteristic among the five most important variables (see Materials and methods for the variable performance measure): phenotype (brown); protein complex and/or cellular localization (blue); transcription regulation (green); and protein sequence pattern (red). The gene characteristic types are organized according to their order in Table 3. (a-l) Plots are organized according to GO branch and GO term specificity. GO, Gene Ontology.
Figure 6
An example of a PDT used to generate a FL graph. This example was trained based on annotations of those BP GO terms that are currently annotated to 3 to 10 genes. BP, biological process; FL, functional linkage; PDT, probabilistic decision tree.
Figure 7
Assessment of 120 novel predictions by an expert curator. Assessments were either 'known correct', 'likely true', 'unclear', 'unlikely to be true' or 'highly unlikely'. (a) For each of the three GO branches, the proportion of novel predictions given each assessment. (b) For each specificity level, the proportion of novel predictions given each assessment. GO, Gene Ontology.
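
The Figure 1 panels plot true positive rate against false positive rate, and precision against recall, as the prediction score threshold varies. The sketch below shows how those quantities are computed at a single threshold. The scores and labels are invented for illustration, and the convention of reporting precision as 1.0 when nothing is predicted positive is an assumption, not something the source specifies.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate, precision).

    A gene is predicted positive when its score is >= threshold.
    The true positive rate is the same quantity as recall.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)

    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Assumed convention: precision is 1.0 when no gene is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return tpr, fpr, precision


if __name__ == "__main__":
    # Hypothetical prediction scores and true annotations (1 = has the function).
    scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    labels = [1, 0, 1, 0, 0]
    for t in (0.2, 0.5, 0.85):
        print(t, metrics_at_threshold(scores, labels, t))
```

Sweeping the threshold over all observed scores and plotting the resulting pairs gives curves of the kind described for panels (a) and (b).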

References

    1. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998;9:3273–3297.
    2. Shalon D, Smith SJ, Brown PO. A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res. 1996;6:639–645. doi: 10.1101/gr.6.7.639.
    3. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. doi: 10.1038/35001009.
    4. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98:4569–4574. doi: 10.1073/pnas.061034498.
    5. Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Höfert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415:141–147. doi: 10.1038/415141a.
