BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S15.
doi: 10.1186/1471-2105-14-s3-s15.

Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA)

Jesse Gillis et al. BMC Bioinformatics. 2013.

Abstract

The assignment of gene function remains a difficult but important task in computational biology. The establishment of the first Critical Assessment of Functional Annotation (CAFA) was aimed at increasing progress in the field. We present an independent analysis of the results of CAFA, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out to be either recovering existing knowledge about sequence similarity or were "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward.

Figures

Figure 1
Summaries of performance using the "precision-recall"-based CAFA score. A and B show results for the BP ontology; C and D for MF. Submitted results are shown in black, and the null Prevalence data are represented in grey. A and C plot the distribution of scores for all evaluation targets, averaged across algorithms. B and D show the distribution of scores across algorithms, averaged across targets.
Figure 2
Summaries of performance using ROC curves. Results are only presented for BP because the MF results were too strongly affected by biases due to the E. coli annotations. A. Distribution of AUROCs for the GO terms evaluated. The mean performance across algorithms is shown in black. A simple aggregation algorithm does much better on average, shown in grey. B. Density plot showing the overlay of the ROC curves that make up the results shown for the aggregation algorithm in A, with areas of high density shown in lighter shades. Scattered light areas are artifacts due to the effects of GO groups with smaller numbers of genes. Note that the Prevalence method is guaranteed to generate AUROCs of 0.5 for all functions since it ranks all genes equally.
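The note about the Prevalence method can be checked with a small calculation. The sketch below is illustrative only and is not taken from the CAFA evaluation code: the function name `auroc`, the toy labels, and the toy scores are all assumptions. It computes AUROC as the probability that a randomly chosen positive gene is scored above a randomly chosen negative gene, counting ties as one half, which is the standard rank-based reading of the statistic.

```python
def auroc(scores, labels):
    """AUROC as P(positive scored above negative), with ties counted as 0.5.

    `scores` and `labels` are hypothetical toy inputs: one score and one
    0/1 annotation per gene for a single GO term.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy annotations for one GO term: 1 = gene has the function, 0 = it does not.
labels = [1, 0, 0, 1, 0, 0]

# A predictor that gives every gene the same score for this term,
# so every positive/negative pair is a tie.
constant_scores = [0.3] * len(labels)

# A made-up predictor that happens to separate the classes perfectly.
informative_scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3]

tied_auroc = auroc(constant_scores, labels)
perfect_auroc = auroc(informative_scores, labels)
print(tied_auroc, perfect_auroc)
```

Because every positive/negative pair is tied under a constant score, each pair contributes one half and the ratio is 0.5 regardless of how many genes carry the annotation.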
Figure 3
Summaries of performance based on information content. Results are only shown for BP because of the distorting effect of E. coli annotations in MF. A. The fraction of predictions considered informative per algorithm. B. Overlaps among informative predictions. Most sequences received no informative predictions (peak at 0), while numerous predictions are made by multiple algorithms.
