Genome Biol. 2016 Sep 7;17(1):184.

doi: 10.1186/s13059-016-1037-6.

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Yuxiang Jiang¹, Tal Ronnen Oron², Wyatt T Clark³, Asma R Bankapur⁴, Daniel D'Andrea⁵, Rosalba Lepore⁵, Christopher S Funk⁶, Indika Kahanda⁷, Karin M Verspoor^{8,9}, Asa Ben-Hur⁷, Da Chen Emily Koo¹⁰, Duncan Penfold-Brown^{11,12}, Dennis Shasha¹³, Noah Youngs^{12,13,14}, Richard Bonneau^{13,14,15}, Alexandra Lin¹⁶, Sayed M E Sahraeian¹⁷, Pier Luigi Martelli¹⁸, Giuseppe Profiti¹⁸, Rita Casadio¹⁸, Renzhi Cao¹⁹, Zhaolong Zhong¹⁹, Jianlin Cheng¹⁹, Adrian Altenhoff^{20,21}, Nives Skunca^{20,21}, Christophe Dessimoz^{22,23,24}, Tunca Dogan²⁵, Kai Hakala^{26,27}, Suwisa Kaewphan^{26,27,28}, Farrokh Mehryary^{26,27}, Tapio Salakoski^{26,28}, Filip Ginter²⁶, Hai Fang²⁹, Ben Smithers²⁹, Matt Oates²⁹, Julian Gough²⁹, Petri Törönen³⁰, Patrik Koskinen³⁰, Liisa Holm^{30,31}, Ching-Tai Chen³², Wen-Lian Hsu³², Kevin Bryson²², Domenico Cozzetto²², Federico Minneci²², David T Jones²², Samuel Chapman³³, Dukka Bkc³³, Ishita K Khan³⁴, Daisuke Kihara^{34,35}, Dan Ofer³⁶, Nadav Rappoport^{36,37}, Amos Stern^{36,37}, Elena Cibrian-Uhalte²⁵, Paul Denny³⁸, Rebecca E Foulger³⁸, Reija Hieta²⁵, Duncan Legge²⁵, Ruth C Lovering³⁸, Michele Magrane²⁵, Anna N Melidoni³⁸, Prudence Mutowo-Meullenet²⁵, Klemens Pichler²⁵, Aleksandra Shypitsyna²⁵, Biao Li², Pooya Zakeri^{39,40}, Sarah ElShal^{39,40}, Léon-Charles Tranchevent^{41,42,43}, Sayoni Das⁴⁴, Natalie L Dawson⁴⁴, David Lee⁴⁴, Jonathan G Lees⁴⁴, Ian Sillitoe⁴⁴, Prajwal Bhat⁴⁵, Tamás Nepusz⁴⁶, Alfonso E Romero⁴⁷, Rajkumar Sasidharan⁴⁸, Haixuan Yang⁴⁹, Alberto Paccanaro⁴⁷, Jesse Gillis⁵⁰, Adriana E Sedeño-Cortés⁵¹, Paul Pavlidis⁵², Shou Feng¹, Juan M Cejuela⁵³, Tatyana Goldberg⁵³, Tobias Hamp⁵³, Lothar Richter⁵³, Asaf Salamov⁵⁴, Toni Gabaldon^{55,56,57}, Marina Marcet-Houben^{55,56}, Fran Supek^{56,58,59}, Qingtian Gong^{60,61}, Wei Ning^{60,61}, Yuanpeng Zhou^{60,61}, Weidong Tian^{60,61}, Marco Falda⁶², Paolo Fontana⁶³, Enrico Lavezzo⁶², Stefano Toppo⁶², Carlo Ferrari⁶⁴, Manuel Giollo^{64,65}, Damiano Piovesan⁶⁴, Silvio C E Tosatto⁶⁴, Angela Del Pozo⁶⁶, José M Fernández⁶⁷, Paolo Maietta⁶⁸, Alfonso Valencia⁶⁸, Michael L Tress⁶⁸, Alfredo Benso⁶⁹, Stefano Di Carlo⁶⁹, Gianfranco Politano⁶⁹, Alessandro Savino⁶⁹, Hafeez Ur Rehman⁷⁰, Matteo Re⁷¹, Marco Mesiti⁷¹, Giorgio Valentini⁷¹, Joachim W Bargsten⁷², Aalt D J van Dijk^{72,73}, Branislava Gemovic⁷⁴, Sanja Glisic⁷⁴, Vladmir Perovic⁷⁴, Veljko Veljkovic⁷⁴, Nevena Veljkovic⁷⁴, Danillo C Almeida-E-Silva⁷⁵, Ricardo Z N Vencio⁷⁵, Malvika Sharan⁷⁶, Jörg Vogel⁷⁶, Lakesh Kansakar⁷⁷, Shanshan Zhang⁷⁷, Slobodan Vucetic⁷⁷, Zheng Wang⁷⁸, Michael J E Sternberg⁷⁹, Mark N Wass⁸⁰, Rachael P Huntley²⁵, Maria J Martin²⁵, Claire O'Donovan²⁵, Peter N Robinson⁸¹, Yves Moreau⁸², Anna Tramontano⁵, Patricia C Babbitt⁸³, Steven E Brenner¹⁷, Michal Linial⁸⁴, Christine A Orengo⁴⁴, Burkhard Rost⁵³, Casey S Greene⁸⁵, Sean D Mooney⁸⁶, Iddo Friedberg^{87,88}, Predrag Radivojac⁸⁹

Affiliations

¹ Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.
² Buck Institute for Research on Aging, Novato, CA, USA.
³ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
⁴ Department of Microbiology, Miami University, Oxford, OH, USA.
⁵ University of Rome, La Sapienza, Rome, Italy.
⁶ Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA.
⁷ Department of Computer Science, Colorado State University, Fort Collins, CO, USA.
⁸ Department of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia.
⁹ Health and Biomedical Informatics Centre, University of Melbourne, Parkville, Victoria, Australia.
¹⁰ Department of Biology, New York University, New York, NY, USA.
¹¹ Social Media and Political Participation Lab, New York University, New York, NY, USA.
¹² CY Data Science, New York, NY, USA.
¹³ Department of Computer Science, New York University, New York, NY, USA.
¹⁴ Simons Center for Data Analysis, New York, NY, USA.
¹⁵ Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
¹⁶ Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA.
¹⁷ Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA.
¹⁸ Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy.
¹⁹ Computer Science Department, University of Missouri, Columbia, MO, USA.
²⁰ ETH Zurich, Zurich, Switzerland.
²¹ Swiss Institute of Bioinformatics, Zurich, Switzerland.
²² Bioinformatics Group, Department of Computer Science, University College London, London, UK.
²³ University of Lausanne, Lausanne, Switzerland.
²⁴ Swiss Institute of Bioinformatics, Lausanne, Switzerland.
²⁵ European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
²⁶ Department of Information Technology, University of Turku, Turku, Finland.
²⁷ University of Turku Graduate School, University of Turku, Turku, Finland.
²⁸ Turku Centre for Computer Science, Turku, Finland.
²⁹ University of Bristol, Bristol, UK.
³⁰ Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
³¹ Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland.
³² Institute of Information Science, Academia Sinica, Taipei, Taiwan.
³³ Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA.
³⁴ Department of Computer Science, Purdue University, West Lafayette, IN, USA.
³⁵ Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
³⁶ Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
³⁷ School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
³⁸ Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK.
³⁹ Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium.
⁴⁰ iMinds Department Medical Information Technologies, Leuven, Belgium.
⁴¹ Inserm UMR-S1052, CNRS UMR5286, Cancer Research Centre of Lyon, Lyon, France.
⁴² Université de Lyon 1, Villeurbanne, France.
⁴³ Centre Léon Bérard, Lyon, France.
⁴⁴ Institute of Structural and Molecular Biology, University College London, London, UK.
⁴⁵ Cerenode Inc., Boston, MA, USA.
⁴⁶ Molde University College, Molde, Norway.
⁴⁷ Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK.
⁴⁸ Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA, USA.
⁴⁹ School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland.
⁵⁰ Stanley Institute for Cognitive Genomics Cold Spring Harbor Laboratory, New York, NY, USA.
⁵¹ Graduate Program in Bioinformatics, University of British Columbia, Vancouver, Canada.
⁵² Department of Psychiatry and Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
⁵³ Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany.
⁵⁴ DOE Joint Genome Institute, Walnut Creek, CA, USA.
⁵⁵ Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain.
⁵⁶ Universitat Pompeu Fabra, Barcelona, Spain.
⁵⁷ Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.
⁵⁸ Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia.
⁵⁹ EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain.
⁶⁰ State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China.
⁶¹ Children's Hospital of Fudan University, Shanghai, China.
⁶² Department of Molecular Medicine, University of Padua, Padua, Italy.
⁶³ Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy.
⁶⁴ Department of Information Engineering, University of Padua, Padova, Italy.
⁶⁵ Department of Biomedical Sciences, University of Padua, Padova, Italy.
⁶⁶ Instituto De Genetica Medica y Molecular, Hospital Universitario de La Paz, Madrid, Spain.
⁶⁷ Spanish National Bioinformatics Institute, Spanish National Cancer Research Institute, Madrid, Spain.
⁶⁸ Structural and Computational Biology Programme, Spanish National Cancer Research Institute, Madrid, Spain.
⁶⁹ Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy.
⁷⁰ National University of Computer & Emerging Sciences, Islamabad, Pakistan.
⁷¹ Anacleto Lab, Dipartimento di informatica, Università degli Studi di Milano, Milan, Italy.
⁷² Applied Bioinformatics, Bioscience, Wageningen University and Research Centre, Wageningen, Netherlands.
⁷³ Biometris, Wageningen University, Wageningen, Netherlands.
⁷⁴ Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia.
⁷⁵ Department of Computing and Mathematics FFCLRP-USP, University of Sao Paulo, Ribeirao Preto, Brazil.
⁷⁶ Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany.
⁷⁷ Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.
⁷⁸ University of Southern Mississippi, Hattiesburg, MS, USA.
⁷⁹ Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK.
⁸⁰ School of Biosciences, University of Kent, Canterbury, Kent, UK.
⁸¹ Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, Berlin, Germany.
⁸² Department of Electrical Engineering ESAT-SCD and IBBT-KU Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium.
⁸³ California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, CA, USA.
⁸⁴ Department of Chemical Biology, The Hebrew University of Jerusalem, Jerusalem, Israel.
⁸⁵ Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
⁸⁶ Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA.
⁸⁷ Department of Microbiology, Miami University, Oxford, OH, USA. idoerg@iastate.edu.
⁸⁸ Department of Computer Science, Miami University, Oxford, OH, USA. idoerg@iastate.edu.
⁸⁹ Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA. predrag@indiana.edu.

PMID: 27604469
PMCID: PMC5015320
DOI: 10.1186/s13059-016-1037-6

Yuxiang Jiang et al. Genome Biol. 2016.

Abstract

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured an expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and that predictions in the biological process and human phenotype ontologies are relatively diverse. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.

Keywords: Disease gene prioritization; Protein function prediction.


Figures

**Fig. 1**
Timeline of the CAFA2 experiment

**Fig. 2**
CAFA2 benchmark breakdown. a The benchmark size for each of the four ontologies. b Breakdown of benchmarks for both types over 11 species (each with no fewer than 15 proteins), sorted by the total number of benchmark proteins. For both panels, *dark colors* (*blue*, *red*, and *yellow*) correspond to no-knowledge (NK) types, while their *light color* counterparts correspond to limited-knowledge (LK) types. The distributions of information content for the benchmark sets are shown in Additional file 1. The sizes of the CAFA1 benchmarks are shown in *gray*. *BPO* Biological Process Ontology, *CCO* Cellular Component Ontology, *HPO* Human Phenotype Ontology, *LK* limited-knowledge, *MFO* Molecular Function Ontology, *NK* no-knowledge

**Fig. 3**
CAFA1 versus CAFA2 (*top methods*). A comparison in F_max between the top-five CAFA1 models and the top-five CAFA2 models. *Colored boxes* encode the results such that (1) the colors indicate the margins of a CAFA2 method over a CAFA1 method in F_max and (2) the numbers in the *box* indicate the percentage of wins. For both the Molecular Function Ontology (a) and Biological Process Ontology (b) results: A CAFA1 top-five models (*rows, from top to bottom*) against CAFA2 top-five models (*columns, from left to right*). B Comparison of Naïve baselines trained on SwissProt2011 and SwissProt2014, respectively. C Comparison of BLAST baselines trained on SwissProt2011 and SwissProt2014

**Fig. 4**
Overall evaluation using the maximum F measure, F_max. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. A perfect predictor would be characterized by F_max = 1. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations on the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, only the results of the best-scoring method are presented. Details for all methods are provided in Additional file 1
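The 95 % confidence intervals in Figs. 4, 6 and 9 come from bootstrapping over the benchmark sequences. The Python sketch below shows a generic percentile bootstrap; the helper name `bootstrap_ci`, the percentile method, and the fixed seed are assumptions for illustration, not necessarily the exact CAFA2 procedure. `metric` is any function that recomputes the evaluation score on a resampled list of benchmark proteins.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_ci(
    items: Sequence[T],
    metric: Callable[[List[T]], float],
    n_iter: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for `metric` (hypothetical helper).

    Each iteration draws len(items) benchmark items with replacement and
    recomputes the metric on that resample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(items)
    stats = sorted(
        metric([items[rng.randrange(n)] for _ in range(n)]) for _ in range(n_iter)
    )
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution.
    lo = stats[int(alpha / 2 * n_iter)]
    hi = stats[min(n_iter - 1, int((1 - alpha / 2) * n_iter))]
    return lo, hi


if __name__ == "__main__":
    per_protein_scores = [0.2, 0.4, 0.5, 0.7, 0.9]  # toy values, not CAFA data
    print(bootstrap_ci(per_protein_scores, lambda xs: sum(xs) / len(xs)))
```

Passing a callable keeps the resampling separate from the metric, so the same routine could wrap a mean per-protein score or a metric such as F_max that must be recomputed on each resample.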

**Fig. 5**
Precision–recall curves for top-performing methods. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. A perfect predictor would be characterized by F_max = 1, which corresponds to the point (1,1) in the precision–recall plane. For cases in which a principal investigator participated in multiple teams, only the results of the best-scoring method are presented
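F_max is the maximum, over decision thresholds τ, of the harmonic mean of precision and recall, so a predictor that reaches the point (1,1) scores F_max = 1. The Python sketch below assumes the protein-centric definition described for CAFA evaluations: precision is averaged over proteins with at least one prediction at or above τ, and recall over all benchmark proteins. The dictionary layout, the helper name `f_max`, and the 0.01-step threshold grid are assumptions; predicted terms are assumed to be already propagated to their ontology ancestors.

```python
from typing import Dict, Iterable, Set

DEFAULT_THRESHOLDS = tuple(i / 100 for i in range(1, 101))  # 0.01, 0.02, ..., 1.00


def f_max(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    thresholds: Iterable[float] = DEFAULT_THRESHOLDS,
) -> float:
    """Protein-centric maximum F-measure (hypothetical helper).

    predictions: protein -> {term: score in [0, 1]}
    truth: protein -> non-empty set of experimentally annotated terms
    """
    n = len(truth)  # every benchmark protein counts toward recall
    best = 0.0
    for tau in thresholds:
        prec_sum = rec_sum = 0.0
        covered = 0  # proteins with at least one prediction >= tau
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            hits = len(predicted & true_terms)
            rec_sum += hits / len(true_terms)
            if predicted:
                covered += 1
                prec_sum += hits / len(predicted)
        if covered == 0:
            continue  # precision is undefined when nothing is predicted
        pr, rc = prec_sum / covered, rec_sum / n
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over covered proteins means a method is not penalized in precision for proteins it declines to annotate; those proteins still lower its recall.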

**Fig. 6**
Overall evaluation using the minimum semantic distance, S_min. Evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. A perfect predictor would be characterized by S_min = 0. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations on the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, only the results of the best-scoring method are presented. Details for all methods are provided in Additional file 1
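S_min is the smallest semantic distance between predicted and true annotations over all thresholds, so a perfect predictor scores S_min = 0. The sketch below assumes the definition S(τ) = sqrt(ru(τ)² + mi(τ)²), where remaining uncertainty ru sums the information content of missed true terms and misinformation mi sums that of wrongly predicted terms, each averaged over benchmark proteins. The per-term weights `ic` are supplied by the caller (how they are estimated is not described in this record), and the helper name and threshold grid are hypothetical.

```python
import math
from typing import Dict, Iterable, Set

DEFAULT_THRESHOLDS = tuple(i / 100 for i in range(1, 101))  # 0.01, 0.02, ..., 1.00


def s_min(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    ic: Dict[str, float],
    thresholds: Iterable[float] = DEFAULT_THRESHOLDS,
) -> float:
    """Minimum semantic distance over thresholds (hypothetical helper)."""
    n = len(truth)
    best = math.inf
    for tau in thresholds:
        ru = mi = 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            # Remaining uncertainty: information in true terms that were missed.
            ru += sum(ic.get(t, 0.0) for t in true_terms - predicted)
            # Misinformation: information in predicted terms that are not true.
            mi += sum(ic.get(t, 0.0) for t in predicted - true_terms)
        best = min(best, math.hypot(ru / n, mi / n))
    return best
```

Unlike F_max, this distance weights each error by how informative the term is, so missing a specific leaf term costs more than missing a broad term near the root.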

**Fig. 7**
Overall evaluation using the AUC averaged over terms with no fewer than ten positive annotations. The evaluation was carried out on no-knowledge benchmark sequences in the full mode. Error bars indicate the standard error of the mean AUC over terms for each method. For cases in which a principal investigator participated in multiple teams, only the results of the best-scoring method are presented. Details for all methods are provided in Additional file 1. *AUC* area under the receiver operating characteristic curve
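The term-centric AUC treats each ontology term as a separate binary classification problem over the benchmark proteins. The sketch below computes each term's AUC with the Mann-Whitney rank statistic (ties count one half), keeps only terms with at least ten positive proteins as stated in the caption, and returns the mean AUC with its standard error. Scoring an unpredicted (protein, term) pair as 0.0, and all helper names, are assumptions of this sketch.

```python
import math
import statistics
from typing import Dict, List, Set, Tuple


def auc(scores: List[float], labels: List[bool]) -> float:
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_term_auc(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    min_positives: int = 10,
) -> Tuple[float, float]:
    """Mean per-term AUC and its standard error (hypothetical helper)."""
    proteins = sorted(truth)
    aucs = []
    for term in sorted(set().union(*truth.values())):
        labels = [term in truth[p] for p in proteins]
        n_pos = sum(labels)
        # Skip rare terms and terms with no negative examples.
        if n_pos < min_positives or n_pos == len(proteins):
            continue
        # A term the method did not score for a protein counts as 0.0.
        scores = [predictions.get(p, {}).get(term, 0.0) for p in proteins]
        aucs.append(auc(scores, labels))
    if not aucs:
        raise ValueError("no term has enough positive annotations")
    se = statistics.stdev(aucs) / math.sqrt(len(aucs)) if len(aucs) > 1 else 0.0
    return statistics.fmean(aucs), se
```

The pairwise loop is quadratic in the number of proteins per term, which keeps the sketch readable; a rank-based implementation would scale better on full benchmarks.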

**Fig. 8**
Averaged AUC per term for the Human Phenotype Ontology. a Terms are sorted by AUC. The *dashed red line* indicates the performance of the Naïve method. b The ten most accurately predicted terms without overlapping ancestors (except for the root). *AUC* area under the receiver operating characteristic curve

**Fig. 9**
Performance evaluation using the maximum F measure, F_max, on eukaryotic (*left*) versus prokaryotic (*right*) benchmark sequences. The evaluation was carried out on no-knowledge benchmark sequences in the full mode. The coverage of each method is shown within its performance bar. Confidence intervals (95 %) were determined using bootstrapping with 10,000 iterations on the set of benchmark sequences. For cases in which a principal investigator participated in multiple teams, only the results of the best-scoring method are presented. Details for all methods are provided in Additional file 1

**Fig. 10**
Similarity network of participating methods for BPO. Similarities are computed as Pearson’s correlation coefficient between methods, with a 0.75 cutoff for illustration purposes. A unique color is assigned to all methods submitted under the same principal investigator. Organizers’ methods (not evaluated) are shown as *triangles*, while baseline methods (Naïve and BLAST) are shown as *squares*. The top-ten methods are highlighted with enlarged nodes and *circled in red*. The edge width indicates the strength of similarity. Nodes are labeled with the name of the method followed by “-team(model)” if multiple teams/models were submitted
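The similarity network links two methods when the Pearson correlation between their predictions is at least 0.75. This record does not say which vectors were correlated, so the sketch below assumes each method is represented by its prediction scores over one fixed, shared list of (protein, term) pairs, and returns the edges that pass the cutoff. Helper names and toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant vectors; treated as no similarity here
    return cov / math.sqrt(vx * vy)


def similarity_edges(
    method_scores: Dict[str, List[float]], cutoff: float = 0.75
) -> List[Tuple[str, str, float]]:
    """Edges (method_a, method_b, r) for every pair with r >= cutoff."""
    edges = []
    for a, b in combinations(sorted(method_scores), 2):
        r = pearson(method_scores[a], method_scores[b])
        if r >= cutoff:
            edges.append((a, b, r))  # r can serve as the edge width
    return edges
```

The resulting edge list can be handed to any graph-drawing tool, with node colors assigned per principal investigator as in the figure.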

**Fig. 11**
Case study on the human *ADAM-TS12* gene. Biological process terms associated with the *ADAM-TS12* gene in the union of the three databases as of September 2014. The entire functional annotation of *ADAM-TS12* consists of 89 terms, 28 of which are shown. Twelve terms, marked in *green*, are leaf terms. This directed acyclic graph was treated as ground truth in the CAFA2 assessment. *Solid black lines* indicate direct “is a” or “part of” relationships between terms, while *gray lines* mark indirect relationships (that is, some terms were not drawn in this picture). Predicted terms of the top-five methods and two baseline methods were picked at their optimal F_max thresholds. Over-predicted terms are not shown




Associated data

figshare/10.6084/m9.figshare.2059944.v1
