Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics
- PMID: 20505002
- PMCID: PMC2894514
- DOI: 10.1093/bioinformatics/btq270
Abstract
Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROC(n)) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROC(n) score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROC(n) score can be very sensitive to retrieval results from as few as a single query.
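To make the pooled ROC(n) score concrete, the following is a minimal Python sketch, not the authors' code. It assumes each hit is a hypothetical `(evalue, is_relevant)` pair, that pooling means merging the hits of all queries and sorting them by E-value, and that a list ending before n irrelevant records is extended horizontally (one common convention). The function names `roc_n` and `pooled_roc_n` are illustrative only.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[float, bool]  # (E-value, is_relevant); a hypothetical record layout


def roc_n(labels: Iterable[bool], n: int, total_relevant: int) -> float:
    """ROC(n) for a best-first ranked list of relevance labels.

    Sums, for each of the first n irrelevant records, the number of
    relevant records ranked above it, then normalizes by n * total_relevant.
    """
    if n <= 0 or total_relevant <= 0:
        raise ValueError("n and total_relevant must be positive")
    true_pos = 0
    false_pos = 0
    area = 0
    for is_relevant in labels:
        if is_relevant:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos
            if false_pos == n:
                break
    # Assumed convention: if fewer than n irrelevant records were seen,
    # the curve continues flat at the current true-positive count.
    area += true_pos * (n - false_pos)
    return area / (n * total_relevant)


def pooled_roc_n(per_query: List[List[Hit]], relevant_counts: List[int], n: int) -> float:
    """Pool the hits of all queries, rank by E-value, and score with ROC(n)."""
    pooled = sorted((hit for hits in per_query for hit in hits), key=lambda h: h[0])
    return roc_n([rel for _, rel in pooled], n, sum(relevant_counts))


# Query A retrieves only relevant records; query B puts one irrelevant
# record at a very low E-value, ahead of everything else in the pool.
query_a = [(1e-10, True), (1e-5, True)]
query_b = [(1e-20, False), (1e-3, True)]

score_a_only = pooled_roc_n([query_a], [2], n=1)
score_pooled = pooled_roc_n([query_a, query_b], [2, 1], n=1)
print(score_a_only, score_pooled)
```

In this toy data, the single irrelevant record from query B outranks every relevant record in the pool, so the pooled ROC(1) score drops from 1.0 to 0.0, which illustrates the sensitivity to a single query described above.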
Methods: To replace the pooled ROC(n) score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy.
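The idea of combining average precision with an E-value cutoff can be sketched as follows. This is an illustration under stated assumptions, not the TAP-k formula itself: the abstract does not give the exact definition (for example, how the threshold relates to k), so the code below computes plain average precision over the records whose E-value does not exceed a caller-supplied threshold. The name `thresholded_average_precision` and the `(evalue, is_relevant)` record layout are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[float, bool]  # (E-value, is_relevant); a hypothetical record layout


def thresholded_average_precision(
    hits: List[Hit], total_relevant: int, e_threshold: float
) -> float:
    """Average precision restricted to hits with E-value <= e_threshold.

    Precision is accumulated at the rank of each relevant record above the
    cutoff and divided by the total number of relevant records for the query,
    so relevant records missed because of the cutoff lower the score.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    ranked = sorted(hits, key=lambda h: h[0])  # best (lowest) E-value first
    true_pos = 0
    precision_sum = 0.0
    for rank, (evalue, is_relevant) in enumerate(ranked, start=1):
        if evalue > e_threshold:
            break  # the user would not look past the E-value cutoff
        if is_relevant:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / total_relevant


hits = [(1e-30, True), (1e-10, False), (1e-5, True), (0.5, True)]
strict = thresholded_average_precision(hits, total_relevant=3, e_threshold=1e-3)
loose = thresholded_average_precision(hits, total_relevant=3, e_threshold=1.0)
print(round(strict, 4), round(loose, 4))
```

Unlike a pooled score, this quantity is computed per query, so averaging it over queries gives each query equal weight; the full article should be consulted for the actual TAP-k definition.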
Results: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROC(n) scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROC(n) score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy.
Availability and implementation: The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/