Bioinformatics. 2010 Jul 15;26(14):1708-13.

doi: 10.1093/bioinformatics/btq270. Epub 2010 May 26.

Threshold Average Precision (TAP-k): a measure of retrieval designed for bioinformatics

Hyrum D Carroll¹, Maricel G Kann, Sergey L Sheetlin, John L Spouge

PMID: 20505002
PMCID: PMC2894514
DOI: 10.1093/bioinformatics/btq270

Hyrum D Carroll et al. Bioinformatics. 2010.

Affiliation

¹ National Center for Biotechnology Information, Bethesda, MD 20894, USA.

Abstract

Motivation: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROC(n)) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROC(n) score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROC(n) score can be very sensitive to the retrieval results from as few as a single query.
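As a rough illustration of the scores discussed above, the following Python sketch computes a ROC(n) score for one ranked list and a pooled ROC(n) score over several queries. It is not the authors' implementation. The function names, the `(evalue, is_relevant)` input format, the choice to count only retrieved relevant records by default, and the convention of padding a list that ends before n irrelevant records are all assumptions made for this example.

```python
def roc_n(labels, n, total_relevant=None):
    """ROC(n) for one ranked list.

    labels: booleans in rank order (True = relevant record).
    n: number of irrelevant records at which the ROC curve is truncated.
    total_relevant: number of relevant records; defaults to those in the
        list (an assumption of this sketch).
    """
    if total_relevant is None:
        total_relevant = sum(labels)
    if total_relevant == 0 or n <= 0:
        return 0.0
    true_pos = 0
    false_pos = 0
    area = 0
    for is_relevant in labels:
        if is_relevant:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # relevant records ranked above this irrelevant one
            if false_pos == n:
                break
    # Assumed convention: if the list ends before n irrelevant records,
    # the missing ones are treated as ranked below everything retrieved.
    area += (n - false_pos) * true_pos
    return area / (n * total_relevant)


def pooled_roc_n(per_query_hits, n):
    """Pooled ROC(n): merge all queries' hits, sort by E-value, score once.

    per_query_hits: list of lists of (evalue, is_relevant) tuples.
    """
    pooled = sorted(hit for hits in per_query_hits for hit in hits)
    return roc_n([is_relevant for _, is_relevant in pooled], n)


query_a = [(1e-10, True), (1e-5, True)]
query_b = [(1e-8, False), (1e-3, True)]
print(roc_n([rel for _, rel in query_a], 1))   # query A alone
print(roc_n([rel for _, rel in query_b], 1))   # query B alone
print(pooled_roc_n([query_a, query_b], 1))     # both queries pooled
```

In this toy input, a single irrelevant record from query B with a small E-value truncates the pooled curve early, which hints at how one query can dominate a pooled score.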

Methods: To replace the pooled ROC(n) score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy.
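The abstract does not give the TAP formula, so the sketch below is only a reading of the description in the caption of Fig. 4 ("the AP, with the precision of last record repeated"), applied to a list truncated at an E-value threshold E₀. The function name, the input format, and the normalization by the number of relevant records plus one are assumptions of this example, not a statement of the published definition or of the authors' Perl script.

```python
def threshold_average_precision(hits, e0, total_relevant):
    """Average precision of a list truncated at E-value threshold e0,
    with the precision at the last retained record counted once more.

    hits: list of (evalue, is_relevant) tuples sorted by ascending E-value.
    e0: E-value threshold; records with evalue > e0 are discarded.
    total_relevant: number of relevant records for this query.
    """
    retained = [is_relevant for evalue, is_relevant in hits if evalue <= e0]
    true_pos = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(retained, start=1):
        if is_relevant:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this relevant record
    # Precision at the last retained record (0 if nothing passed the threshold).
    last_precision = true_pos / len(retained) if retained else 0.0
    # Assumed normalization: one extra term, hence total_relevant + 1.
    return (precision_sum + last_precision) / (total_relevant + 1)


hits = [(1e-20, True), (1e-10, False), (1e-5, True), (1.0, False)]
print(threshold_average_precision(hits, e0=1e-3, total_relevant=2))
```

Under these assumptions a perfect list scores 1, a threshold that admits many irrelevant records is penalized through the repeated last-record precision, and a threshold that excludes relevant records is penalized through the missing terms in the sum.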

Results: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROC(n) scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROC(n) score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy.

Availability and implementation: The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/

Figures

**Fig. 1.**
An example of a PR graph and TAP curve. The E-values at each point are represented by the colors on the bar beneath.

**Fig. 2.**
Example retrieval list with relevant (blue ‘R’s) and irrelevant (red ‘I’s) records illustrating the j(E₀)-th relevant record, TPIRs and the sentinel record.

**Fig. 3.**
The distribution of EPQ versus E-value for PSI-BLAST retrieval over all queries in DB_344_Pfam. The dashed green line indicates the mean EPQ; the solid red line, the median EPQ; the top and bottom of the blue boxes, the first and third quartiles of the EPQ distribution; and the top and bottom whiskers, the maximum and minimum EPQ over all queries.

**Fig. 4.**
PSI-BLAST retrieval results for the homoserine dehydrogenase Pfam family searching in DB_344_Pfam. (a) Individual ROC₅₀ curves, along with the corresponding pooled ROC₅₀. Note that the pooled ROC curve is lower than both of the queries. This same condition continues until 203 irrelevant records. (b) PR curves (and their average) for the same retrieval results. The TAP for each is the AP (with the precision of last record repeated).

**Fig. 5.**
TAP curves against the E-value threshold E₀, for searching DB_331_CDD with each query from DB_8920_PDB in turn. Retrieval results for GLOBAL are represented with a solid green line; for HMMER_semi-global, with a long-dashed blue line; for HMMER_local, with a medium-dashed red line; and for RPS-BLAST, with a dotted black line. The arrows indicate the maximum TAP and its E-value threshold E₀*(A) for each algorithm A.
