Bioinformatics. 2007 Jul 1;23(13):i41-8.

doi: 10.1093/bioinformatics/btm229.

Manual curation is not sufficient for annotation of genomic databases

William A Baumgartner Jr¹, K Bretonnel Cohen, Lynne M Fox, George Acquaah-Mensah, Lawrence Hunter

PMID: 17646325
PMCID: PMC2516305
DOI: 10.1093/bioinformatics/btm229

Affiliation

¹ Center for Computational Pharmacology, University of Colorado School of Medicine, USA.

Abstract

Motivation: Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no prior work on evaluating knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes applying a software engineering metric known as the found/fixed graph to evaluate both the processes by which genomic knowledge bases are built and the completeness of their contents.
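
A found/fixed graph plots two series against time: items "found" and items "fixed". The sketch below is a minimal illustration, not the authors' code. It assumes that, by analogy with bug reports, every protein present in a database release counts as "found" and every protein with more than `threshold` GO annotations counts as "fixed"; the snapshot format, the function name `found_fixed_series`, and the toy data are all hypothetical. The `threshold` parameter mirrors the >0, >1 and >9 cutoffs described in the figure captions.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: accession -> number of GO annotations in one release.
Snapshot = Dict[str, int]


def found_fixed_series(
    releases: List[Tuple[str, Snapshot]], threshold: int = 0
) -> List[Tuple[str, int, int, int]]:
    """Return (release, found, fixed, open) for each release.

    found: proteins present in the release
    fixed: proteins with more than `threshold` annotations
    open:  found - fixed, i.e. the unannotated backlog
    """
    rows = []
    for label, snapshot in releases:
        found = len(snapshot)
        fixed = sum(1 for n in snapshot.values() if n > threshold)
        rows.append((label, found, fixed, found - fixed))
    return rows


# Toy data for illustration only; these are not figures from the paper.
releases = [
    ("r1", {"P1": 0, "P2": 3}),
    ("r2", {"P1": 1, "P2": 3, "P3": 0, "P4": 0}),
    ("r3", {"P1": 2, "P2": 12, "P3": 0, "P4": 1, "P5": 0, "P6": 0}),
]

for row in found_fixed_series(releases):
    print(row)
```

Plotting the `found` and `fixed` columns against release date gives the two lines of the graph; a widening `open` column is the warning sign the method looks for.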

Results: Well-understood patterns of change in the found/fixed graph occur in two large, publicly available knowledge bases. These patterns suggest that current manual curation processes will take far too long to complete the annotation of even the most important model organisms, and that, at their current rate of production, they will never suffice to annotate all currently available proteomes.
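
The "will never be sufficient" conclusion comes from extrapolating the two lines forward. A minimal sketch of that reasoning, assuming both lines grow linearly (the figures also consider exponential and logarithmic fits, which this sketch ignores): if annotation grows no faster than the arrival of new entries, the gap never closes, which is the nonterminating pattern; otherwise the time to close it is the gap divided by the difference in rates. The function name `years_to_close` and the rates below are hypothetical.

```python
import math


def years_to_close(found_now: float, fixed_now: float,
                   found_per_year: float, fixed_per_year: float) -> float:
    """Years until a linearly growing 'fixed' line meets the 'found' line.

    Returns math.inf when annotation does not outpace the arrival of new
    entries, i.e. the process is nonterminating under this linear model.
    """
    gap = found_now - fixed_now
    if gap <= 0:
        return 0.0  # every found item is already fixed
    closing_rate = fixed_per_year - found_per_year
    if closing_rate <= 0:
        return math.inf
    return gap / closing_rate


# Hypothetical rates, for illustration only.
print(years_to_close(20000, 8000, 500, 1100))   # 12000 / 600 -> 20.0
print(years_to_close(20000, 8000, 1500, 1100))  # inf
```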

Figures

**Fig. 1**
Hypothetical found/fixed graphs depicting good (left) and nonterminating (right) development processes.

**Fig. 2**
GO annotation of *Drosophila* proteins in Swiss-Prot over time.

**Fig. 3**
GO annotation of mouse proteins in Swiss-Prot over time.

**Fig. 4**
*Function* comment fields for all proteins in Swiss-Prot over time.

**Fig. 5**
GO annotations for all proteins in Swiss-Prot while varying the threshold for the number of GO annotations. Three different threshold values are used (>0, >1 and >9), representing proteins with at least one, at least two, and at least ten GO annotations, respectively.

**Fig. 6**
GeneRIF assignment to human genes in Entrez Gene over time. For simplicity, each Entrez Gene record is counted when first created, and discontinued records are ignored.

**Fig. 7**
GeneRIF assignment to mouse genes in Entrez Gene over time. For simplicity, each Entrez Gene record is counted when first created, and discontinued records are ignored.

**Fig. 8**
GO annotation of *Drosophila* proteins in Swiss-Prot over time with linear, exponential, and logarithmic functions fitted to the gained-annotations line.

**Fig. 9**
GO annotation of mouse proteins in Swiss-Prot over time with functions fitted to the gained-annotations line.

**Fig. 10**
*Function* comments for all proteins in Swiss-Prot over time with functions fitted to the gained-annotations line.

**Fig. 11**
GO annotation of all proteins in Swiss-Prot, with functions fitted to the gained-annotations line.

**Fig. 12**
GeneRIF assignment to human genes in Entrez Gene over time, with functions fitted to the gained-annotations line.

**Fig. 13**
GeneRIF assignment to mouse genes in Entrez Gene over time, with functions fitted to the gained-annotations line.
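
Figures 8 to 13 fit linear, exponential, and logarithmic functions to the gained-annotations line. The captions do not say how the fits were computed, so the sketch below is only one plausible approach, assuming ordinary least squares on transformed variables: the exponential model is fitted on ln y (which minimizes error in log space, not in y) and the logarithmic model on ln t. Function names and the data series are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple


def _least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_models(ts: List[float], ys: List[float]) -> Dict[str, Callable[[float], float]]:
    """Fit three curve families to a cumulative gained-annotations series.

    linear:      y = a + b*t
    exponential: y = a * exp(b*t)  (fitted on ln y; requires y > 0)
    logarithmic: y = a + b*ln(t)   (requires t > 0)
    """
    a_lin, b_lin = _least_squares(ts, ys)
    ln_a, b_exp = _least_squares(ts, [math.log(y) for y in ys])
    a_log, b_log = _least_squares([math.log(t) for t in ts], ys)
    return {
        "linear": lambda t: a_lin + b_lin * t,
        "exponential": lambda t: math.exp(ln_a) * math.exp(b_exp * t),
        "logarithmic": lambda t: a_log + b_log * math.log(t),
    }


def sse(model: Callable[[float], float], ts: List[float], ys: List[float]) -> float:
    """Sum of squared residuals, used here to compare the fitted models."""
    return sum((model(t) - y) ** 2 for t, y in zip(ts, ys))


# Hypothetical series: t = years since first release, y = proteins annotated.
ts = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [100.0, 210.0, 290.0, 405.0, 500.0]
for name, model in fit_models(ts, ys).items():
    print(name, round(sse(model, ts, ys), 1), round(model(10.0)))
```

Comparing the residuals shows which curve family best describes past growth; evaluating each model at a future `t` gives the extrapolated annotation count under that model's assumptions.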
