BMC Bioinformatics. 2007 May 22;8:170.
doi: 10.1186/1471-2105-8-170.

Estimating the annotation error rate of curated GO database sequence annotations

Craig E Jones et al. BMC Bioinformatics. 2007.

Abstract

Background: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and used regression to model the impact on the precision of annotations based on BLAST matched sequences.

Results: We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%.

Conclusion: While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high quality source of information.
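The regression step described in the Background can be illustrated with a small sketch. It is not the authors' model: it assumes a deliberately simple toy relationship, precision(x) = k * (1 - e0) * (1 - x), where x is the artificially added error rate, e0 is the unknown baseline error rate, and k is a hypothetical precision that an error-free reference set would achieve. Under that assumption the intercept of a fitted line equals k * (1 - e0), so e0 can be backed out once k is fixed. The names `fit_line`, `estimate_baseline_error`, `error_free_precision`, and `TRUE_E0`, and all of the synthetic data, are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_baseline_error(added_rates, mean_precisions, error_free_precision=1.0):
    """Back out the baseline error rate e0 from the fitted intercept.

    Toy assumption (not taken from the paper):
        precision(x) = k * (1 - e0) * (1 - x)
    so intercept = k * (1 - e0), with k = error_free_precision.
    """
    _slope, intercept = fit_line(added_rates, mean_precisions)
    return 1.0 - intercept / error_free_precision


# Synthetic demonstration: the baseline error rate is an arbitrary value
# chosen for the demo, not a result from the study.
rng = random.Random(0)
TRUE_E0 = 0.25
added_rates = [i / 100 for i in range(2, 41, 2)]  # 2% to 40%, assumed 2-point steps
mean_precisions = [
    (1 - TRUE_E0) * (1 - x) + rng.gauss(0, 0.005)  # small measurement noise
    for x in added_rates
]

e0_hat = estimate_baseline_error(added_rates, mean_precisions)
print(f"estimated baseline error rate: {e0_hat:.3f}")
```

The estimate depends entirely on the assumed value of `error_free_precision`; with real data that quantity would have to be justified separately.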

Figures

Figure 1
Insertion of annotation errors: Errors were randomly inserted into reference set annotations at a fixed error rate. The precision of reference set annotations for predicting the annotations of query sequences was determined, and the average precision at that error rate was recorded. This process was repeated 100 times for a given error rate value, after which the error rate was incremented. This process continued until data was obtained for artificially increased error rates of between 2% and 40%.
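The procedure in the Figure 1 caption can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: annotations are reduced to a single label per sequence, BLAST matches are supplied as a precomputed list, precision is the fraction of queries whose matched reference label equals the query label, and the error rate is stepped in 2-point increments (the caption gives the 2% to 40% range and the 100 repeats, but not the step size). The functions `inject_errors`, `precision`, and `precision_curve` and the toy data are hypothetical.

```python
import random


def inject_errors(reference, rate, vocabulary, rng):
    """Return a copy of the reference annotations with labels replaced by a
    different, randomly chosen label with probability `rate`."""
    corrupted = {}
    for seq_id, term in reference.items():
        if rng.random() < rate:
            wrong_terms = [t for t in vocabulary if t != term]
            corrupted[seq_id] = rng.choice(wrong_terms)
        else:
            corrupted[seq_id] = term
    return corrupted


def precision(reference, matches):
    """Fraction of (query_term, matched_reference_id) pairs for which the
    matched reference annotation equals the query annotation."""
    hits = sum(1 for query_term, ref_id in matches if reference[ref_id] == query_term)
    return hits / len(matches)


def precision_curve(reference, matches, vocabulary, rates, repeats=100, seed=0):
    """For each error rate, average the precision over `repeats` random
    corruptions of the reference set."""
    rng = random.Random(seed)
    curve = []
    for rate in rates:
        total = 0.0
        for _ in range(repeats):
            corrupted = inject_errors(reference, rate, vocabulary, rng)
            total += precision(corrupted, matches)
        curve.append((rate, total / repeats))
    return curve


# Toy data (hypothetical): 200 reference sequences, 10 placeholder labels,
# and one query per reference sequence that shares its original label.
vocabulary = [f"T{i}" for i in range(10)]
data_rng = random.Random(1)
reference = {f"ref{i}": data_rng.choice(vocabulary) for i in range(200)}
matches = [(reference[f"ref{i}"], f"ref{i}") for i in range(200)]

rates = [i / 100 for i in range(2, 41, 2)]  # 2% to 40%, assumed 2-point steps
curve = precision_curve(reference, matches, vocabulary, rates)
for rate, avg_precision in curve:
    print(f"{rate:.2f}\t{avg_precision:.3f}")
```

Because the toy reference set starts out error-free, the average precision here falls roughly linearly as errors are added; a real reference set would start from a lower precision that reflects its existing errors.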
