Modeling the percolation of annotation errors in a database of protein sequences

Walter R Gilks¹, Benjamin Audit, Daniela De Angelis, Sophia Tsoka, Christos A Ouzounis

PMID: 12490449
DOI: 10.1093/bioinformatics/18.12.1641

Walter R Gilks et al. Bioinformatics. 2002 Dec;18(12):1641-9.

Affiliation

¹ Medical Research Council Biostatistics Unit, Cambridge, UK. wally.gilks@mrc-bsu.cam.ac.uk

Abstract

Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified information on protein function lags far behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. The functional annotation of these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein was acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. Exploring the consequences of the model for annotation quality shows that this iterative approach leads to a systematic deterioration of database quality.
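
The chain-of-misannotation idea described in the abstract can be illustrated with a small simulation. The sketch below is a toy model and not the model developed in the paper: it assumes that each new database entry is either annotated experimentally (and is then correct) or copies its annotation from a randomly chosen existing entry, inheriting any error in that entry and possibly adding a new one. The function name and all parameter names and default values (`p_experimental`, `p_transfer_error`, `n_seed`, `n_new`) are hypothetical choices made for illustration only.

```python
import random


def simulate_error_percolation(n_seed=100, n_new=5000, p_experimental=0.05,
                               p_transfer_error=0.1, rng_seed=0):
    """Toy simulation of annotation errors spreading through a growing database.

    Illustrative assumptions (not taken from the paper):
      - the database starts with n_seed correctly annotated entries;
      - each new entry is annotated experimentally with probability
        p_experimental, in which case it is correct;
      - otherwise it copies the annotation of a uniformly chosen existing
        entry, and the copy is correct only if the source is correct and
        no new error (probability p_transfer_error) occurs in the transfer.

    Returns the fraction of correct annotations after each addition.
    """
    rng = random.Random(rng_seed)
    correct = [True] * n_seed          # correctness flag for each entry
    n_correct = n_seed
    history = []
    for _ in range(n_new):
        if rng.random() < p_experimental:
            is_correct = True          # experimentally verified annotation
        else:
            source_ok = correct[rng.randrange(len(correct))]
            no_new_error = rng.random() >= p_transfer_error
            is_correct = source_ok and no_new_error
        correct.append(is_correct)
        n_correct += is_correct
        history.append(n_correct / len(correct))
    return history


if __name__ == "__main__":
    history = simulate_error_percolation()
    for step in (0, 99, 999, 4999):
        print(f"after {step + 1} additions: "
              f"fraction correct = {history[step]:.3f}")
```

Under these assumptions the fraction of correct annotations falls as the database grows, because an entry copied from a misannotated source is itself misannotated. This is only a qualitative illustration of error percolation; the paper's own assumptions and results may differ.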

Grants and funding

MC_U105260556/MRC_/Medical Research Council/United Kingdom
