Stand Genomic Sci. 2015 Nov 19;10:108.
doi: 10.1186/s40793-015-0101-2. eCollection 2015.

Annotation inconsistencies beyond sequence similarity-based function prediction - phylogeny and genome structure

Vasilis J Promponas et al. Stand Genomic Sci. 2015.

Abstract

The function annotation process in computational biology has increasingly shifted from the traditional characterization of individual biochemical roles of protein molecules to the system-wide detection of entire metabolic pathways and genomic structures. The so-called genome-aware methods broaden the scope of annotation inconsistencies in genome sequences beyond protein function assignments, to encompass phylogenetic anomalies and artifactual genomic regions. We outline three categories of error propagation in databases, with striking examples at various levels of appreciation by the community, from traditional to emerging, to raise awareness for future solutions.

Keywords: Error propagation; Genome evolution; Genome structure; Genome-aware methods; Genome-wide annotation; Mis-annotation modeling; Next-generation sequencing; Protein function prediction.

Figures

Fig. 1
Depiction of the relationships across eight families of 62 “putaitve” proteins. Network view of sequence similarities detected by BlastP [21], generated with BioLayout [22]. Six of the eight displayed families originate from a single genome project [23]
Fig. 2
Phylogenetic distribution of nucleoporin Nup160 domains in Pfam. The collapsed eukaryotic tree with the distribution of 336 members is shown, along with the bacterial branch containing two unexpected entries with 3 members (underlined by a purple oval box). These phylogenetic anomalies are present both in Pfam (PF11715) [24] and in the corresponding UniProt entries [14]. The presence of other domains is also shown
Fig. 3
Domain organization for two unique instances of multi-domain architectures for Y-Nups. The arginase-Nup133 (Nucleoporin_C) fusion is accompanied by a Nup170-like domain in the middle (green) [top]. The aconitase-Nup75 (Nup85) fusion also contains a number of other regions of interest [bottom]. For details, refer to the corresponding UniProt/Pfam entries; see the main text for identifiers

References

    1. Iliopoulos I, Tsoka S, Andrade MA, Enright AJ, Carroll M, Poullet P, Promponas V, Liakopoulos T, Palaios G, Pasquier C, et al. Evaluation of annotation strategies using an entire genome sequence. Bioinformatics. 2003;19(6):717–26. doi: 10.1093/bioinformatics/btg077.
    2. Kyrpides NC, Ouzounis CA. Whole-genome sequence annotation: ‘Going wrong with confidence’. Mol Microbiol. 1999;32(4):886–7. doi: 10.1046/j.1365-2958.1999.01380.x.
    3. Ouzounis CA, Karp PD. The past, present and future of genome-wide re-annotation. Genome Biol. 2002;3(2):COMMENT2001. doi: 10.1186/gb-2002-3-2-comment2001.
    4. Green ML, Karp PD. Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers. Nucleic Acids Res. 2005;33(13):4035–9. doi: 10.1093/nar/gki711.
    5. Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics. 2002;18(12):1641–9. doi: 10.1093/bioinformatics/18.12.1641.
