Nucleic Acids Res. 2013 Jan;41(Database issue):D152-6.
doi: 10.1093/nar/gks1062. Epub 2012 Nov 17.

GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences

Ivan Antonov et al. Nucleic Acids Res. 2013 Jan.

Abstract

Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) can be caused by sequencing errors or by indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events, which evolved to control expression of some genes. We previously developed GeneTack, an algorithm and software program for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).
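To make the notion of an fs-protein (a conceptually translated fs-gene) concrete, the following is a minimal sketch of translating a coding sequence across a single frameshift. It is an illustration only and not the GeneTack algorithm. The function names, the use of the standard genetic code, and the direction convention (+1 skips one nucleotide, -1 re-reads one nucleotide) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons of seq, starting at its first nucleotide."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def translate_with_frameshift(seq, pos, direction):
    """Hypothetical helper: conceptual translation across one frameshift.

    pos       -- 0-based codon boundary (multiple of 3) where the frame changes
    direction -- +1 (skip one nucleotide) or -1 (re-read one nucleotide);
                 this convention is an assumption of the sketch
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if pos % 3 != 0:
        raise ValueError("pos must fall on a codon boundary")
    upstream = translate(seq[:pos])
    downstream = translate(seq[pos + direction:])
    return upstream + downstream


# An extra C after the second codon disrupts the frame of ATG AAA GGG TTT TAA.
disrupted = "ATGAAACGGGTTTTAA"
print(translate(disrupted))                        # MKRVL
print(translate_with_frameshift(disrupted, 6, 1))  # MKGF*
```

Reading straight through the disrupted sequence gives a different downstream peptide with no stop codon, while switching frames at the frameshift position restores the original protein.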

Figures

Figure 1.
The GeneTack database entries: fs-genes predicted in the genome of Escherichia coli str. K-12 substr. DH10B. FS_ID: unique fs-gene identifier; Coord: frameshift coordinate in the input sequence; D: frameshift direction (+1 or −1); GeneL: coordinate of the left border of the fs-gene (gene start for the ‘+’ strand, gene end for the ‘−’ strand); GeneR: coordinate of the right border of the fs-gene (gene end for the ‘+’ strand, gene start for the ‘−’ strand); S: the fs-gene strand; F: frameshift coordinate in the fragment (the sequence used as input to GeneTack); G: frameshift coordinate in the fs-gene; P: frameshift coordinate in the fs-protein; BLASTp: information on the BLASTp hit covering the frameshift position in the fs-protein; Pfam: information on the Pfam domain covering the frameshift position in the fs-protein; COF: cluster ID (if available); RBS: RBS score of the downstream gene defined by GeneMarkS.
Figure 2.
Logo of the conserved motif (upper panel) and distribution of coordinates of frameshifts (lower panel) in 428 fs-genes of Release Factor 2 collected in a cluster (ID 474411093) (13). Red bars in the lower panel correspond to frameshift positions and green bars show the total length of fs-proteins. The small green bars indicate the existence of subgroups of longer fs-proteins.
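The entry fields listed in the Figure 1 caption could be held in a small record type such as the sketch below. The field meanings follow the caption; the Python names, the types, the optional defaults, and the sample values are assumptions for illustration and do not reflect the database's actual schema or export format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FsGeneEntry:
    """Hypothetical record mirroring the columns named in the Figure 1 caption."""
    fs_id: int                    # FS_ID: unique fs-gene identifier
    coord: int                    # Coord: frameshift coordinate in the input sequence
    direction: int                # D: frameshift direction, +1 or -1
    gene_l: int                   # GeneL: left border of the fs-gene
    gene_r: int                   # GeneR: right border of the fs-gene
    strand: str                   # S: '+' or '-'
    fragment_pos: int             # F: frameshift coordinate in the fragment
    gene_pos: int                 # G: frameshift coordinate in the fs-gene
    protein_pos: int              # P: frameshift coordinate in the fs-protein
    blastp: Optional[str] = None  # BLASTp hit covering the frameshift position
    pfam: Optional[str] = None    # Pfam domain covering the frameshift position
    cof: Optional[int] = None     # COF: cluster ID, if available
    rbs: Optional[float] = None   # RBS score of the downstream gene

    def __post_init__(self):
        if self.direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        if self.strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")

    @property
    def gene_start(self) -> int:
        # Per the caption: GeneL is the start on '+', GeneR is the start on '-'.
        return self.gene_l if self.strand == "+" else self.gene_r

    @property
    def gene_end(self) -> int:
        return self.gene_r if self.strand == "+" else self.gene_l


# Sample values are invented for the example, not taken from the database.
entry = FsGeneEntry(fs_id=1, coord=1500, direction=-1, gene_l=1000, gene_r=2200,
                    strand="-", fragment_pos=700, gene_pos=700, protein_pos=234)
print(entry.gene_start, entry.gene_end)  # 2200 1000
```

Deriving the gene start and end from the strand keeps the stored borders in genome order, which matches how the caption defines GeneL and GeneR.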

References

    1. Antonov I, Borodovsky M. GeneTack: frameshift identification in protein-coding sequences by the Viterbi algorithm. J. Bioinformatics Comput. Biol. 2010;8:535.
    2. Medigue C, Rose M, Viari A, Danchin A. Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence. Genome Res. 1999;9:1116–1127.
    3. Deshayes C, Perrodou E, Gallien S, Euphrasie D, Schaeffer C, Van-Dorsselaer A, Poch O, Lecompte O, Reyrat JM. Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors? Genome Biol. 2007;8:R20.
    4. Baranov PV, Gesteland RF, Atkins JF. Recoding: translational bifurcations in gene expression. Gene. 2002;286:187–201.
    5. Namy O, Rousset JP, Napthine S, Brierley I. Reprogrammed genetic decoding in cellular gene expression. Mol. Cell. 2004;13:157–168.
