BMC Genomics. 2014 May 29;15(1):411.
doi: 10.1186/1471-2164-15-411.

Predicting the fungal CUG codon translation with Bagheera

Stefanie Mühlhausen et al. BMC Genomics. 2014.

Abstract

Background: Many eukaryotes have been shown to use genetic codes that deviate from the universal genetic code. While most Saccharomycetes, including Saccharomyces cerevisiae, use the standard genetic code and translate the CUG codon as leucine, some yeasts, including many but not all "Candida" species, translate the same codon as serine. It has been proposed that the change in codon identity was accomplished through an almost complete loss of the original CUG codons, which makes CUG positions in extant species highly discriminative for one or the other translation scheme.
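The difference between the two decoding schemes can be made concrete with a small translation sketch. It assumes NCBI translation table 1 (standard code) and table 12 (alternative yeast nuclear code, in which CTG encodes serine) and ignores alternative start codons; the names `STANDARD_CODE`, `ALT_YEAST_CODE` and `translate` are illustrative and not part of Bagheera.

```python
from itertools import product

# Compact form of NCBI translation table 1 (standard code): codons are
# enumerated with T, C, A, G order at the first, second and third positions.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Alternative yeast nuclear code (NCBI table 12): same amino acid
# assignments except that CTG (CUG in mRNA) encodes serine.
ALT_YEAST_CODE = dict(STANDARD_CODE, CTG="S")


def translate(cds: str, code: dict[str, str]) -> str:
    """Translate an in-frame DNA or RNA coding sequence codon by codon.

    Trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    return "".join(code[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


print(translate("ATGCTGAAA", STANDARD_CODE))   # MLK
print(translate("ATGCTGAAA", ALT_YEAST_CODE))  # MSK
```

Only the CUG codon changes its meaning, so a gene translated with the wrong scheme differs from the correct protein exactly at its CUG positions.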

Results: To improve gene prediction in yeast species by providing the correct CUG decoding scheme, we implemented a web server, called Bagheera, that determines the most probable CUG codon translation for a given transcriptome or genome assembly based on extensive reference data. As reference data we use 2,071 manually assembled and annotated sequences from 38 cytoskeletal and motor proteins belonging to 79 yeast species. The web service includes a pipeline that starts by predicting homologous genes and aligning them to the reference data. CUG codon positions within the predicted genes are analysed with respect to amino acid similarity and CUG codon conservation in related species. In addition, the tRNA(CAG) gene is predicted in genomic data and compared to known Leu-tRNA(CAG) and Ser-tRNA(CAG) genes. Bagheera can also be used to evaluate any mRNA or protein sequence with the codon usage of the respective species. Use of the system is demonstrated by analysing six genomes not included in the reference data.
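To illustrate how mapped CUG positions can discriminate between the two schemes, the following hypothetical sketch takes a simple majority vote over the reference residues observed at each CUG position. It is not Bagheera's scoring method: it uses amino acid identity only and ignores CUG codon conservation. The `CugSite` structure, the `vote_cug_scheme` function and the `min_fraction` threshold are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CugSite:
    """Hypothetical record: one CUG position in a predicted gene, mapped onto
    a reference alignment column (not Bagheera's internal data format)."""
    ref_residues: list[str]  # amino acids in that column of the reference proteins


def vote_cug_scheme(sites: list[CugSite], min_fraction: float = 0.5) -> str:
    """Hypothetical majority vote over CUG sites.

    A site votes 'Leu' or 'Ser' when that residue makes up more than
    `min_fraction` of the non-gap reference residues in its column; other
    sites abstain. Returns 'Leu', 'Ser', or 'undecided' on a tie.
    """
    votes = Counter()
    for site in sites:
        residues = [r for r in site.ref_residues if r != "-"]
        if not residues:
            continue  # column is all gaps in the reference: no evidence
        counts = Counter(residues)
        for aa, label in (("L", "Leu"), ("S", "Ser")):
            if counts[aa] / len(residues) > min_fraction:
                votes[label] += 1
    if votes["Leu"] == votes["Ser"]:
        return "undecided"
    return "Leu" if votes["Leu"] > votes["Ser"] else "Ser"
```

A real predictor would likely weight sites by how strongly the reference column is conserved; a plain vote is used here only to keep the idea visible.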

Conclusions: Gene prediction followed by comparison with reference data from other Saccharomycetes is sufficient to predict the most probable decoding scheme for CUG codons. This approach has been implemented in Bagheera (http://www.motorprotein.de/bagheera).

Figures

Figure 1
Workflow of the Bagheera web application. A) After the yeast genome or transcriptome assembly has been uploaded, proteins homologous to the reference sequences are identified using TBLASTN and subsequently predicted by AUGUSTUS-PPX. The reference sequences used for gene prediction are chosen according to the species selected as the model organism for AUGUSTUS. The predicted proteins are aligned to the reference alignments (NW = Needleman-Wunsch, SW = Smith-Waterman, LCS = Longest Common Subsequence), and the codon usage is predicted from the sequence similarity and CUG codon conservation at the CUG codon positions. Optionally, a phylogenetic tree can be calculated from a randomly selected and concatenated subset of the predicted proteins. B) The gene encoding the uploaded protein sequence is reconstructed to obtain its cDNA sequence. The species encoding the uploaded protein has to be specified. The cDNA sequence is then translated according to the translation scheme of that species.
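Needleman-Wunsch, one of the alignment methods named in the workflow, is a dynamic program for global alignment. This minimal textbook sketch assumes a simple match/mismatch score and a linear gap penalty rather than a protein substitution matrix; the function name and default scores are illustrative and do not describe Bagheera's implementation.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Smith-Waterman differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell, which yields a local rather than a global alignment.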
Figure 2
Screenshot of the web interface. The web interface is divided into three main parts: the data upload and options section, the results section, and the phylogenetic tree section (not shown). A) Example data were uploaded and processed with default parameters. B) The results section is split into a summary and a section that lists each reference protein and gives a detailed analysis of each predicted protein down to single CUG codons. For every reference protein, the predicted gene and, if applicable, the respective CUG positions are shown. For every predicted CUG position that could be mapped onto the reference data, the amino acid composition and CUG codon usage at the respective positions in the reference data are listed. The predicted actin-related protein class 4 (Arp4) contains one CUG at position 163. This position corresponds to alignment position 291 in the reference alignment and is indicated here by a black box. All CUG codons are shown as leucine in the predicted sequence, regardless of the suggested codon usage.
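Mapping a CUG position in a predicted protein (such as position 163 of Arp4) onto a reference alignment column amounts to counting non-gap characters in the protein's aligned row; the reference residues at that column can then be read off. The helper names below are hypothetical, and gaps are assumed to be written as '-' or '.'.

```python
def residue_to_column(aligned_seq: str, residue_pos: int, gap_chars: str = "-.") -> int:
    """Map a 1-based residue position of the ungapped sequence to its
    1-based column in the gapped alignment row `aligned_seq`."""
    if residue_pos < 1:
        raise ValueError("residue_pos is 1-based")
    seen = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == residue_pos:
                return column
    raise ValueError(f"sequence has fewer than {residue_pos} residues")


def column_residues(alignment_rows: list[str], column: int) -> list[str]:
    """Return the characters found in a 1-based alignment column."""
    return [row[column - 1] for row in alignment_rows]


rows = ["--MA-LK", "-MMSLLK"]
col = residue_to_column(rows[0], 3)   # third residue of row 0 (L) sits in column 6
print(col, column_residues(rows, col))
```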
Figure 3
Number of CUG codons in the reference data. The total number of CUG positions for every set of reference proteins is shown together with the numbers of CUG positions conserved in at least two and in at least five genes. To account for different protein lengths (e.g. 200 amino acids in dynactin3 p24 proteins compared to up to 4,000 amino acids in dynein heavy chain proteins), the total number of CUG positions per 1,000 amino acids is also plotted, showing that CUG codons are not particularly enriched in certain protein families. Values for all species using the standard codon usage (left side) are contrasted with those for all species using the alternative yeast codon usage (right side). Detailed numbers are available in Additional file 3.
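The length normalisation used in this figure is a simple rate: the CUG count divided by protein length, scaled to 1,000. The sketch below assumes an in-frame coding sequence and counts codons as a proxy for amino acids (a trailing stop codon, if present, is counted too); `cug_per_thousand` is an illustrative name.

```python
def cug_per_thousand(cds: str) -> float:
    """Count in-frame CUG (CTG) codons and normalise per 1,000 codons."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return 1000 * codons.count("CTG") / len(codons)


# A 200-codon gene with 2 CUGs and a 4,000-codon gene with 40 CUGs
# have the same CUG density.
short_gene = "CTG" * 2 + "AAA" * 198
long_gene = "CTG" * 40 + "AAA" * 3960
print(cug_per_thousand(short_gene), cug_per_thousand(long_gene))  # 10.0 10.0
```

Only in-frame codons are counted, so a CTG that straddles two codons does not contribute.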
