BMC Bioinformatics. 2008 Sep 18;9:381.
doi: 10.1186/1471-2105-9-381.

MetWAMer: eukaryotic translation initiation site prediction

Michael E Sparks et al. BMC Bioinformatics.

Abstract

Background: Translation initiation site (TIS) identification is an important aspect of the gene annotation process, requisite for the accurate delineation of protein sequences from transcript data. We have developed the MetWAMer package for TIS prediction in eukaryotic open reading frames of non-viral origin. MetWAMer can be used as a stand-alone, third-party tool for post-processing gene structure annotations generated by external computational programs and/or pipelines, or directly integrated into gene structure prediction software implementations.

Results: MetWAMer currently implements five distinct methods for TIS prediction, the most accurate of which is a routine that combines weighted, signal-based translation initiation site scores and the contrast in coding potential of sequences flanking TISs using a perceptron. Also, our program implements clustering capabilities through use of the k-medoids algorithm, thereby enabling cluster-specific TIS parameter utilization. In practice, our static weight array matrix-based indexing method for parameter set lookup can be used with good results in data sets exhibiting moderate levels of 5'-complete coverage.
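To make the perceptron idea concrete, the following is a minimal sketch of a perceptron that combines two per-ATG features into a true/false TIS decision. It is not MetWAMer's implementation: the function names, the learning rate, the epoch count, and the toy feature values are all assumptions made for illustration. The two features stand in for a weighted signal score and a coding-potential contrast between the flanks.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Fit weights and a bias with the classic perceptron update rule.

    samples: list of feature tuples; labels: +1 (true TIS) or -1 (false TIS).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            if predict(weights, bias, features) != label:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
    return weights, bias


def predict(weights, bias, features):
    """Return +1 if the weighted feature sum exceeds zero, otherwise -1."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else -1


# Made-up toy values, NOT data from the paper:
# each tuple is (signal_score, coding_potential_contrast).
toy_features = [(2.0, 1.5), (1.5, 2.0), (1.8, 0.9),
                (-1.0, -0.5), (-2.0, -1.5), (-0.5, -1.2)]
toy_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(toy_features, toy_labels)
```

On linearly separable input such as this toy set, the update rule stops changing the weights once every sample is classified correctly.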

Conclusion: We demonstrate that statistically based models for TIS prediction can be improved, under certain testing conditions, by taking the class of each potential start-methionine into account, and that our perceptron-based model is suitable for the TIS identification task. MetWAMer is a well-documented, extensible, and freely available software system that can be readily re-trained for different target applications and/or extended with existing and novel TIS prediction methods to support further research in this area.

Figures

Figure 1
Extraction of training data. A genomic protein coding sequence is conceptually spliced into an open reading frame, which is extended at its 5'- and 3'-termini to render a maximal (non-stop) reading frame. For LLKR, WLLKR, and BAYES, only sequences comprising the immediate context of true and false TISs (defined as five bases upstream through three bases downstream of the ATG codon's adenine residue) are extracted for modeling the TIS signal. For flank-contrasting methods, both TIS contexts and flanking sequences (96 nt in length per flank) are extracted for training signal and content sensors, respectively. A minimal distance between true and false TISs of 105 nt is used.
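The extraction step described in this caption can be sketched roughly as follows. This is an illustrative reading, not the MetWAMer code: the function names and the toy sequence are hypothetical, and the window boundaries are an assumption. The sketch takes five bases upstream of the ATG and three bases downstream of the codon, which yields an 11-nt window; the caption's wording could be read differently. Only the 105-nt minimal distance is taken directly from the caption.

```python
CONTEXT_UP = 5      # bases taken upstream of the ATG (assumed window start)
CONTEXT_DOWN = 3    # bases taken downstream of the ATG codon (assumed window end)
MIN_DISTANCE = 105  # minimal true/false TIS separation stated in the caption


def in_frame_atgs(frame):
    """Return 0-based offsets of ATG codons in frame 0 of a reading frame."""
    return [i for i in range(0, len(frame) - 2, 3) if frame[i:i + 3] == "ATG"]


def tis_context(frame, atg_pos):
    """Return the context window around an ATG, or None if it runs off an end."""
    start = atg_pos - CONTEXT_UP
    end = atg_pos + 3 + CONTEXT_DOWN
    if start < 0 or end > len(frame):
        return None
    return frame[start:end]


def false_tis_candidates(frame, true_pos):
    """Return in-frame ATGs at least MIN_DISTANCE nt away from the true TIS."""
    return [p for p in in_frame_atgs(frame)
            if p != true_pos and abs(p - true_pos) >= MIN_DISTANCE]


# Hypothetical toy reading frame with in-frame ATGs at offsets 6, 39, and 162.
toy_frame = "CCCAAA" + "ATG" + "GCT" * 10 + "ATG" + "GCT" * 40 + "ATG" + "GCAGAT"

true_context = tis_context(toy_frame, 6)
decoys = false_tis_candidates(toy_frame, 6)
```

In this toy frame the ATG at offset 39 is excluded as a false-TIS training example because it lies only 33 nt from the designated true TIS.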
Figure 2
TIS detection competency tests. Shown are two distinct testing scenarios for TIS identification competency in maximal, TIS-containing reading frames and in reading frames lacking a true TIS. In TIS-containing tests, three outcomes are possible: the system predicts the true TIS as the TIS for the gene (TP), it predicts a false TIS as the gene's TIS (FP), or it fails to predict any TIS for the gene (FN). In the non-TIS-containing scenario, the system either (correctly) refuses to predict a TIS for the gene (TN) or mislabels some in-frame ATG as a TIS (FP).
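The outcome rules in this caption translate directly into a small decision function. The sketch below is only an illustration of that bookkeeping; the function name and the convention of using `None` for "no TIS" are assumptions, and the example positions are invented.

```python
from collections import Counter


def classify_outcome(true_tis, predicted_tis):
    """Label one reading frame as TP, FP, FN, or TN.

    true_tis: position of the true TIS, or None if the frame lacks one.
    predicted_tis: position the predictor reported, or None if it abstained.
    """
    if true_tis is None:
        # Non-TIS-containing scenario: abstaining is correct.
        return "TN" if predicted_tis is None else "FP"
    if predicted_tis is None:
        return "FN"
    return "TP" if predicted_tis == true_tis else "FP"


# Invented (true, predicted) pairs covering every branch.
toy_cases = [(12, 12), (12, 45), (12, None), (None, None), (None, 30)]
tally = Counter(classify_outcome(t, p) for t, p in toy_cases)
```

Note that FP arises in both scenarios, so a tally over a mixed test set pools the two kinds of false positive unless they are counted separately.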
Figure 3
Receiver operating characteristic curves for the perceptron element of PFCWLLKR. The classifier was assessed on the task of distinguishing ATG codons as true or false TISs, under distinct parameter deployment strategies: the dotted curve denotes perceptron performance obtained under a priori-known cluster-specific parameter usage, the solid curve that from homogeneous parameter deployment, and the dashed curve from WAM-based parameter set indexing. A true positive is defined as a true TIS labeled as such, whereas a false positive denotes a false TIS labeled by the classifier as true. These plots were generated using the ROCR package [62].
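The published curves were produced with ROCR in R; purely as an illustration of what such a curve contains, the sketch below computes ROC points and the area under the curve in plain Python by sweeping a threshold over classifier scores. The function names and the toy scores are assumptions, not values from the paper.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    labels: True for a true TIS, False for a false TIS.
    Points run from the strictest threshold to the most lenient one.
    """
    positives = sum(1 for is_true in labels if is_true)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false TIS")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so each threshold yields one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented scores for six candidate ATGs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_truth = [True, True, False, True, False, False]
curve = roc_points(toy_scores, toy_truth)
toy_auc = area_under_curve(curve)
```

Comparing parameter deployment strategies, as the figure does, would amount to calling `roc_points` once per strategy on the same set of candidate ATGs.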
Figure 4
Cluster-specific TIS mononucleotide distributions. Sequence logo plots [63], depicting site-specific nucleotide abundances, were generated for TIS sequences obtained from clusters 1 through 3 using the WebLogo utility [64]. The medoids computed by the k-medoids algorithm for clusters 1 through 3 are TAAAAATGGAT, AAAAAATGGCG, and CAACAATGGCT, respectively.
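As a rough illustration of how medoids like these can be used, the sketch below assigns a TIS context to the nearest of the three reported medoids and recomputes a medoid from a set of members. The distance measure is an assumption: Hamming distance is used here because the sequences have equal length, but the text above does not state which dissimilarity MetWAMer uses. The function names are likewise hypothetical.

```python
# Medoids for clusters 1 through 3, as listed in the caption.
MEDOIDS = ["TAAAAATGGAT", "AAAAAATGGCG", "CAACAATGGCT"]


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(site, medoids=MEDOIDS):
    """Return the 1-based index of the medoid closest to the site.

    Ties resolve to the lowest-numbered cluster.
    """
    distances = [hamming(site, m) for m in medoids]
    return distances.index(min(distances)) + 1


def medoid(members):
    """Return the member with the smallest total distance to all members."""
    return min(members, key=lambda c: sum(hamming(c, other) for other in members))


# An invented query context that differs from the cluster 2 medoid at one position.
query_cluster = nearest_cluster("AAAAAATGGCC")
```

Alternating the assignment step (`nearest_cluster`) with the update step (`medoid`) until the medoids stop changing is the basic k-medoids loop.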

References

    1. Kozak M. How do eucaryotic ribosomes select initiation regions in messenger RNA? Cell. 1978;15:1109–1123.
    2. Preiss T, Hentze M. Starting the protein synthesis machine: eukaryotic translation initiation. BioEssays. 2003;25:1201–1211.
    3. Kozak M. An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Research. 1987;15:8125–8148.
    4. Sachs A, Sarnow P, Hentze M. Starting at the beginning, middle, and end: translation initiation in eukaryotes. Cell. 1997;89:831–838.
    5. Rakotondrafara A, Polacek C, Harris E, Miller W. Oscillating kissing stem-loop interactions mediate 5' scanning-dependent translation by a viral 3'-cap-independent translation element. RNA. 2006;12:1893–1906.