2017 Jul 15;33(14):i234-i242.
doi: 10.1093/bioinformatics/btx247.

TITER: predicting translation initiation sites by deep learning

Sai Zhang et al. Bioinformatics.

Abstract

Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), translation may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and to study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.

Methods: We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.
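The two ingredients named in the Methods paragraph, sequence context around a candidate TIS and a prior preference for the TIS codon type, can be illustrated with a minimal Python sketch. It is not the TITER implementation: the function names, the flank size, the one-hot layout, the multiplicative combination, and the prior values in the usage note are all assumptions made for illustration, and the context score is assumed to come from some trained model that is not shown here.

```python
# Illustrative sketch only. Names, window size, and the way the prior is
# combined with the context score are assumptions, not the TITER source code.

BASES = "ACGT"  # cDNA alphabet (T in place of U)


def one_hot(seq):
    """Encode a cDNA string as rows of 4 floats in A, C, G, T order.

    Bases outside the alphabet (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def candidate_windows(transcript, codons, flank):
    """Yield (position, codon, window) for each candidate codon.

    Only codons with `flank` bases available on both sides are returned,
    so every window has length 2 * flank + 3.
    """
    transcript = transcript.upper()
    for pos in range(flank, len(transcript) - flank - 2):
        codon = transcript[pos:pos + 3]
        if codon in codons:
            yield pos, codon, transcript[pos - flank:pos + 3 + flank]


def combined_score(context_score, codon, codon_prior):
    """Weight a context-based score by a prior preference for the codon type.

    Multiplication is one plausible way to combine the two terms; it is an
    assumption of this sketch.
    """
    return context_score * codon_prior.get(codon, 0.0)
```

For example, `list(candidate_windows("GCCGCCACCATGGCGCTGAA", {"ATG", "CTG"}, 3))` returns the single ATG at position 9 with the window `ACCATGGCG`; the CTG near the 3' end is skipped because it lacks a full downstream flank. A placeholder prior such as `{"ATG": 0.5, "CTG": 0.2}` (made-up values, not the Gao15 statistics) can then be passed to `combined_score` together with a model output.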

Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.

Availability and implementation: TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer.

Contact: lzhang20@mail.tsinghua.edu.cn or zengjy321@tsinghua.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1. Schematic overview of the TITER pipeline. See the main text for more details.
Fig. 2. Statistics of the translation initiation sites in the Gao15 dataset. (a) Codon composition of TISs, in which only those codons with a fraction >1% are shown. (b) Fractions of different types of TISs.
Fig. 3. Schematic illustration of (a) the hybrid deep neural network architecture and (b) the bootstrapping-based technique used in TITER. See the main text for more details.
Fig. 4. Prediction performance on different test datasets. (a, b) Comparison of prediction performance between different methods on the Gao15 dataset evaluated by (a) ROC and (b) PR curves, respectively. (c, d) Comparison of prediction performance between different methods on the Calviello16 dataset evaluated by (c) ROC and (d) PR curves, respectively. ‘preTITER’ denotes a preliminary version of our deep learning framework that only considered the context features of TISs.
Fig. 5. The sequence motifs generated by TITER for (a) AUG (ATG), (b) CUG (CTG), (c) GUG (GTG) and (d) UUG (TTG) TIS codons, respectively. The final position weight matrix (PWM) for each TIS codon was calculated by averaging the optimal input sequences computed by the ensemble of 32 deep neural networks in TITER. The base sequence motifs were visualized using Seq2Logo v2.0 (Thomsen and Nielsen, 2012). All the sequence motifs were visualized in the cDNA setting.
Fig. 6. The prediction scores of TITER correlate with the experimentally measured mutational effects. (a, c) Illustrations of different mutations in the tests derived from the studies of Noderer et al. (2014) and Calvo et al. (2009), respectively, in which the base sequences are shown in the cDNA setting. The single nucleotide variants are underlined, and the wild-type ATGs and the emerging ATGs are colored in red and blue, respectively. (b, d) The correlations between the prediction scores of TITER and the experimentally measured mutational effects in the previous studies of Noderer et al. (2014) and Calvo et al. (2009), respectively.
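
The Fig. 5 caption describes each final PWM as an average of the optimal input sequences computed by the ensemble of networks. A minimal sketch of that averaging step follows. The matrix layout (one row per position, four weights in A, C, G, T order), the per-position renormalization, and the function name are assumptions of this sketch; how each optimal input is obtained from a network is not covered here.

```python
# Illustrative sketch only: layout and normalization are assumptions,
# not the TITER source code.

def average_pwm(optimal_inputs):
    """Average per-network position-by-base matrices into one PWM.

    `optimal_inputs` is a non-empty list of matrices of equal length. Each
    matrix is a list of rows, and each row holds four non-negative weights
    in A, C, G, T order. Every position of the result is rescaled to sum
    to 1; an all-zero position falls back to a uniform distribution.
    """
    n_models = len(optimal_inputs)
    length = len(optimal_inputs[0])
    pwm = []
    for i in range(length):
        # Mean weight of each base at position i across the ensemble.
        column = [sum(m[i][j] for m in optimal_inputs) / n_models for j in range(4)]
        total = sum(column)
        pwm.append([v / total for v in column] if total > 0 else [0.25] * 4)
    return pwm
```

With two toy matrices that agree on A at the first position and disagree between C and T at the second, the second position of the averaged PWM splits evenly between C and T. In the setting described by the caption, the list would hold 32 matrices, one per network.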

