Nucleic Acids Res. 2019 Apr 8;47(6):e36.

doi: 10.1093/nar/gkz061.

DeepRibo: a neural network for precise gene annotation of prokaryotes by combining ribosome profiling signal and binding site patterns

Jim Clauwaert¹, Gerben Menschaert², Willem Waegeman¹

Affiliations

¹ KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium.
² Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium.

PMID: 30753697
PMCID: PMC6451124
DOI: 10.1093/nar/gkz061

Jim Clauwaert et al. Nucleic Acids Res. 2019.

Abstract

Annotation of gene expression in prokaryotes is often corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network that uses features extracted from ribosome profiling information and binding site sequence patterns and proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, integrating the information gained from both the high-throughput ribosome profiling data and the sequence of the ribosome binding and translation initiation region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, used for the identification of open reading frames in prokaryotes without a priori knowledge of the translational landscape. The effectiveness of DeepRibo is highlighted through extensive validation of the model trained on various sets of data, using multiple-species sequence similarity and proteins verified by mass spectrometry and Edman degradation.

Figures

**Figure 1.**
The architecture of the neural network DeepRibo. For each candidate ORF, two types of data are processed and fed into their respective parts of the neural network. The convolutional layers train on a 30-nucleotide DNA sequence ranging from 20 nucleotides upstream to 10 nucleotides downstream of the TIS. The recurrent neural network covers the complete ORF, starting 50 nucleotides upstream of the start codon (including the SD region) and extending 20 nucleotides downstream of the stop codon. The DNA sequence is first translated into a binary image before being processed by four 1 × 1 and 32 1 × 12 convolutional kernels, respectively. The ribosome profiling data is processed by a double-layered bidirectional GRU of 128 hidden nodes. The outputs of both neural networks are flattened, concatenated and fed into three consecutive fully-connected layers of length 1024, 512 and 2.
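
As a rough illustration of the two inputs described in this caption, the sketch below cuts the two windows around a candidate ORF and one-hot encodes the sequence window into a binary matrix. It is not the DeepRibo code: the function names, the 0-based forward-strand coordinates, the A/C/G/T row order, the choice to count the 10 downstream nucleotides from the TIS itself, and the padding at genome edges are all assumptions made for this example.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGT"  # assumed row order of the binary image


def tis_window(genome: str, tis: int, upstream: int = 20, downstream: int = 10) -> str:
    """Return the sequence window around a TIS (forward strand, 0-based).

    With the defaults this is a 30 nt window: 20 nt upstream of the TIS
    plus 10 nt starting at the TIS itself (an assumption about the bounds).
    Positions outside the genome are padded with 'N'.
    """
    start, end = tis - upstream, tis + downstream
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(genome))
    return left_pad + genome[max(0, start):min(len(genome), end)] + right_pad


def one_hot(seq: str) -> List[List[int]]:
    """Encode a sequence as a 4 x len(seq) binary matrix (rows: A, C, G, T).

    Any other symbol, such as the padding 'N', becomes an all-zero column.
    """
    return [[1 if base == nt else 0 for base in seq.upper()] for nt in NUCLEOTIDES]


def riboseq_window(signal: Sequence[float], orf_start: int, orf_end: int,
                   upstream: int = 50, downstream: int = 20) -> List[float]:
    """Return the per-nucleotide ribo-seq signal for the recurrent branch.

    orf_start is the first base of the start codon; orf_end is one past the
    last base of the stop codon (half-open interval). Positions outside the
    genome are padded with 0.0.
    """
    start, end = orf_start - upstream, orf_end + downstream
    return [signal[i] if 0 <= i < len(signal) else 0.0 for i in range(start, end)]


# Toy usage on a made-up 120 nt genome with a made-up signal.
genome = "ACGT" * 30
signal = [float(i) for i in range(len(genome))]
image = one_hot(tis_window(genome, tis=40))             # 4 rows x 30 columns
profile = riboseq_window(signal, orf_start=40, orf_end=70)  # 50 + 30 + 20 values
```

The sequence window has a fixed width, which suits convolutional kernels, while the ribo-seq window grows with the ORF length, which is one reason a recurrent layer is a natural fit for that branch.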

**Figure 2.**
Bend point estimation on the fitted S-curves of the coverage as a function of the log RPKM for both the *E. coli* (left) and *S. aureus* (right) datasets. The positive samples for each dataset (red) are plotted together with the values predicted by the fitted S-curve (blue). For each dataset, the lower bend point of the fitted curve is estimated using the bent-cable function to obtain the minimum cut-off values.
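
The two quantities on the axes of this figure can be computed per ORF as sketched below. This is a minimal sketch and not the authors' procedure: RPKM follows its standard definition, whereas defining coverage as the fraction of positions with at least one read, summing per-base values to get a read count, using a base-10 logarithm, and the `passes_cutoffs` helper with its threshold parameters are assumptions for illustration. The S-curve fit and bend point estimation are not reproduced here.

```python
import math
from typing import Sequence


def rpkm(read_count: float, orf_length_nt: int, total_mapped_reads: float) -> float:
    """Reads per kilobase of ORF per million mapped reads."""
    return read_count * 1e9 / (orf_length_nt * total_mapped_reads)


def coverage(per_base_reads: Sequence[float]) -> float:
    """Fraction of ORF positions with at least one mapped read (assumed definition)."""
    if not per_base_reads:
        return 0.0
    return sum(1 for reads in per_base_reads if reads > 0) / len(per_base_reads)


def passes_cutoffs(per_base_reads: Sequence[float], total_mapped_reads: float,
                   min_coverage: float, min_log_rpkm: float) -> bool:
    """Hypothetical filter: keep an ORF only if it clears both minimum cut-offs."""
    reads = sum(per_base_reads)  # assumes one count per read, e.g. at its 5' end
    if reads <= 0:
        return False  # log RPKM is undefined without reads
    value = rpkm(reads, len(per_base_reads), total_mapped_reads)
    return coverage(per_base_reads) >= min_coverage and math.log10(value) >= min_log_rpkm
```

In practice the two thresholds would come from the estimated bend points of each dataset rather than being chosen by hand.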

**Figure 3.**
The precision-recall curves of the different networks on the *E. coli* dataset. The precision-recall curves are given for both the multiple start site and the single start site set-up. The full model (full line), combining the RNN and CNN, outperforms both the single CNN (dashed) and RNN (dotted) architectures.
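
For readers unfamiliar with how such curves are built, the sketch below computes precision-recall points from model scores and binary labels by sweeping the decision threshold from the highest score downwards. It is a generic illustration with a hypothetical function name, not the evaluation code behind this figure, and it does not model the single versus multiple start site set-ups.

```python
from typing import List, Sequence, Tuple


def precision_recall_curve(scores: Sequence[float],
                           labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per distinct score threshold."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from the most to the least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Toy usage with made-up scores for four candidate ORFs.
curve = precision_recall_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Precision-recall curves are a common choice over ROC curves when positives are rare among the candidates, because precision is directly sensitive to the number of false positives.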

**Figure 4.**
Venn diagram displaying the distributions of the proteins verified by Edman sequencing (left) and mass spectrometry (right) within the annotations provided by DeepRibo and the NCBI RefSeq database (labels). Distributions only include expressed ORFs, determined using the S-curve methodology.

**Figure 5.**
E value distributions for the pBLAST results on newly predicted proteins (left) and proteoforms (right) for the different datasets. The E values are given for the best hit (if one exists) for each of the false positives. The dashed line indicates an E value of 1.

**Figure 6.**
DeepRibo example annotations displayed alongside the ribo-seq input signal and RefSeq annotations. The data is formatted using the GWIPS-viz browser (43) and is hosted publicly (see Supplementary Data). Every panel displays the following tracks (from top to bottom): ribo-seq signal (sense: orange, antisense: blue), TISs of all ORF samples present in the test set, annotations predicted by DeepRibo that are not in agreement with the RefSeq assembly (Predicted ORF) and the RefSeq genome annotations used to label the data (Labeled ORF). (A) The highest ranking proteoform prediction (gene: PqqL, rank: 231) for *E. coli*. (B) The highest ranking proteoform prediction (gene: UbiE, rank: 131) for *S. aureus*. (C) The highest ranking novel protein for *E. coli* with no pBLAST alignments (rank: 1302). (D) An example of a predicted proteoform in a region with overlapping genes (gene: ybhF, rank: 941).

Cited by

Small proteins in Gram-positive bacteria.
Brantl S, Ul Haq I. FEMS Microbiol Rev. 2023 Nov 1;47(6):fuad064. doi: 10.1093/femsre/fuad064. PMID: 38052429. Free PMC article.
A Comprehensive Review of Bioinformatics Tools for Genomic Biomarker Discovery Driving Precision Oncology.
Clark AJ, Lillard JW Jr. Genes (Basel). 2024 Aug 6;15(8):1036. doi: 10.3390/genes15081036. PMID: 39202397. Free PMC article. Review.
Modelling microbial communities: Harnessing consortia for biotechnological applications.
Ibrahim M, Raajaraam L, Raman K. Comput Struct Biotechnol J. 2021 Jul 3;19:3892-3907. doi: 10.1016/j.csbj.2021.06.048. eCollection 2021. PMID: 34584635. Free PMC article. Review.
Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage.
Willems P, Fijalkowski I, Van Damme P. mSystems. 2020 Oct 27;5(5):e00833-20. doi: 10.1128/mSystems.00833-20. PMID: 33109751. Free PMC article.
Trips-Viz: an environment for the analysis of public and user-generated ribosome profiling data.
Kiniry SJ, Judge CE, Michel AM, Baranov PV. Nucleic Acids Res. 2021 Jul 2;49(W1):W662-W670. doi: 10.1093/nar/gkab323. PMID: 33950201. Free PMC article.
