Front Bioeng Biotechnol. 2020 Jul 14:8:808.
doi: 10.3389/fbioe.2020.00808. eCollection 2020.

Riboflow: Using Deep Learning to Classify Riboswitches With ∼99% Accuracy

Keshav Aditya R Premkumar et al. Front Bioeng Biotechnol.

Abstract

Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Their specificity to cognate ligands, together with the diversity of those ligands, has expanded the functional repertoire of riboswitches, allowing them to mount apt responses to sudden metabolic demands and signal changes in environmental conditions. Riboswitches play a critical role in microbial life, yet their characterisation remains a challenging computational problem. Here we address this problem with deep learning frameworks, namely convolutional neural networks (CNN) and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrate that both deep learning models (CNN and RNN) outperform conventional hyperparameter-optimized machine learning classifiers on all key performance metrics, including ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing method for identifying the ligand specificity of riboswitches, with an accuracy >0.99 and a macro-averaged F-score of 0.96. An additional advantage is that the deep learning models do not require prior feature engineering. A dynamic update functionality is built into the models to account for the continuing discovery of new riboswitches and to extend the predictive modeling to new classes. Our work could enable the design of genetic circuits with custom-tuned riboswitch aptamers that effect precise translational control in synthetic biology. The associated software is available as an open-source Python package and standalone resource for use in genome annotation, synthetic biology, and biotechnology workflows.
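The stratified train-validate-test approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the split fractions or the tooling used, so the 80/10/10 fractions, the function name `stratified_split`, and the toy class labels below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, validate, test) index lists that preserve class proportions.

    The 80/10/10 default is an assumption, not a value taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validate, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(round(fractions[0] * len(indices)))
        n_validate = int(round(fractions[1] * len(indices)))
        # Each class contributes to every partition in the same ratio.
        train.extend(indices[:n_train])
        validate.extend(indices[n_train:n_train + n_validate])
        test.extend(indices[n_train + n_validate:])
    return train, validate, test


# Toy, made-up labels: an imbalanced three-class problem (not the 32-class dataset).
labels = ["class_a"] * 50 + ["class_b"] * 30 + ["class_c"] * 20
train, validate, test = stratified_split(labels)
print(len(train), len(validate), len(test))
```

Splitting within each class, rather than over the pooled samples, keeps rare ligand classes represented in all three partitions, which matters when class sizes differ widely.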

Keywords: clustering; convolutional neural network; hyperparameter optimization; machine learning; multiclass ROC; recurrent neural network; riboswitch family; synthetic biology.

Figures

FIGURE 1
Deep learning frameworks used in the study. (A) CNN architecture, optimized for two 1-dimensional convolutional layers; and (B) bidirectional RNN with LSTM, optimized for two bidirectional layers. Two dropout layers are used in the RNN.
FIGURE 2
Epoch tuning curves for the CNN (A) and RNN (B). The CNN converges faster with respect to the number of epochs; however, the RNN learns better, as seen from its continuously decreasing loss function.
FIGURE 3
Standard performance metrics. Clockwise from top left, Accuracy; Precision; F-score; and Recall. The overall precision, recall and F-score were computed by macro-averaging the classwise scores. The deep models emerged as vastly superior alternatives to the base machine learning models on all performance metrics.
FIGURE 4
AUROC for the base models. (A) Decision Tree, (B) Gaussian NB, (C) kNN, (D) AdaBoost, (E) Random Forest, and (F) Multi-layer Perceptron. Gray lines denote classwise AUROCs of all 32 classes, from which it is clear that not all classes are learnt equally well.
FIGURE 5
AUROC for the deep models. (A) CNN, and (B) RNN. Gray lines denote classwise AUROCs of all 32 classes. It is clear that the RNN achieves learning perfection at both the macro and classwise levels.
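The captions of Figures 3 to 5 refer to classwise scores and their macro-average. The following is a minimal sketch of how such quantities can be computed, assuming a one-vs-rest treatment of each class. The helper names (`f_score`, `one_vs_rest_auroc`, `macro_average`) and the three-class toy predictions are hypothetical and are not taken from the paper or its software.

```python
def f_score(y_true, y_pred, positive):
    """F-score of one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def one_vs_rest_auroc(y_true, scores, positive):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(values):
    """Unweighted mean, so every class counts equally regardless of its size."""
    return sum(values) / len(values)


# Toy, made-up data: true classes and predicted class probabilities.
classes = [0, 1, 2]
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
probs = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6],
]
# Predicted class is the one with the highest probability.
y_pred = [max(classes, key=row.__getitem__) for row in probs]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
class_f = [f_score(y_true, y_pred, c) for c in classes]
class_auc = [one_vs_rest_auroc(y_true, [row[c] for row in probs], c) for c in classes]
macro_f = macro_average(class_f)
macro_auc = macro_average(class_auc)
print(accuracy, macro_f, macro_auc)
```

Because the macro-average weights every class equally, a single poorly learnt small class lowers it more than it lowers overall accuracy, which is one reason classwise curves are worth inspecting alongside the aggregate.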
