Int J Mol Sci. 2021 Nov 13;22(22):12291.
doi: 10.3390/ijms222212291.

A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides

Byungjo Lee et al. Int J Mol Sci.

Abstract

As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods for identifying these peptides. However, the major hurdle in developing a deep learning model is the limited data on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides by a convolutional neural network (CNN) model. The neurotoxic peptide data were augmented using known neurotoxic peptides from the UniProt database, and models were trained on a training set with or without the generated sequences to verify the augmented data. The model trained with the augmented dataset outperformed the one trained with the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. Applying the model to the set of all RNA transcripts of the spider Callobius koreanus, we identified 275 putative neurotoxic peptides, of which 252 were novel sequences and only 23 showed homology with known peptides by the Basic Local Alignment Search Tool (BLAST). Among these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to the identification of other functional peptides from biological resources with insufficient data.

Keywords: convolutional neural network; data augmentation; deep learning; neurotoxic peptide prediction; spider transcriptome.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Data preparation overview. (A) Neurotoxic peptide data were augmented using known peptides. Sequences were generated by random substitution and insertion of amino acids, and peptides with a BLAST E-value below 1 × 10⁻⁵ were selected. (B) An example of a known neurotoxic peptide (P83561) and the derived AUG peptides. (C) Four types of datasets, two for training and two for performance testing, were prepared to evaluate the validity of the AUG neurotoxic peptides. (D) The unAUG training and test data included neurotoxic and non-neurotoxic peptides from UniProt. The data were split into five folds, of which one was used for testing and the others for model training.
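The sequence-generation step in panel (A) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of edits per variant, the equal chance of substitution versus insertion, and the toy input sequence are all assumptions made for illustration, and the BLAST E-value filtering step is deliberately left out.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def augment_peptide(seq, n_edits, rng):
    """Return a variant of `seq` after `n_edits` random substitutions/insertions.

    Hypothetical helper: the edit count and the 50/50 choice between
    substitution and insertion are assumptions, not values from the paper.
    """
    if not seq:
        raise ValueError("seq must be a non-empty peptide sequence")
    residues = list(seq)
    for _ in range(n_edits):
        if rng.random() < 0.5:
            # Substitution: replace one residue with a different amino acid.
            pos = rng.randrange(len(residues))
            choices = [a for a in AMINO_ACIDS if a != residues[pos]]
            residues[pos] = rng.choice(choices)
        else:
            # Insertion: add one residue at any position, including both ends.
            pos = rng.randrange(len(residues) + 1)
            residues.insert(pos, rng.choice(AMINO_ACIDS))
    return "".join(residues)


def generate_candidates(seq, n_variants=5, n_edits=2, seed=0):
    """Generate up to `n_variants` distinct variants that differ from `seq`."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    variants = set()
    attempts = 0
    while len(variants) < n_variants and attempts < 100 * n_variants:
        candidate = augment_peptide(seq, n_edits, rng)
        if candidate != seq:  # two substitutions can cancel each other out
            variants.add(candidate)
        attempts += 1
    return sorted(variants)


# Toy sequence for illustration only; it is NOT the sequence of P83561.
TOY_PEPTIDE = "ACDKGCLWFK"
for variant in generate_candidates(TOY_PEPTIDE):
    print(variant)
```

In a full pipeline, each candidate would then be searched against the known peptides with BLAST and kept only if its E-value fell below the threshold given in the caption.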
Figure 2
Prediction results of the CNN models on the test and simulation datasets. (A) CNN models were trained on the AUG and unAUG training datasets. The performance of the trained models was evaluated on the five folds of test data and on a simulation dataset. (B) Boxplots of prediction performance on the test datasets. The unAUG and AUG CNN models were compared on four performance metrics: accuracy, precision, recall, and F1 score (** p < 0.01, *** p < 0.001, **** p ≤ 0.0001). (C) Boxplots of prediction performance on the simulation dataset. The models were compared using the same four performance metrics.
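For reference, the four metrics named in this caption follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions, with label 1 assumed to mean "neurotoxic" and 0 "non-neurotoxic"; the function names and the toy labels are illustrative and do not come from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

As a consistency check, `f1_from_precision_recall(0.9922, 0.9984)` evaluates to roughly 0.9953, which agrees with the F1 score reported in the abstract. The reported values may be averages over several runs, so this agreement is indicative rather than exact.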
Figure 3
Transcriptome analysis of C. koreanus and comparison of the predictions of the CNN models with the BLAST results. (A) COG annotation results for the body (left) and the venom gland (right), shown as pie charts. (B) Neurotoxic peptides predicted by the unAUG model (left) and the AUG model (right), presented along with the BLAST results. (C) The number of putative neurotoxic peptides predicted from the BLAST search was larger for the AUG model than for the unAUG model.
Figure 4
Modulatory effects of the predicted peptides on ion channel activity. Each of the four peptides predicted by the AUG model showed either activation or inhibition of specific ion channel subtypes. Peptides were applied at a final concentration of 10 μM. (A) Peptide c43972 had an inhibitory effect on Cav when compared with the L-/N-type calcium channel inhibitor cilnidipine. (B) Peptide c62771 reduced the activity of Nav1.7 channels, whereas c136163, c43972, and c68875 activated the channel. (C) nAChRs were activated when treated with c43972 and c68875.
