Int J Mol Sci. 2021 Nov 13;22(22):12291.

doi: 10.3390/ijms222212291.

A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides

Byungjo Lee¹, Min Kyoung Shin¹, In-Wook Hwang¹, Junghyun Jung¹, Yu Jeong Shim¹, Go Woon Kim¹, Seung Tae Kim², Wonhee Jang¹, Jung-Suk Sung¹

Affiliations

¹ Department of Life Science, Biomedi Campus, Dongguk University-Seoul, 32, Dongguk-ro, Ilsandong-gu, Goyang-si 10326, Korea.
² Life and Environment Research Institute, Konkuk University, 120, Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea.

PMID: 34830173
PMCID: PMC8619404
DOI: 10.3390/ijms222212291

Byungjo Lee et al. Int J Mol Sci. 2021.

Abstract

As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods currently used to identify these peptides. However, the major hurdle in developing a deep learning model is the limited data on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides by a convolutional neural network model. New sequences were generated from the known neurotoxic peptides in the UniProt database, and models were trained on a training set with or without the generated sequences to validate the augmented data. The model trained with the augmented dataset outperformed the one trained with the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. Applying the model to the set of all RNA transcripts of the spider *Callobius koreanus*, we identified 275 putative neurotoxic peptides, of which 252 were novel sequences and only 23 showed homology with known peptides by Basic Local Alignment Search Tool (BLAST) search. Among these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to the identification of other functional peptides from biological resources with insufficient data.

Keywords: convolutional neural network; data augmentation; deep learning; neurotoxic peptide prediction; spider transcriptome.
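As a quick consistency check on the reported metrics, the F1 score can be recomputed from the precision and recall given in the abstract, since F1 is their harmonic mean. The sketch below is only an illustration of that arithmetic; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the augmented model
# on the simulation dataset.
precision, recall = 0.9922, 0.9984
f1 = f1_score(precision, recall)

print(round(f1, 4))  # 0.9953, matching the reported F1 score
```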

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Data preparation overview. (A) Neurotoxic peptide data were augmented by using known peptides. The sequences were generated by random substitution and insertion of amino acids, and peptides with a BLAST E-value below 1 × 10⁻⁵ were selected. (B) An example of a known neurotoxic peptide (P83561) and the derived augmented (AUG) peptides. (C) Four datasets, two for training and two for performance testing, were prepared to evaluate the validity of the AUG neurotoxic peptides. (D) The unaugmented (unAUG) training and test data included neurotoxic and non-neurotoxic peptides from UniProt. The data were split into five folds, of which one fold was selected for testing and the others for model training.
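The generation step in panel (A) can be illustrated with a minimal sketch, under several assumptions that are not stated in the caption: the per-residue substitution and insertion rates, the number of variants, and the uniform choice among the 20 standard amino acids are all hypothetical, and the parent sequence is a toy string rather than a real toxin. The BLAST E-value filter is an external step and is not implemented here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def augment_peptide(seq, n_variants=5, sub_rate=0.1, ins_rate=0.05, seed=0):
    """Generate variants by random substitution and insertion.

    All rates and counts are illustrative assumptions, not the
    settings used in the paper.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        out = []
        for residue in seq:
            # Substitution: replace the residue with a random amino acid.
            if rng.random() < sub_rate:
                residue = rng.choice(AMINO_ACIDS)
            out.append(residue)
            # Insertion: add a random amino acid after this position.
            if rng.random() < ins_rate:
                out.append(rng.choice(AMINO_ACIDS))
        variants.append("".join(out))
    return variants


parent = "GCKGFCAWDRKCSE"  # hypothetical toy sequence
variants = augment_peptide(parent)
for v in variants:
    print(v)
```

In a pipeline like the one the caption describes, each generated variant would then be compared against the known peptides with BLAST, and only variants passing the E-value threshold would be kept.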

**Figure 2**
Prediction results of the CNN models using test and simulation datasets. (A) CNN models were trained on the AUG and unAUG training datasets. The performance of the trained models was evaluated on the five test folds and on a simulation dataset. (B) Boxplots showing the prediction performance on the test datasets. The unAUG and AUG CNN models were compared using four performance metrics: accuracy, precision, recall, and F1 score (\*\* *p* < 0.01, \*\*\* *p* < 0.001, \*\*\*\* *p* ≤ 0.0001). (C) Boxplots showing the prediction performance on the simulation dataset. The models were compared using the same four performance metrics.
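The five-fold evaluation mentioned in the captions can be sketched as follows. This is a generic illustration, assuming a shuffled, unstratified split in which each fold serves once as the test set; whether the authors stratified by class or used a particular library is not stated here. The function name `five_fold_splits` and the placeholder peptide labels are hypothetical.

```python
import random


def five_fold_splits(items, k=5, seed=0):
    """Yield (train, test) pairs, holding out each of k folds once."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = [items[j] for j in folds[i]]
        train = [items[j] for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


peptides = [f"pep{i}" for i in range(10)]  # placeholder labels
splits = list(five_fold_splits(peptides))
for train, test in splits:
    print(len(train), len(test))
```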

**Figure 3**
Transcriptome analysis of *C. koreanus* and comparison of estimation results from the CNN models and BLAST. (A) The COG annotation results from the body (**left**) and the venom gland (**right**) are shown in pie charts. (B) Neurotoxic peptides estimated by the unAUG model (**left**) and the AUG model (**right**) are presented along with the BLAST results. (C) The number of putative neurotoxic peptides predicted from the BLAST search was larger in the AUG model than in the unAUG model.

**Figure 4**
Modulatory effects of predicted peptides on ion channel activity. Each of the four peptides from the AUG model prediction showed either activation or inhibition of a specific ion channel subtype. Peptides were applied at a final concentration of 10 μM. (A) Peptide c43972 had an inhibitory effect on Ca_v when compared with the L-/N-type calcium channel inhibitor cilnidipine. (B) Peptide c62771 reduced the activity of Na_v1.7 channels, whereas c136163, c43972, and c68875 activated the channel. (C) The nAChR was activated when treated with c43972 and c68875.

Grants and funding

NIBR202134205/National Institute of Biological Resources
