A Tool Preference Choice Method for RNA Secondary Structure Prediction by SVM with Statistical Tests
- PMID: 23641141
- PMCID: PMC3629938
- DOI: 10.4137/EBO.S10580
Abstract
The prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists, and many useful tools have been developed for this purpose. Because each tool has its own strengths and weaknesses, we propose a tool choice method based on support vector machines (SVM) that integrates three prediction tools: pknotsRG, RNAStructure, and NUPACK. Our method first extracts features from the target RNA sequence and then applies two information-theoretic feature selection methods to rank those features. We propose a method that combines feature selection and classifier fusion in an incremental manner. Our test data set contains 720 RNA sequences: 225 pseudoknotted RNA sequences obtained from PseudoBase and 495 nested RNA sequences obtained from RNA STRAND. The method serves as a preprocessing step for analyzing RNA sequences before the RNA secondary structure prediction tools are employed. In addition, the performance of the various configurations is subjected to statistical tests to assess the significance of their differences. The best base-pair accuracy, 75.5%, is obtained by the proposed incremental method and is significantly higher than the 68.8% achieved by the best single predictor, pknotsRG.
Keywords: RNA; feature selection; secondary structure; statistical test; support vector machine.
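The abstract names the pipeline stages but not their details. The following is a minimal Python sketch of the tool-choice idea under stated assumptions, not the authors' implementation: each training sequence is labeled with whichever of the three tools gives it the highest base-pair accuracy; features are already discretized; plain mutual information stands in for the paper's two information-theoretic criteria, which the abstract does not name; and a simple forward selection along the ranking stands in for the incremental method. The SVM itself is omitted. `evaluate` is a hypothetical callback where cross-validated SVM accuracy would be plugged in, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical tool order; index t in every accuracy tuple refers to TOOLS[t].
TOOLS = ("pknotsRG", "RNAStructure", "NUPACK")


def best_tool_labels(accuracies):
    """Label each sequence with the index of the tool scoring highest on it.

    accuracies: one tuple per sequence, holding that sequence's base-pair
    accuracy under each tool in TOOLS order. Ties go to the lower index.
    """
    return [max(range(len(row)), key=lambda t: row[t]) for row in accuracies]


def mutual_information(feature, labels):
    """I(F; Y) in bits between a discrete feature column and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_f = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (f, y), count in joint.items():
        p_fy = count / n
        mi += p_fy * math.log2(p_fy / ((p_f[f] / n) * (p_y[y] / n)))
    return mi


def rank_features(X, labels):
    """Return feature indices sorted by decreasing mutual information with labels."""
    scores = [mutual_information([row[j] for row in X], labels) for j in range(len(X[0]))]
    # sorted() is stable, so equally informative features keep their original order.
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def best_prefix(ranking, evaluate):
    """Forward selection along a ranking: try the top-1, top-2, ... features.

    evaluate: hypothetical callback mapping a list of feature indices to a
    validation score (for example, cross-validated accuracy of an SVM trained
    on those features). Returns the best-scoring prefix and its score.
    """
    best_k, best_score = 1, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranking[:best_k], best_score


def selected_accuracy(accuracies, choices):
    """Mean accuracy when each sequence is predicted by the tool chosen for it."""
    return sum(row[c] for row, c in zip(accuracies, choices)) / len(accuracies)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the paper.
    accuracies = [(0.9, 0.7, 0.6), (0.5, 0.8, 0.7), (0.6, 0.6, 0.9), (0.8, 0.4, 0.5)]
    X = [(1, 0), (0, 0), (0, 1), (1, 1)]  # two discretized features per sequence
    labels = best_tool_labels(accuracies)
    print([TOOLS[t] for t in labels])
    print(rank_features(X, labels))
    print(selected_accuracy(accuracies, labels))
```

On the toy data, feature 0 carries 1.0 bit of information about the best-tool label and feature 1 carries 0.5 bit, so feature 0 is ranked first. Passing the labels themselves as `choices` gives the oracle accuracy (0.85 here, versus 0.70 for always using the first tool); since it picks the per-sequence maximum, it is an upper bound that any trained selector over these three tools can only approach.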
Similar articles
- Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines. BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127. PMID: 22898352. Free PMC article.
- A permutation based simulated annealing algorithm to predict pseudoknotted RNA secondary structures. Int J Bioinform Res Appl. 2015;11(5):375-96. doi: 10.1504/ijbra.2015.071938. PMID: 26558299.
- A comparative study on feature selection for a risk prediction model for colorectal cancer. Comput Methods Programs Biomed. 2019 Aug;177:219-229. doi: 10.1016/j.cmpb.2019.06.001. Epub 2019 Jun 4. PMID: 31319951.
- Seminal quality prediction using data mining methods. Technol Health Care. 2014;22(4):531-45. doi: 10.3233/THC-140816. PMID: 24898862.
- Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinformatics. 2011 Sep 23;12:375. doi: 10.1186/1471-2105-12-375. PMID: 21939564. Free PMC article.
Cited by
- RNA-targeted small-molecule drug discoveries: a machine-learning perspective. RNA Biol. 2023 Jan;20(1):384-397. doi: 10.1080/15476286.2023.2223498. PMID: 37337437. Free PMC article. Review.
- Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol. 2021 Aug 26;17(8):e1009291. doi: 10.1371/journal.pcbi.1009291. eCollection 2021 Aug. PMID: 34437528. Free PMC article. Review.