Prediction of polyadenylation signals in human DNA sequences using nucleotide frequencies
- PMID: 19795571
Abstract
The polyadenylation signal plays a key role in determining the site at which a polyadenylated tail is added to nascent mRNA, and mutations in this signal have been reported in many diseases. Thus, identifying poly(A) sites is important for understanding the regulation and stability of mRNA. In this study, Support Vector Machine (SVM) models were developed for predicting poly(A) signals in a DNA sequence using 100 nucleotides each upstream and downstream of the signal. We introduced a novel split nucleotide frequency technique, and the resulting models achieved maximum Matthews correlation coefficients (MCC) of 0.58, 0.69, 0.70 and 0.69 using mononucleotide, dinucleotide, trinucleotide and tetranucleotide frequencies, respectively. Finally, a hybrid model combining dinucleotide, 2nd order dinucleotide and tetranucleotide frequencies achieved a maximum MCC of 0.72. Moreover, on independent datasets this model achieved a precision ranging from 75.8% to 95.7% at a sensitivity of 57%, which is better than any other known method.
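The feature construction and evaluation metric described above can be illustrated with a minimal Python sketch. The abstract does not define the split nucleotide frequency technique in detail, so this sketch assumes that "split" means computing k-mer frequencies separately for the upstream and downstream flanks (optionally in equal sub-windows) and concatenating them. The function names `kmer_frequencies`, `split_kmer_features` and `mcc`, and the `parts` parameter, are hypothetical and are not taken from the paper. The resulting vectors are the kind of input an SVM could be trained on; the SVM itself is not shown.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return normalized frequencies of all 4**k k-mers, in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def split_kmer_features(upstream, downstream, k, parts=1):
    """Assumed 'split' scheme: frequencies per flank (and per sub-window),
    concatenated into one feature vector. Not the paper's confirmed method."""
    features = []
    for region in (upstream, downstream):
        size = len(region) // parts
        for j in range(parts):
            # The last sub-window absorbs any remainder.
            end = (j + 1) * size if j < parts - 1 else len(region)
            features.extend(kmer_frequencies(region[j * size:end], k))
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Under these assumptions, two 100-nucleotide flanks with `k=2` and `parts=1` give a vector of 2 x 16 = 32 dinucleotide frequencies, and a hybrid model in the spirit of the abstract could concatenate the vectors for several feature types.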