Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2008 Dec 12;9 Suppl 12(Suppl 12):S2.
doi: 10.1186/1471-2105-9-S12-S2.

Using a kernel density estimation based classifier to predict species-specific microRNA precursors

Affiliations

Using a kernel density estimation based classifier to predict species-specific microRNA precursors

Darby Tien-Hao Chang et al. BMC Bioinformatics. .

Abstract

Background: MicroRNAs (miRNAs) are short non-coding RNA molecules participating in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches obtain more attention because that they can discover species-specific pre-miRNAs. Most ab initio approaches proposed novel features to characterize RNA molecules. However, there were fewer discussions on the associated classification mechanism in a miRNA predictor.

Results: This study focuses on the classification algorithm for miRNA prediction. We develop a novel ab initio method, miR-KDE, in which most of the features are collected from previous works. The classification mechanism in miR-KDE is the relaxed variable kernel density estimator (RVKDE) that we have recently proposed. When compared to the famous support vector machine (SVM), RVKDE exploits more local information of the training dataset. MiR-KDE is evaluated using a training set consisted of only human pre-miRNAs to predict a benchmark collected from 40 species. The experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from the genera taxonomically distant to humans.

Conclusion: We use a novel classifier of which the characteristic of exploiting local information is particularly suitable to predict species-specific pre-miRNAs. This study also provides a comprehensive analysis from the view of classification mechanism. The good performance of miR-KDE encourages more efforts on the classification methodology as well as the feature extraction in miRNA prediction.

PubMed Disclaimer

Figures

Figure 1
Figure 1
The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. The x-axis is frequency of the dinucleotide "UU", and the y-axis is base pairing propensity[44]. The black circle is a testing pre-miRNA for the pre-miRNA Caenorhabditis elegans miR-260. The red and blue circles represent positive and negative training samples. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.
Figure 2
Figure 2
The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. The x-axis is frequency of the dinucleotide "CC", and the y-axis is frequency of the dinucleotide "GG". The black circle is a testing pseudo hairpin. The red and blue circles represent positive and negative training samples. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.
Figure 3
Figure 3
The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. The x-axis is frequency of the dinucleotide "CG", and the y-axis is ratio of the minimum free energy to the sequence length[46]. The black circle is a testing pre-miRNA for the pre-miRNA Zea mays miR168a. The red and blue circles represent positive and negative training samples. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.
Figure 4
Figure 4
The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. The x-axis is frequency of the dinucleotide "GG", and the y-axis is ratio of the minimum free energy to the sequence length[46]. The black circle is a testing pseudo hairpin. The red and blue circles represent positive and negative training samples. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.
Figure 5
Figure 5
The Homo sapiens miR-611 stem-loop structure. The RNA sequence and its corresponding secondary structure sequence predicted by RNAfold [43] are shown. In the secondary structure sequence, each nucleotide has two states, "paired" or "unpaired", indicated by brackets and dots, respectively. A left bracket "(" indicates a paired nucleotide located at the 5' strand that would form a pair with another nucleotide at the 3' strand with a right bracket ")". The hairpin length of this sample pre-miRNA is 25+8+25 = 58. Its loop length is 8 and has 8 consecutive base pairs.

Similar articles

Cited by

References

    1. Bartel DP. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/S0092-8674(04)00045-5. - DOI - PubMed
    1. Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. doi: 10.1038/nature02871. - DOI - PubMed
    1. He L, Hannon GJ. MicroRNAs: Small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5:522–531. doi: 10.1038/nrg1379. - DOI - PubMed
    1. Lee RC, Feinbaum RL, Ambros V. The C-Elegans Heterochronic Gene Lin-4 Encodes Small Rnas with Antisense Complementarity to Lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-Y. - DOI - PubMed
    1. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature. 2000;403:901–906. doi: 10.1038/35002607. - DOI - PubMed

Publication types

LinkOut - more resources