BMC Bioinformatics. 2017 Mar 14;18(1):170.
doi: 10.1186/s12859-017-1584-1.

MicroRNA categorization using sequence motifs and k-mers

Malik Yousef et al. BMC Bioinformatics.

Abstract

Background: Post-transcriptional gene dysregulation can be a hallmark of diseases such as cancer, and microRNAs (miRNAs) play a key role in modulating translation efficiency. Known pre-miRNAs are listed in miRBase; they have been discovered in organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed to describe pre-miRNAs, and we have previously introduced sequence motifs and k-mers as useful ones. Xeno-miRNAs have been reported in next-generation sequencing data, but they may be contamination. To aid in deciding whether such a sequence is genuine, we aimed to establish a means to differentiate pre-miRNAs from different species.

Results: To distinguish pre-miRNAs by species, we used one species' pre-miRNAs as the positive class and another species' pre-miRNAs as the negative class in the training and test data, and established machine-learned models with sequence motifs and k-mers as features. Accuracy was higher for distantly related species and lower for closely related ones.

Conclusions: We were able to differentiate among species with increasing success as evolutionary distance increased. This is consistent with previous reports of fast evolutionary changes in miRNAs, since fairly good discrimination was possible even between relatively closely related species.

Keywords: Differentiate miRNAs among species; Machine learning; Pre-microRNA; Sequence motifs; k-mer; miRNA categorization; microRNA.
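
To make the k-mer features and the positive/negative labeling described in the abstract concrete, the following is a minimal sketch. It is not the authors' implementation: the function names, the maximum k of 3, and the normalization by window count are all assumptions chosen for illustration, and sequence motifs are not covered.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_features(sequence, max_k=3):
    """Return relative k-mer frequencies for k = 1..max_k (max_k is an assumption)."""
    seq = sequence.upper().replace("T", "U")
    features = {}
    for k in range(1, max_k + 1):
        windows = max(len(seq) - k + 1, 0)
        counts = {}
        for i in range(windows):
            kmer = seq[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        # Emit every possible k-mer so that all vectors share the same layout.
        for letters in product(ALPHABET, repeat=k):
            name = "".join(letters)
            features[name] = counts.get(name, 0) / windows if windows else 0.0
    return features


def build_dataset(positive_seqs, negative_seqs, max_k=3):
    """Label one species' pre-miRNAs 1 and the other species' pre-miRNAs 0."""
    rows = [kmer_features(s, max_k) for s in positive_seqs + negative_seqs]
    labels = [1] * len(positive_seqs) + [0] * len(negative_seqs)
    return rows, labels
```

For example, `kmer_features("ACGU", max_k=2)` yields 20 features (4 single nucleotides plus 16 dinucleotides), which could then be passed to any classifier together with the labels from `build_dataset`.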


Figures

Fig. 1
Workflow for model establishment. Data are transformed into feature vectors, and the best 100 features are selected. Initially, 10% of the data is withheld from the 100-fold MCCV training and testing scheme. All performance measures for testing and holdout data are collected during CV and reported at the end of the workflow.
Fig. 2
Phylogenetic relationship among organisms and groups used in the present study (excluding viruses). iTOL (http://itol2.embl.de/) was used to create the phylogenetic tree [42]. Newick and PhyloXML formatted files to build the tree are available as Additional files 3 and 4: Files S2 and S3, respectively.
Fig. 3
Accuracy distribution over 100-fold MCCV for six selected species and groups of species against Hominidae
Fig. 4
Model accuracy distribution for models trained with pre-created motifs and for the workflow where motifs were created in each iteration
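
The splitting scheme in the Fig. 1 caption (a 10% holdout followed by 100-fold Monte Carlo cross-validation, MCCV) can be sketched as below. This is an illustrative sketch only: the function name, the fixed seed, and the 70/30 train/test ratio inside each MCCV iteration are assumptions, since the caption does not state them. Feature selection and model training are left out.

```python
import random


def mccv_splits(n_samples, n_iterations=100, holdout_fraction=0.10,
                train_fraction=0.70, seed=0):
    """Withhold a holdout set once, then draw repeated random train/test splits.

    train_fraction is an assumed value, not taken from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # The holdout set is removed before any cross-validation takes place.
    n_holdout = round(n_samples * holdout_fraction)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]

    # Each MCCV iteration reshuffles the remaining data independently,
    # so a sample may land in the test part of many iterations.
    n_train = round(len(remaining) * train_fraction)
    splits = []
    for _ in range(n_iterations):
        shuffled = remaining[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return holdout, splits
```

Unlike k-fold cross-validation, the test parts of different MCCV iterations may overlap; the holdout indices never appear in any iteration and can be used for a final check of each model.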

