PLoS One. 2015 Mar 30;10(3):e0121501.

doi: 10.1371/journal.pone.0121501. eCollection 2015.

Identification of real microRNA precursors with a pseudo structure status composition approach

Bin Liu¹, Longyun Fang², Fule Liu², Xiaolong Wang³, Junjie Chen², Kuo-Chen Chou⁴

Affiliations

¹ School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America.
² School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
³ School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
⁴ Gordon Life Science Institute, Belmont, Massachusetts, United States of America; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.

PMID: 25821974
PMCID: PMC4378912
DOI: 10.1371/journal.pone.0121501

Bin Liu et al. PLoS One. 2015.

Abstract

A microRNA (miRNA) is a small non-coding RNA molecule of about 22 nucleotides that functions in the transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Although poorly characterized, miRNAs are widely regarded as important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated in these diseases, particularly in carcinogenesis. It is therefore important, for both basic research and miRNA-based therapy, to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). In particular, with the avalanche of RNA sequences generated in the postgenomic age, computational sequence-based methods for this task are highly desirable. Here, two new predictors, called "iMcRNA-PseSSC" and "iMcRNA-ExPseSSC", are proposed for identifying human pre-microRNAs by incorporating global or long-range structure-order information in a way quite similar to the pseudo amino acid composition approach. Rigorous cross-validation on a newly constructed benchmark dataset that is much larger and more stringent showed that the two new predictors (accessible at http://bioinformatics.hitsz.edu.cn/iMcRNA/) outperformed or were highly comparable with the best existing predictors in this area.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. An illustration to show biogenesis of miRNAs and model of miRNA-mediated translational repression or mRNA degradation.**
MiRNA genes are transcribed by RNA polymerase II [2,90], resulting in primary transcripts termed pri-miRNAs, which are typically 60–70 nucleotides. The pri-miRNAs are processed by the enzyme Drosha to release hairpin-shaped intermediates (pre-miRNAs) [3], which are then exported into the cytoplasm by Exportin V and its Ran-GTP cofactor [–6] and cleaved by the enzyme Dicer to yield miRNA/miRNA* duplexes [–11].

**Fig 2. Illustration to show the 6 structure statuses of paired nucleic acid residues.**
Note that the nucleotide near the 5' end is distinguished from the one near the 3' end: (a) the base pair A-U or U-A has 2 hydrogen bonds; (b) the base pair G-C or C-G has 3 hydrogen bonds; and (c) the wobble base pair G-U or U-G has 2 weaker hydrogen bonds. See the main text for further explanation.

**Fig 3. A flowchart to show the process of generating the feature vector for an RNA sequence by its structure status composition.**
Given an RNA sequence R (cf. Equation 2), its secondary structure sequence is derived with the Vienna RNA software package, as formulated in Equation 4. According to the definition in that package, each nucleotide has one of two statuses: unpaired or paired. The former is denoted by a dot "." and the latter by the symbol "(" or ")". The left bracket "(" stands for a paired nucleotide near the 5'-end, and the right bracket ")" for one near the 3'-end. Since the number of different structure elements in the RNA sequence thus obtained is 10 (cf. Equation 5), its n-tuple element composition contains 10ⁿ components (cf. Equation 6). For simplicity, only the case n = 2 is shown here, i.e., the 2-tuple element composition, which contains 10² = 100 components formed by the different pairs of adjacent secondary structure status elements.
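
The counting step described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the dot-bracket string has already been computed (for example by the Vienna RNA package), and it assumes that each paired position is labelled by its own nucleotide followed by its partner. The names `STATUSES`, `structure_statuses`, and `tuple_composition` are hypothetical.

```python
from itertools import product

# Assumed encoding of the 10 structure statuses:
# 4 unpaired nucleotides plus 6 ordered base pairs.
STATUSES = ["A", "C", "G", "U", "A-U", "U-A", "G-C", "C-G", "G-U", "U-G"]


def structure_statuses(seq, dot_bracket):
    """Label every nucleotide with one of the 10 statuses."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    partner = [None] * len(seq)
    stack = []  # indices of '(' still waiting for their ')'
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    labels = []
    for i, nt in enumerate(seq):
        # Unpaired: the nucleotide itself; paired: "own-partner".
        label = nt if partner[i] is None else f"{nt}-{seq[partner[i]]}"
        if label not in STATUSES:
            raise ValueError(f"unsupported status {label!r} at position {i}")
        labels.append(label)
    return labels


def tuple_composition(labels, n=2):
    """Normalized frequencies of all 10**n n-tuples of adjacent statuses."""
    keys = list(product(STATUSES, repeat=n))
    counts = dict.fromkeys(keys, 0)
    total = len(labels) - n + 1
    for i in range(total):
        counts[tuple(labels[i:i + n])] += 1
    if total <= 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


labels = structure_statuses("GCAAAGC", "((...))")
vector = tuple_composition(labels, n=2)  # 100 components for n = 2
```

With n = 2 the vector has 10² = 100 entries, one per ordered pair of adjacent statuses, matching the component count given in the caption.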

**Fig 4. A schematic illustration to show the correlation of structure statuses along an RNA sequence.**
(a) The first-tier correlation reflects the structure-order mode between all the most contiguous nucleotides. (b) The 2nd-tier correlation reflects the structure-order mode between all the second-most contiguous nucleotides. (c) The 3rd-tier correlation reflects the structure-order mode between all the third-most contiguous nucleotides. As we can see, the global or long-range sequence order information of RNA can thus be approximately and indirectly incorporated into the current prediction model as done by the PseAAC approach for proteins [30].
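
A rough sketch of how such tier correlations could be computed and appended to a composition vector is given below. It follows the general PseAAC form (mean squared difference between positions j apart, then a weighted normalization) rather than the exact equations of the paper, which are not reproduced on this page. The per-position numeric values, the weight `w`, and the function names `tier_correlations` and `pseudo_composition` are all assumptions made for illustration.

```python
def tier_correlations(values, max_tier):
    """theta_j = mean of (v_i - v_{i+j})**2 over all i, for j = 1..max_tier.

    `values` holds one number per nucleotide position; which property of
    a structure status supplies that number is an assumption of this sketch.
    """
    length = len(values)
    if max_tier >= length:
        raise ValueError("max_tier must be smaller than the sequence length")
    thetas = []
    for j in range(1, max_tier + 1):
        sq_diffs = [(values[i] - values[i + j]) ** 2 for i in range(length - j)]
        thetas.append(sum(sq_diffs) / (length - j))
    return thetas


def pseudo_composition(freqs, thetas, w=0.1):
    """Append weighted tier correlations to a frequency vector, PseAAC style."""
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder per-position scores (hypothetical, not taken from the paper).
values = [3, 3, 0, 0, 0, 3, 3]
thetas = tier_correlations(values, max_tier=2)
features = pseudo_composition([0.5, 0.5], thetas, w=0.1)
```

The first entries of the result carry the local composition and the trailing entries carry one value per tier, so longer-range order information enters the vector without increasing its length combinatorially.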

**Fig 5. A graphical illustration to show the performance of different methods by means of the receiver operating characteristic (ROC) curves.**
The areas under the ROC curves (AUC) are 0.93, 0.96, 0.90, and 0.94 for iMcRNA-PseSSC, iMcRNA-ExPseSSC, Triplet-SVM, and MiPred, respectively. See section "Comparison with Other Methods" for further explanation.
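
For readers unfamiliar with the metric, the AUC can be computed directly from classifier scores as the probability that a randomly chosen positive sample is ranked above a randomly chosen negative one. The sketch below uses that rank-based definition; the function name `auc` and the toy scores are illustrative assumptions and are unrelated to the values reported in the figure.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = real pre-miRNA, 0 = pseudo hairpin (made-up scores).
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.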

**Fig 6. Visualizing the discriminative power with a heat map.**
(a) The discriminative power of the 100 local structure status compositions. The structure statuses marked on the vertical and horizontal axes indicate the first structure status and the second structure status in the local structure status compositions. (b) The discriminative power of the 13 features incorporating the structure-order effect. The λ values are marked on horizontal axis.

**Fig 7. A semi-screenshot to show the top page of the web-server iMcRNA.**
The web-server is available at http://bioinformatics.hitsz.edu.cn/iMcRNA/.

**Fig 8. A semi-screenshot to show the output obtained by the web-server.**
See the text for further explanation.

References

1. Lee Y, Kim M, Han J, Yeom K-H, Lee S, et al. (2004) MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23: 4051–4060.
2. Cai X, Hagedorn CH, Cullen BR (2004) Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10: 1957–1966.
3. Lee Y, Ahn C, Han J, Choi H, Kim J, et al. (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415–419.
4. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2004) Nuclear export of microRNA precursors. Science 303: 95–98.
5. Yi R, Qin Y, Macara IG, Cullen BR (2003) Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17: 3011–3016.

Associated data

figshare/10.6084/m9.figshare.1289312