PLoS One. 2015 Mar 30;10(3):e0121501.
doi: 10.1371/journal.pone.0121501. eCollection 2015.

Identification of real microRNA precursors with a pseudo structure status composition approach



Bin Liu et al. PLoS One. 2015.

Abstract

Containing about 22 nucleotides, a microRNA (abbreviated miRNA) is a small non-coding RNA molecule that functions in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Albeit poorly characterized, miRNAs are widely regarded as important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated in these diseases, particularly in carcinogenesis. Therefore, for both basic research and miRNA-based therapy, it is important to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). In particular, with the avalanche of RNA sequences generated in the postgenomic age, computational sequence-based methods for this task are highly desirable. Here two new predictors, called "iMcRNA-PseSSC" and "iMcRNA-ExPseSSC", were proposed for identifying human pre-microRNAs by incorporating global or long-range structure-order information in a way similar to the pseudo amino acid composition approach. Rigorous cross-validations on a much larger and more stringent, newly constructed benchmark dataset showed that the two new predictors (accessible at http://bioinformatics.hitsz.edu.cn/iMcRNA/) outperformed or were highly comparable with the best existing predictors in this area.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. An illustration of miRNA biogenesis and of the model of miRNA-mediated translational repression or mRNA degradation.
MiRNA genes are transcribed by RNA polymerase II [2,90] into primary transcripts termed pri-miRNAs. The pri-miRNAs are processed by the enzyme Drosha to release hairpin-shaped intermediates of typically 60–70 nucleotides, the pre-miRNAs [3]. These are exported into the cytoplasm by Exportin-5 with its Ran-GTP cofactor [–6] and then cleaved by the enzyme Dicer to yield miRNA/miRNA* duplexes [–11].
Fig 2. Illustration of the 6 structure statuses of paired nucleic acid residues.
Note that the nucleotide near the 5' end is distinguished from the one near the 3' end, so that, e.g., A-U and U-A are different statuses: (a) the base pair A-U or U-A has 2 hydrogen bonds; (b) the base pair G-C or C-G has 3 hydrogen bonds; and (c) the wobble base pair G-U or U-G has 2 weaker hydrogen bonds. See the main text for further explanation.
Fig 3. A flowchart of the process of generating the feature vector of an RNA sequence from its structure status composition.
Given an RNA sequence R (cf. Equation 2), its secondary structure sequence was derived with the Vienna RNA software package, as formulated in Equation 4. Following the conventions of that package, each nucleotide has one of two statuses: unpaired or paired. An unpaired nucleotide is denoted by a dot “.” and a paired one by a bracket “(” or “)”: the left bracket “(” marks the partner nearer the 5'-end and the right bracket “)” the partner nearer the 3'-end. Since the RNA sequence thus obtained contains 10 different structure elements (cf. Equation 5), its n-tuple element composition contains $10^n$ components (cf. Equation 6). For simplicity, only the case n = 2 is shown here, i.e., the 2-tuple element composition with $10^2 = 100$ components formed by all pairs of adjacent secondary structure status elements.
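The 2-tuple composition described in this caption can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the caption does not spell out the 10 structure elements, so the code assumes 4 unpaired statuses (one per base, e.g. "A.") plus the 6 paired statuses of Fig 2, labeling both partners of a base pair by that pair written with the 5'-side base first (e.g. "G-C"). The function names `structure_statuses` and `two_tuple_composition` are hypothetical, and the dot-bracket string is typed in by hand rather than predicted with the Vienna RNA package.

```python
from itertools import product

# Hypothetical 10-symbol alphabet (assumption, not taken from the paper):
# 4 unpaired statuses and 6 paired statuses named after the base pair,
# written with the 5'-side base first (cf. Fig 2).
UNPAIRED = ["A.", "C.", "G.", "U."]
PAIRED = ["A-U", "U-A", "G-C", "C-G", "G-U", "U-G"]
ALPHABET = UNPAIRED + PAIRED


def structure_statuses(seq, dot_bracket):
    """Label every nucleotide with one of the 10 structure statuses."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    seq = seq.upper().replace("T", "U")
    if set(seq) - set("ACGU"):
        raise ValueError("sequence may only contain A, C, G, U (or T)")
    statuses = [None] * len(seq)
    stack = []  # positions of "(" still waiting for their ")"
    for i, ch in enumerate(dot_bracket):
        if ch == ".":
            statuses[i] = seq[i] + "."
        elif ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()  # j is the 5'-side partner of i
            label = f"{seq[j]}-{seq[i]}"
            if label not in PAIRED:
                raise ValueError(f"non-canonical pair {label} at {j}, {i}")
            statuses[j] = statuses[i] = label
        else:
            raise ValueError(f"unexpected structure symbol {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return statuses


def two_tuple_composition(seq, dot_bracket):
    """Normalized frequencies of the 10**2 = 100 adjacent status pairs."""
    s = structure_statuses(seq, dot_bracket)
    if len(s) < 2:
        raise ValueError("need at least two nucleotides")
    counts = dict.fromkeys(product(ALPHABET, repeat=2), 0)
    for a, b in zip(s, s[1:]):
        counts[(a, b)] += 1
    return {pair: c / (len(s) - 1) for pair, c in counts.items()}
```

For example, the toy hairpin `GGGAAACCC` with structure `(((...)))` has 8 adjacent pairs, 4 of which are ("G-C", "G-C"), so that component equals 0.5; the other 100 components are filled in the same way.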
Fig 4. A schematic illustration of the correlation of structure statuses along an RNA sequence.
(a) The first-tier correlation reflects the structure-order mode between all adjacent nucleotides (positions i and i + 1). (b) The second-tier correlation reflects the structure-order mode between all nucleotides two positions apart (i and i + 2). (c) The third-tier correlation reflects the structure-order mode between all nucleotides three positions apart (i and i + 3). In this way, the global or long-range sequence-order information of an RNA is approximately and indirectly incorporated into the prediction model, as done by the PseAAC approach for proteins [30].
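The tier correlations and their combination with the 100 local components can be sketched as below, in the style of Chou's pseudo amino acid composition. This is a hedged sketch, not the paper's formulation: it assumes a hypothetical numeric property per structure status (the hydrogen-bond counts of Fig 2, with 0 for unpaired nucleotides), a squared-difference correlation function, and illustrative defaults for the number of tiers `lam` (λ) and the weight `w`. Statuses use a hypothetical 10-label scheme ("A." for an unpaired A, "G-C" for either partner of a G-C pair).

```python
from itertools import product

# Hypothetical property per structure status (assumption): hydrogen bonds
# formed by the base pair, 0 for unpaired nucleotides (cf. Fig 2).
HBONDS = {
    "A.": 0, "C.": 0, "G.": 0, "U.": 0,
    "A-U": 2, "U-A": 2, "G-C": 3, "C-G": 3, "G-U": 2, "U-G": 2,
}
ALPHABET = list(HBONDS)  # the 10 structure status labels


def tier_correlations(statuses, lam):
    """theta_j, j = 1..lam: mean squared property difference at distance j."""
    n = len(statuses)
    if not 0 < lam < n:
        raise ValueError("lam must satisfy 0 < lam < sequence length")
    thetas = []
    for j in range(1, lam + 1):
        total = sum(
            (HBONDS[statuses[i]] - HBONDS[statuses[i + j]]) ** 2
            for i in range(n - j)
        )
        thetas.append(total / (n - j))
    return thetas


def pseudo_status_composition(statuses, lam=3, w=0.5):
    """100 adjacent-pair frequencies followed by lam weighted tier correlations."""
    pairs = list(product(ALPHABET, repeat=2))
    counts = dict.fromkeys(pairs, 0)
    for a, b in zip(statuses, statuses[1:]):
        counts[(a, b)] += 1
    freqs = [counts[p] / (len(statuses) - 1) for p in pairs]
    thetas = tier_correlations(statuses, lam)
    # Shared denominator, as in PseAAC, so all 100 + lam components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The resulting vector has 100 + λ components; a larger `w` gives the long-range (tier) terms more weight relative to the local composition.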
Fig 5. A graphical illustration of the performance of different methods by means of receiver operating characteristic (ROC) curves.
The areas under the ROC curves (AUCs) are 0.93, 0.96, 0.90, and 0.94 for iMcRNA-PseSSC, iMcRNA-ExPseSSC, Triplet-SVM, and MiPred, respectively. See the section “Comparison with Other Methods” for further explanation.
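AUC values like those in this caption can be computed without drawing the curve: the AUC equals the probability that a randomly chosen positive (here, a real pre-miRNA) receives a higher predictor score than a randomly chosen negative, with ties counted as half. The sketch below uses this rank-based formulation on made-up scores; it is not the paper's evaluation code, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative).

    labels use 1 for positives and 0 for negatives; ties count as half,
    which matches the trapezoidal area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, scores `[0.9, 0.8, 0.4, 0.3]` with labels `[1, 0, 1, 0]` give 3 correctly ordered positive/negative pairs out of 4, i.e. an AUC of 0.75. The pairwise loop is quadratic and meant for illustration; a sort-based version scales better on large benchmarks.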
Fig 6. Visualizing discriminative power with a heat map.
(a) The discriminative power of the 100 local structure status compositions. The structure statuses on the vertical and horizontal axes are the first and the second structure status, respectively, of each local structure status composition. (b) The discriminative power of the 13 features that incorporate the structure-order effect. The λ values are marked on the horizontal axis.
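The caption does not state how discriminative power was measured. As an illustration only, the sketch below swaps in the F-score of Chen and Lin, a common per-feature statistic comparing between-class separation with within-class spread, and arranges the scores of 100 pairwise features into a 10 × 10 grid laid out as in panel (a), rows by first status and columns by second. The function names and the feature ordering are hypothetical assumptions.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """F-score of one feature (Chen and Lin); larger means better separation.

    Each class needs at least two values (sample variance).
    """
    overall = mean(list(pos_values) + list(neg_values))
    mp, mn = mean(pos_values), mean(neg_values)
    between = (mp - overall) ** 2 + (mn - overall) ** 2
    within = variance(pos_values) + variance(neg_values)  # sample variances
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def heat_map_grid(pos_vectors, neg_vectors, size=10):
    """F-scores of size*size pairwise features as a size x size grid.

    Feature k is assumed to pair the (k // size)-th status (row) with the
    (k % size)-th status (column), the ordering assumed for panel (a).
    """
    scores = [
        f_score([v[k] for v in pos_vectors], [v[k] for v in neg_vectors])
        for k in range(size * size)
    ]
    return [scores[r * size:(r + 1) * size] for r in range(size)]
```

The grid can then be passed to any plotting library's heat-map function; a feature that is constant across both classes scores 0.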
Fig 7. A partial screenshot of the top page of the iMcRNA web server.
Its address is http://bioinformatics.hitsz.edu.cn/iMcRNA/.
Fig 8. A partial screenshot of the output of the web server.
See the text for further explanation.

References

    1. Lee Y, Kim M, Han J, Yeom K-H, Lee S, et al. (2004) MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23: 4051–4060.
    2. Cai X, Hagedorn CH, Cullen BR (2004) Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10: 1957–1966.
    3. Lee Y, Ahn C, Han J, Choi H, Kim J, et al. (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415–419.
    4. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2004) Nuclear export of microRNA precursors. Science 303: 95–98.
    5. Yi R, Qin Y, Macara IG, Cullen BR (2003) Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17: 3011–3016.
