BMC Bioinformatics. 2007 Dec 18;8:478.
doi: 10.1186/1471-2105-8-478.

Ab initio identification of human microRNAs based on structure motifs

Markus Brameier et al. BMC Bioinformatics. 2007.

Abstract

Background: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology.

Results: MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that relies only on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique called linear genetic programming to develop special classifier programs that include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences, as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of, and increase confidence in, the predicted structures. Among other things, we propose a new criterion for selecting miRNA candidates with a higher stability of folding, based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity, with sensitivity remaining at an acceptably high level, when all classifiers are required to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate when searching larger genome regions for unknown miRNAs. A total of 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19. All candidate structures match the free energy distribution of miRNA precursors, which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns.
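The scanning scheme described above can be illustrated with a minimal Python sketch. It is not the MiRPred implementation: the evolved linear genetic programs are replaced here by two hand-written stand-in classifiers, the regular expressions are hypothetical motifs chosen for illustration, and structure prediction (e.g., by RNAfold) is assumed to have happened elsewhere, so the classifiers receive ready-made dot-bracket strings.

```python
import re
from typing import Callable, Iterator, List, Tuple


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for fixed-length windows shifted in regular steps."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


# Hypothetical motifs over the dot-bracket alphabet; NOT taken from the paper.
STEM_OPEN = re.compile(r"\({4,}")           # run of at least 4 opening pairs
STEM_CLOSE = re.compile(r"\){4,}")          # run of at least 4 closing pairs
HAIRPIN_LOOP = re.compile(r"\(\.{3,10}\)")  # 3-10 unpaired bases closed by one pair


def has_long_stem(structure: str) -> bool:
    """Stand-in classifier: both strands of a stem are present."""
    return bool(STEM_OPEN.search(structure) and STEM_CLOSE.search(structure))


def has_single_hairpin(structure: str) -> bool:
    """Stand-in classifier: exactly one terminal loop."""
    return len(HAIRPIN_LOOP.findall(structure)) == 1


CLASSIFIERS: List[Callable[[str], bool]] = [has_long_stem, has_single_hairpin]


def unanimous(structure: str,
              classifiers: List[Callable[[str], bool]] = CLASSIFIERS) -> bool:
    """Positive only if every classifier in the ensemble votes positive."""
    return all(clf(structure) for clf in classifiers)
```

Requiring a unanimous vote trades sensitivity for specificity, which matches the stated preference for a low false positive rate when scanning large genome regions.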

Conclusion: Our motif-finding method is at least competitive with state-of-the-art feature-based methods for ab initio miRNA discovery. At the same time, it requires less prior knowledge about miRNA precursor structures, while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge.


Figures

Figure 1
Typical hairpin structure and corresponding secondary structure sequence of a miRNA precursor as predicted by RNAfold [22]. Base pairings are represented by complementary parentheses and non-pairing bases by dots. Human miRNA mir-24-1 is shown.
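As a small illustration of the dot-bracket notation in this caption, the following Python sketch recovers the base pairs encoded by a secondary structure sequence. The function name `base_pairs` is an assumption made for this example, and the input string in the usage note is a toy structure, not the mir-24-1 fold.

```python
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs for matching parentheses in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)                 # opening base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the most recent open base
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the toy hairpin `"((..))"` this returns `[(0, 5), (1, 4)]`: the outermost bases pair with each other, and the two dots form the loop.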
Figure 2
Example of shifted sequence window and corresponding secondary structure sequences. Some substructures are more stable than others.
Figure 3
Performance of the ensemble classifier including 16 individual classifiers for different voting thresholds, i.e., minimum numbers of required positive decisions. The maximum threshold (16/16) achieves a specificity of above 99.9 percent on an independent test set of randomly selected sequences while still maintaining a sensitivity of above 82 percent on all human miRNAs (also used for training). Majority voting (8/16) shows more balanced values, both higher than found for the individual classifiers (1/1).
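The effect of the voting threshold can be sketched in Python as follows. The function names and the toy data in the usage note are assumptions for illustration; the sketch only shows how sensitivity and specificity follow from a minimum number of required positive votes, not how the reported values were obtained.

```python
from typing import List, Sequence, Tuple


def ensemble_predict(votes: Sequence[bool], threshold: int) -> bool:
    """Positive if at least `threshold` individual classifiers vote positive."""
    return sum(votes) >= threshold


def sensitivity_specificity(vote_rows: List[Sequence[bool]],
                            labels: Sequence[bool],
                            threshold: int) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) of the ensemble at one voting threshold."""
    tp = fn = tn = fp = 0
    for votes, is_mirna in zip(vote_rows, labels):
        predicted = ensemble_predict(votes, threshold)
        if is_mirna and predicted:
            tp += 1
        elif is_mirna:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

With four toy classifiers, vote rows `[1,1,1,1]`, `[1,1,0,0]`, `[1,0,0,0]`, `[1,1,1,0]` and labels `True, True, False, False`, the unanimous threshold (4/4) gives sensitivity 0.5 and specificity 1.0, while majority voting (2/4) gives 1.0 and 0.5. Raising the threshold moves errors from false positives to false negatives.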
Figure 4
Free energy distributions of sequence windows (central 70 nt folding). Frequency distribution (in percent) of 173 predicted miRNAs on chromosome 19 matches the distribution of all 474 known human miRNAs (miRBase 9.0). The normal distribution of all tested 88,808 structures is significantly shifted towards higher free energies with mean around -15 kcal/mol, compared to about -30 for miRNAs. Structures with lower free energy, especially below -30, are more likely miRNAs. Energies are rounded to integers.
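A frequency distribution of this kind can be computed along the following lines. This is a sketch under assumptions: the free energies are taken as already computed (in kcal/mol), the function name is hypothetical, and Python's built-in `round` is used, which rounds exact halves to the nearest even integer and may differ from the rounding used for the figure.

```python
from collections import Counter
from typing import Dict, Iterable


def energy_histogram_percent(energies: Iterable[float]) -> Dict[int, float]:
    """Frequency distribution (in percent) of free energies rounded to integers."""
    values = list(energies)
    if not values:
        return {}
    # Note: round() sends exact .5 values to the nearest even integer.
    counts = Counter(round(e) for e in values)
    total = len(values)
    return {energy: 100.0 * n / total for energy, n in sorted(counts.items())}
```

For the toy input `[-30.2, -29.8, -15.4, -31.1]` the result is `{-31: 25.0, -30: 50.0, -15: 25.0}`.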
Figure 5
(left) Frequency distribution (in percent) over the number of directly successive sequence windows whose corresponding structure is a full match, i.e., is predicted positive by all 16 classifiers. A higher number of matches for known miRNAs indicates higher stability of folding. (right) Distribution that also counts partial matches, i.e., windows for which the proportion of positive predictions is < 1, within a range of three window shifts before and after a full match. Matches are averaged and rounded to integers.
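The full-match count in the left panel can be sketched as a run-length computation over the scanning windows. The function below is a hypothetical illustration that covers only full matches; the partial-match weighting of the right panel is not reproduced.

```python
from typing import Sequence


def longest_full_match_run(positive_counts: Sequence[int],
                           n_classifiers: int = 16) -> int:
    """Length of the longest run of directly successive windows that all
    classifiers predict positive.

    positive_counts[k] is the number of positive votes for the k-th window
    in scanning order.
    """
    best = current = 0
    for count in positive_counts:
        if count == n_classifiers:
            current += 1               # run of full matches continues
            best = max(best, current)
        else:
            current = 0                # any non-unanimous window breaks the run
    return best
```

For example, vote counts `[3, 16, 16, 16, 12, 16, 0]` over seven successive windows give a longest run of 3.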
Figure 6
Free energy plotted against the number of successive matches by the scanning window. Means over energy bins (highlighted in blue) reveal a weak but pronounced correlation between lower free energy and a higher number of matches. Preceding and succeeding partial matches are included.
Figure 7
Frequency distribution (in percent) of expression-sequence matches over the window positions. Windows of 100 nt are centered on all 474 known human miRNAs and all 117 unknown predicted miRNAs. Only regular expressions that match mostly positive structures are used. The loop region (approximately central) and the flanking regions are matched less often than the stem sequences.

References

    1. Bartel D. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/S0092-8674(04)00045-5.
    2. Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. doi: 10.1038/nature02871.
    3. He L, Hannon G. MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5:522–531. doi: 10.1038/nrg1379.
    4. Weber M. New human and mouse microRNA genes found by homology search. FEBS J. 2005;272:59–73. doi: 10.1111/j.1432-1033.2004.04389.x.
    5. Legendre M, Lambert A, Gautheret D. Profile-based detection of microRNA precursors in animal genomes. Bioinformatics. 2005;21:841–845. doi: 10.1093/bioinformatics/bti073.