PLoS Comput Biol. 2006 Apr;2(4):e33.
doi: 10.1371/journal.pcbi.0020033. Epub 2006 Apr 21.

Identification and classification of conserved RNA secondary structures in the human genome

Jakob Skou Pedersen et al. PLoS Comput Biol. 2006 Apr.

Abstract

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3'UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Outline of EvoFold Prediction Method
(A) Schematic representation of human genome and conserved elements. The conserved elements define the input alignments. (B) Segment of eight-way genomic alignment. (C) The SCFG of the fRNA model defines a distribution over all possible secondary-structure annotations. One of the many possible secondary structures is shown in parenthesis format. Substitutions in pairing regions of the alignment are color-coded relative to human: compensatory double substitutions are green, and compatible single substitutions are blue. (D) Color-coded fold corresponding to the secondary-structure annotation of the alignment. (E) Two phylogenetic models are used to evaluate the possible secondary-structure annotations: unpaired columns are evaluated using a single-nucleotide phylogenetic model. Paired columns are combined and evaluated using a di-nucleotide phylogenetic model. Horizontal branch lengths reflect the expected number of substitutions.
Figure 2. Breakdown of Types of RNA Folds Detected in the Human Genome Based on True Positive Estimates
See Materials and Methods, Validation section. Folds are classified according to (A) size (number of pairing bases), (B) location in the genome, and (C) shape. The relative abundance of each class of folds is indicated. For (B), also shown is the genomic span of the conserved segments relative to their genomic location, for comparison.
Figure 3. Coding Hairpin near Selenocysteine Insertion Site
(A) Gene structure, EvoFold predictions, and conservation around the selenocysteine insertion site of selenoprotein T (SELT). The pairing regions of the hairpin are shown in dark green and can be seen to start only eight bases downstream of the UGA insertion site (indicated by *). Arrows indicate direction of transcription. (B) Annotated segment of eight-way alignment spanning the predicted hairpin. SS anno, secondary-structure annotation in parenthesis format (matching parentheses indicate pairs and periods indicate unpaired regions); pair symbol, pairing columns are assigned identical symbols to facilitate navigation; Score, position-specific scores (0–9), which indicate confidence in secondary-structure annotation. Substitutions in predicted pairs are color-coded relative to the human sequence: green is a compensatory double substitution, blue is a compatible single substitution, and red is a noncompatible substitution. (C) Depiction of hairpin, which is shown with T instead of U to facilitate comparison with the genomic sequences. Pairs are color-coded by presence of substitutions in the eight-way alignment (see B).
Figure 4. Candidate Substrate for A-to-I Editing
(A) Gene structure, EvoFold predictions, cDNAs, conservation, and eight-way alignment are shown at the start of the second exon of the UBE1C gene. The predicted hairpin is shown in parenthesis format and can be seen to overlap the intron–exon boundary. The red box highlights a position where the genomic sequence contains an A and a cDNA contains a G. The orange bar and label “4” indicate that up to four extra bases are present in this loop location in the indicated species. (B) Depiction of hairpin (see Figure 3B for color legend) with indication of the potential site of ADAR editing (A-to-I), (C) which would lead to a lysine-to-arginine amino acid change.
Figure 5. 5′UTR miRNA-Like Hairpin and Coding Hairpin in Gene (DGCR8) Involved in miRNA Processing
(A) Gene structure and EvoFold predictions are shown around the first exon of DGCR8. (B) Annotated segment of the eight-way alignment spanning the long, miRNA-like 5′UTR-hairpin (see Figure 3B for legend). (C) Depiction of folds.
Figure 6. Clover-Shaped Fold Predictions
(A) Gene structure, EvoFold predictions, and cDNAs around the end of the gene ZNF207. The 3′UTR and the intron of an alternative splice variant harbor high-scoring clover-shaped fold predictions. (B) Annotated segment of eight-way alignment spanning the 3′UTR fold (see Figure 3B for legend). (C) Depictions of 3′UTR fold (left) and intronic fold (right). (D) Annotated alignment of human primary sequences of 3′UTR and intronic folds. The alignment is annotated with the secondary structures of the folds and substitution differences in corresponding pairs are color-coded (see Figure 3B for color legend).
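
The parenthesis format and the substitution color-coding described in the Figure 3 legend can be illustrated with a minimal Python sketch. It is not part of EvoFold. The function names (parse_pairs, classify_pair, annotate) are hypothetical, the toy sequences are made up, and treating G-U wobble pairs as able to pair is an assumption of the sketch rather than something the legends state. The three labels correspond to the legend's green (compensatory double), blue (compatible single), and red (noncompatible) categories.

```python
# Base pairs treated as able to pair. Including G-U wobble pairs is an
# assumption of this sketch, not something stated in the figure legends.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) column pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at column {i}")
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def classify_pair(reference, other):
    """Classify a two-base pair in another species relative to the reference.

    T is read as U so genomic sequence can be passed directly. Anything that
    cannot pair (including gap characters) is labeled noncompatible, which is
    a simplification made for this sketch.
    """
    ref = tuple(reference.upper().replace("T", "U"))
    obs = tuple(other.upper().replace("T", "U"))
    if obs == ref:
        return "unchanged"
    if obs not in PAIRING:
        return "noncompatible"
    n_changed = sum(r != o for r, o in zip(ref, obs))
    return "compensatory double" if n_changed == 2 else "compatible single"


def annotate(structure, reference_seq, other_seq):
    """Map each predicted pair (i, j) to its substitution class."""
    return {
        (i, j): classify_pair(reference_seq[i] + reference_seq[j],
                              other_seq[i] + other_seq[j])
        for i, j in parse_pairs(structure)
    }


# Toy example: a three-pair hairpin with a two-base loop.
example = annotate("(((..)))", "GGCAAGCC", "GACAAGUC")
```

In the toy example the middle pair changes from G-C to A-U, so both positions differ while pairing is preserved, which is the pattern the legends call a compensatory double substitution.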
