Comparative Study
Nucleic Acids Res. 2004 Sep 24;32(16):4925-36.
doi: 10.1093/nar/gkh839. Print 2004.

A comparative method for finding and folding RNA secondary structures within protein-coding regions

Jakob Skou Pedersen et al. Nucleic Acids Res.

Abstract

Existing computational methods for RNA secondary-structure prediction tacitly assume that an RNA sequence encodes only functional RNA structures. However, experimental studies have revealed that some RNA sequences, e.g. compact viral genomes, can simultaneously encode functional RNA structures and proteins, and evidence is accumulating that this phenomenon may also occur in eukaryotes. Here we present the first comparative method, called RNA-DECODER, which explicitly takes the known protein-coding context of an RNA-sequence alignment into account in order to predict evolutionarily conserved secondary-structure elements, which may span both coding and non-coding regions. RNA-DECODER employs a stochastic context-free grammar together with a set of carefully devised phylogenetic substitution models, which can disentangle and evaluate the different kinds of overlapping evolutionary constraints that arise. We show that RNA-DECODER's parameters can be automatically trained to successfully fold known secondary structures within the HCV genome. We scan the genomes of HCV and polio virus for conserved secondary-structure elements, and analyze performance as a function of available evolutionary information. On known secondary structures, RNA-DECODER shows a sensitivity similar to that of the programs MFOLD, PFOLD and RNAALIFOLD. When scanning the entire genomes of HCV and polio virus for structure elements, RNA-DECODER's results indicate a markedly higher specificity than those of MFOLD, PFOLD and RNAALIFOLD.
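The abstract compares the programs by sensitivity and specificity but does not define either measure. As a minimal sketch, the Python below assumes a convention common in RNA structure prediction: sensitivity is the fraction of annotated base pairs that are predicted, and specificity is the fraction of predicted base pairs that are annotated. The function names and the example pairs are hypothetical, and the paper itself may define the measures differently.

```python
def normalize_pairs(pairs):
    """Return base pairs as a set of (i, j) tuples with i < j."""
    return {tuple(sorted(pair)) for pair in pairs}


def sensitivity_specificity(predicted, annotated):
    """Compare predicted base pairs against annotated ones.

    Assumed definitions (not taken from the paper):
      sensitivity = correctly predicted pairs / annotated pairs
      specificity = correctly predicted pairs / predicted pairs
    """
    predicted_set = normalize_pairs(predicted)
    annotated_set = normalize_pairs(annotated)
    true_positives = len(predicted_set & annotated_set)
    sensitivity = true_positives / len(annotated_set) if annotated_set else 0.0
    specificity = true_positives / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical example: two of three annotated pairs are recovered,
# and two of four predicted pairs are correct.
example_predicted = [(1, 10), (2, 9), (3, 8), (20, 30)]
example_annotated = [(1, 10), (2, 9), (4, 7)]
example_sens, example_spec = sensitivity_specificity(example_predicted, example_annotated)
```

Under these definitions, a genome-wide scan that reports many pairs outside annotated structures lowers specificity without affecting sensitivity, which is the contrast the abstract draws.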


Figures

Figure 1
States and transitions of the high-level sub-grammar. The different state types (see abbreviations in parentheses) are explained in the text and are indicated by the different shapes. States of type bifurcate have a bifurcating transition leading both to a left (l) and a right (r) state. Any derivation tree of the grammar has to start in the begin state. The start states of the non-structural and structural sub-grammars simultaneously act as terminals for this high-level grammar and are depicted as double-edged octagons.
Figure 2
States and transitions of the non-structural (left) and the structural (right) sub-grammar. States which read terminals are depicted as squares. See Figure 1 for the high-level sub-grammar and more information.
Figure 3
Pairing predictions along the HCV reference sequence excluding the 3′ UTR. RNA-Decoder, Pfold and RNAalifold were used on both the HCV 1a set and the HCV 1a & 1b set, and the pairing probabilities for the alignments were then projected onto the reference sequence. Mfold was directly used on the reference sequence. Please refer to the text for more information on how the scan of the HCV genome was performed. The long contiguous protein-coding region starts at position 1 and ends at position 9032 (i.e. the stop codon is at positions 9033–9035). The five known secondary structures on which RNA-Decoder was trained lie between positions 8678 and 9018. The RNA structures annotated in the coding region and the 5′UTR are from Refs. (7) and (44), respectively. A recent computational survey of RNA structures in Flaviviridae (47) predicts some new coding elements. Several of these overlap the predictions made by RNA-Decoder on the 1a & 1b set.
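The caption says that pairing probabilities computed on an alignment were projected onto the reference sequence, but it does not say how. One plausible reading, sketched below in Python, is to keep the per-column probability for every column in which the reference sequence has a base and to drop columns in which it has a gap. The function name, the gap characters and the example values are assumptions for illustration, not the authors' procedure.

```python
def project_onto_reference(aligned_reference, column_probabilities, gap_chars="-."):
    """Map per-column pairing probabilities onto ungapped reference positions.

    aligned_reference: the reference sequence as it appears in the alignment,
        including gap characters.
    column_probabilities: one pairing probability per alignment column.
    Returns a list with one probability per reference base, in order.
    """
    if len(aligned_reference) != len(column_probabilities):
        raise ValueError("need exactly one probability per alignment column")
    return [
        probability
        for base, probability in zip(aligned_reference, column_probabilities)
        if base not in gap_chars  # skip columns where the reference is gapped
    ]


# Hypothetical example: the third column is a gap in the reference,
# so its probability is dropped.
example_projection = project_onto_reference("AC-GU", [0.1, 0.9, 0.5, 0.8, 0.2])
```

After projection, list index k corresponds to the k-th base of the reference sequence (counting from zero), so results from alignments of different widths can be plotted on the same axis.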
Figure 4
Pairing predictions along the polio reference sequence. RNA-Decoder, Pfold and RNAalifold were used on the polio alignment and the pairing probabilities for the alignment were then projected onto the reference sequence. Mfold was directly used on the reference sequence. Please refer to the text for more information on how the scan of the polio genome was performed. The protein-coding region starts at position 1 and ends at position 6639 (i.e. the stop codon is at positions 6640–6642). We have only been able to recover precise annotations of a single experimentally verified RNA structure from the literature (46). However, several elements have been inferred by homology to other viruses and analysis of compensatory mutations in both the 5′UTR and the 3′UTR [(48) and references therein].
Figure 5
Pairing probability along structure 4 for different numbers of sequences in the input alignment. Structure 4 consists of a hairpin with a single bulge whose annotated base-pairing positions are indicated by small black boxes along the x-axis. The left panel shows the pairing probability using the reference sequence as the only input (TTL = 0, light gray), using the reference sequence and two of its closest neighbor sequences as input (TTL = 0.19, medium gray) and using all eight sequences of the HCV 1a set (TTL = 0.59, black). For comparison, the right panel shows the pairing probability for structure 4 using all HCV 1a & 1b sequences (TTL = 9.84). The open boxes (pos. 14 and 25) indicate alignment columns for which a fraction of the sequences (indicated by the height of the box) do not form consensus base pairs (see caption of Table 1).
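The open boxes in this figure report, for a pair of alignment columns, the fraction of sequences that do not form the consensus base pair. The exact criterion is given in the caption of Table 1, which is not reproduced here, so the Python sketch below makes an assumption: a sequence counts as pairing if its two bases form a Watson-Crick pair or a G-U wobble pair. The function name and the toy alignment are hypothetical.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRING_BASES = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def fraction_not_pairing(alignment, column_i, column_j):
    """Fraction of sequences whose bases at two columns cannot pair.

    alignment: list of equal-length aligned sequences (RNA or DNA alphabet).
    column_i, column_j: zero-based alignment columns of a consensus base pair.
    Gaps and ambiguous symbols are counted as not pairing.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    non_pairing = 0
    for sequence in alignment:
        # Treat DNA-style T as U so both alphabets are handled.
        left = sequence[column_i].upper().replace("T", "U")
        right = sequence[column_j].upper().replace("T", "U")
        if (left, right) not in PAIRING_BASES:
            non_pairing += 1
    return non_pairing / len(alignment)


# Hypothetical toy alignment: the third sequence has A and C at the
# paired columns, which cannot pair under the assumed rule.
toy_alignment = ["GAAAC", "GAAAU", "AAAAC", "CAAAG"]
toy_fraction = fraction_not_pairing(toy_alignment, 0, 4)
```

A value of 0 means every sequence supports the consensus pair at those columns; larger values correspond to taller open boxes in the figure.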

References

    1. Storz,G. (2002) An expanding universe of noncoding RNAs. Science, 296, 1260–1263.
    2. Eddy,S.R. (2001) Non-coding RNA genes and the modern RNA world. Nature Rev. Genet., 2, 919–929.
    3. Diwa,A., Bricker,A.L., Jain,C. and Belasco,J.G. (2000) An evolutionarily conserved RNA stem–loop functions as a sensor that directs feedback regulation of RNase E gene expression. Genes Dev., 14, 1249–1260.
    4. Xiang,W., Paul,A.V. and Wimmer,E. (1997) RNA signals in entero- and rhinovirus genome replication. Semin. Virol., 8, 256–273.
    5. Huthoff,H. and Berkhout,B. (2002) Multiple secondary structure rearrangements during HIV-1 RNA dimerization. Biochemistry, 41, 10439–10445.
