Nucleic Acids Res. 2022 Mar 21;50(5):2452-2463.
doi: 10.1093/nar/gkac067.

Does rapid sequence divergence preclude RNA structure conservation in vertebrates?

Stefan E Seemann et al. Nucleic Acids Res. 2022.

Abstract

Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA's structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences, defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.

Figures

Figure 1.
Description of local neutral model and selection ratio’s correlation to covariates. (A) The local neutral model is defined by neutrally evolved ancestral repeats (AR; blue boxes) that are local to a feature (e.g. conserved RNA structure CRS; green box). Local (dark blue boxes) are the first 1000 positions of concatenated ARs around a feature. The pairwise sequence distance (d) is calculated between human and mouse for both the local neutral model of a CRS (dLN(CRS)) and the CRS itself (dF(CRS)). The type of selection of features is estimated by the selection ratio (SR). (B) Distribution of the sequence distance along human chromosome 1 in 100 kb windows for both CRSs (dF(CRS)) and their corresponding local null (dLN(CRS)) illustrates the linkage of the mutation rates on large scales. Gray vertical line indicates the centromere position. (C) Scatterplot of sequence distance of CRSs (dF(CRS)) and sequence distance of corresponding local null (dLN(CRS)). Points above the blue line are CRSs under negative selection and points below the red line are CRSs with rapidly evolving sequence based on our threshold definition (SR < 0.5 and SR > 2 respectively). (D) The distribution of selection ratio for CRSs (SR(CRS)) and individual ARs (SR(AR)). For SR(AR) distribution, we show one of the 10 independent samplings of ARs from the FDR(SR) calculation. As expected SR(AR) is distributed around one (note the color of SR(AR) is the same as for neutral selection in panel A). Dashed vertical lines mark our thresholds for structures under negative selection (SR < 0.5) and structure with rapidly evolving sequence (SR > 2). (E–G) Correlation of covariates of conserved structures to SR(CRS) is shown as 2d density estimation and linear regression (only SR lower than 2). (E, F) are measured from the 17 species structure alignments. The Spearman’s correlation coefficients are (E) ρ = −0.76, (F) ρ = 0.13 and (G) ρ = −0.08.
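The thresholds in this caption can be illustrated with a minimal Python sketch. It assumes the selection ratio is the quotient of the feature distance over the local neutral distance, SR = dF / dLN, which is consistent with the abstract's "diverged at least twice as fast" wording but is not spelled out in the caption. The function names `selection_ratio` and `classify_selection` are hypothetical.

```python
def selection_ratio(d_feature: float, d_local_neutral: float) -> float:
    """Assumed definition: SR = dF / dLN (human-mouse sequence distances)."""
    if d_local_neutral <= 0:
        raise ValueError("local neutral distance must be positive")
    return d_feature / d_local_neutral


def classify_selection(sr: float, low: float = 0.5, high: float = 2.0) -> str:
    """Apply the caption's thresholds: SR < 0.5 and SR > 2."""
    if sr < low:
        return "negative selection"
    if sr > high:
        return "rapidly evolving sequence"
    return "other"


# A feature three times as diverged as its local neutral background.
label = classify_selection(selection_ratio(0.75, 0.25))
```

Both thresholds are strict inequalities here, following the caption, so a structure with SR exactly 2 falls into "other".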
Figure 2.
False discovery rate of the selection ratio, i.e. FDR(SR), estimation of de novo structures. Structures and sampled ancestral repeats (null model of neutral selection) were divided into ranges of two covariates: ‘de-gapped’ human-rhesus macaque-mouse alignment length [bp] and human G+C content. All pairwise combinations of length and G+C content ranges were applied for FDR(SR) estimation. For viewing the impact of the covariates on FDR(SR) they are separately viewed. (A) Ranges of alignment length [bp] (0–100],(100–150],(150–200],(200–300],(300–500]. (B) Ranges of human G+C content [0–0.25], (0.25–0.30], (0.30–0.35], (0.35–0.40], (0.40–0.45], (0.45–0.50], (0.50–0.55], (0.55–0.60], (0.60–1.00]. A generalized additive model (GAM) with restricted maximum likelihood (REML) parameter estimation is fitted to the data in each covariate range. As our focus is on CRSs with rapidly evolving sequence, only CRSs with SR > 2 are shown as points (621 CRSs with SR > 4.25 are not shown). FDR(SR) was estimated inside different ranges of SR: 41 half-open intervals of width 0.1 from (0.0–0.1] to (3.9–4.0]. One of the 10 independent samplings of ARs is shown. Supplementary Figure S11 shows the combined plot of both covariates.
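The stratification by right-closed ranges described in this caption can be sketched in Python as follows. The bin edges are taken from the caption; the function `naive_fdr` is a hypothetical, deliberately crude stand-in (ratio of the null fraction to the observed fraction within one SR interval) and is not the GAM-based estimate the caption describes.

```python
from bisect import bisect_left

# Edges of the right-closed ranges (a, b] quoted in the caption.
LENGTH_EDGES = [0, 100, 150, 200, 300, 500]
SR_EDGES = [round(0.1 * i, 1) for i in range(41)]  # 0.0, 0.1, ..., 4.0


def right_closed_bin(value, edges):
    """Return i with edges[i] < value <= edges[i + 1], or None if outside."""
    if value <= edges[0] or value > edges[-1]:
        return None
    return bisect_left(edges, value) - 1


def naive_fdr(null_srs, observed_srs, sr_bin, edges=SR_EDGES):
    """Assumed crude estimate: null fraction / observed fraction in one SR bin."""
    null_frac = sum(right_closed_bin(s, edges) == sr_bin for s in null_srs) / len(null_srs)
    obs_frac = sum(right_closed_bin(s, edges) == sr_bin for s in observed_srs) / len(observed_srs)
    if obs_frac == 0:
        return None  # no observed structures in this bin
    return min(1.0, null_frac / obs_frac)


# Toy values only, not data from the paper.
null_srs = [1.0, 1.05, 0.95, 2.05]
observed_srs = [2.05, 2.07, 0.3, 1.0]
fdr = naive_fdr(null_srs, observed_srs, right_closed_bin(2.05, SR_EDGES))
```

In practice the same binning would be applied jointly to alignment length and G+C content, so that structures are only compared with ancestral repeats from the same covariate stratum.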
Figure 3.
Examples of de novo structures with rapidly evolving sequence. Conservation patterns indicated in RNA secondary structures are based on 100 species structure based alignments after removing alignment columns with formula image of gaps and sequences with formula image of gaps (drawing by R2R (37)). (A) M1716264 overlaps the long ncRNA lnc-CLEC18B-44 (hg38/chr16:73609195-73609593) and has the following properties: SR=2.9, FDR(SR)=0.15, GC(human) = 0.36, SI(17 species)=60.2%, Length(17 species) = 542 bp, SCI(17 species) = 0.13. (B) M0770120 overlaps the 3’-UTR of mRNA TIPARP (hg38/chr3:156705568–156705927) and has the following properties: SR= 2.9, FDR(SR)= 0.15, GC(human) = 0.36, SI(17 species) = 65.4%, Length(17 species) = 387 bp, SCI(17 species) = 0.12. (C) M0367414 is intronic of the long ncRNA LINC00871 (hg38/chr14:45954247–45954488) and has the following properties: SR= 3.3, FDR(SR)= 0.09, GC(human) = 0.23, SI(17 species) = 51.9%, Length(17 species) = 271 bp, SCI(17 species) = 0.16. (D) M2048567 overlaps the processed pseudogene AC108673.1 (hg38/chr3:129046313–129046527) and has the following properties: SR= 3.6, FDR(SR)= 0.20, GC(human) = 0.71, SI(17 species) = 64.0%, Length(17 species) = 257 bp, SCI(17 species) = 0.30. The fitted RNA motif HL_35442.1 (36) contains a conserved trans oriented Sugar-Edge Watson–Crick basepair with both isosteric basepairs G–A and A–A occurring in the alignment, and was only found in 2% of randomly selected structures (Supplementary Methods S3).
Figure 4.
Signals of structure conservation in de novo and known secondary structures. (A) Structure conservation index (SCI) calculates the consistency between the structures of the individual sequences and the consensus structure in terms of minimum free energy (MFE). (B) Fraction of covarying basepairs in the annotated consensus structure. (C) Alignment power is the fraction of basepairs expected to show a significant covariation signal as calculated by R-scape. (D) Fraction of basepairs that show a significant covariation signal in the two-set statistical test (one test for annotated basepairs (bp), another for all other pairs) by R-scape (E < 0.05). We distinguish de novo structures with rapidly evolving sequence (rapid CRS: SR > 2 and FDR(SR)≤0.2), under negative selection (neg CRS: SR < 0.5 and FDR(SR)≤0.2), and other (other CRS). For comparison, Rfam (version 14.0) seed alignments (Rfam), their subset of vertebrate sequences (Rfam vert), and CMfinder predicted structure-based alignments of the human sequences in Rfam seed alignments and their homologous sequences extracted from the human (hg38) centered 100-way vertebrate MULTIZ alignments (Rfam CMf) were analyzed. The SCI in (A) has also been calculated for human (hg18) centered 17-way vertebrate UCSC Genome Browser alignments (MULTIZ) overlapping the human sequence of CRSs, and the human (hg38) centered 100-way MULTIZ overlapping the human sequences in Rfam seed alignments, illustrating the improved structure conservation signal in the structure-based alignments of CRSs. In (A) and (B) all 2791 Rfam seed alignments and 831 vertebrate alignments are shown, whereas in (C) and (D) R-scape analyzed only 1966 seed alignments and 712 vertebrate alignments as for the others (including mir-657) the covariation in the alignment is too small (mostly due to too few sequences). The Rfam families IRES Hsp70 (RF00495), IFNγ (RF00259) and mir-657 (RF00988) with rapidly evolving sequence are indicated. If not then their values are zero, e.g. R-scape estimates expected and observed significantly covarying basepairs to be zero in the Rfam and Rfam vert alignments for all three families. mir-657 has formula image significant covarying bps (5 out of 9 bp) and, hence, is out of y-axis limits in (D). The median values are marked as horizontal lines. All three Rfam families with rapidly evolving sequence have exclusively vertebrate sequences in their seed alignments, hence Rfam and Rfam vert values are the same for them: IRES Hsp70 – 12 sequences from primates and 2 from cattle (see Supplementary Figure S8), IFNγ – 4 from primates and 1 from cattle, and mir-657 – 2 from primates.
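As a small illustration of panel (A), the SCI is commonly computed as the consensus MFE of the alignment divided by the mean MFE of the individual sequences. The Python sketch below assumes that standard definition and takes precomputed energies as input; it does not call any folding software, and the function name `structure_conservation_index` is hypothetical.

```python
def structure_conservation_index(consensus_mfe: float, single_mfes: list[float]) -> float:
    """Assumed standard definition: SCI = E_consensus / mean(E_single).

    Energies are minimum free energies in kcal/mol (negative for stable folds).
    """
    if not single_mfes:
        raise ValueError("need at least one single-sequence MFE")
    mean_single = sum(single_mfes) / len(single_mfes)
    if mean_single == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI undefined")
    return consensus_mfe / mean_single


# Toy energies only, not values from the paper.
sci = structure_conservation_index(-10.0, [-20.0, -30.0, -10.0])
```

Under this definition, values near 0 indicate that the sequences share no common fold, while values near 1 indicate that the consensus structure is about as stable as the individual folds.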

References

    1. Washietl S., Hofacker I., Stadler P. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:2454–2459.
    2. Pedersen J., Bejerano G., Siepel A., Rosenbloom K., Lindblad-Toh K., Lander E., Kent J., Miller W., Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol. 2006; 2:e33.
    3. Yao Z., Weinberg Z., Ruzzo W. CMfinder–a covariance model based RNA motif finding algorithm. Bioinformatics. 2006; 22:445–452.
    4. Washietl S., Hofacker I., Lukasser M., Huttenhofer A., Stadler P. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 2005; 23:1383–1390.
    5. Torarinsson E., Yao Z., Wiklund E., Bramsen J., Hansen C., Kjems J., Tommerup N., Ruzzo W., Gorodkin J. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res. 2008; 18:242–251.
