Sci Rep. 2022 Aug 25;12(1):14515.
doi: 10.1038/s41598-022-18699-3.

Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome

Warren B Rouse et al. Sci Rep. 2022.

Abstract

RNA plays vital functional roles in almost every area of biology, and these roles are often influenced by its folding into secondary and tertiary structures. An important role of RNA secondary structure is in maintaining proper gene regulation; therefore, accurate prediction of the structures involved in these processes is important. In this study, we have expanded on our previous work that led to the creation of the RNAStructuromeDB. Unlike that previous study, which analyzed the human genome at low resolution, we have now scanned the protein-coding human transcriptome at high (single-nt) resolution. This provides more robust structure predictions for over 100,000 isoforms of known protein-coding genes. Notably, we also use the motif identification tool ScanFold to model structures with a high propensity for ordered/evolved stability. All data have been uploaded to the RNAStructuromeDB, allowing for easy searching of transcripts, visualization of data tracks (via the Integrative Genomics Viewer, or IGV), and download of ScanFold data, including unique highly ordered motifs. Herein, we provide an example analysis of MAT2A to demonstrate the utility of ScanFold in finding known and novel secondary structures, highlighting regions of potential functionality, and guiding the generation of functional hypotheses from the data.
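As an illustration of the kind of metric that a windowed scan reports, the following is a minimal sketch of a thermodynamic z-score and of single-nucleotide-step windowing. It is not the ScanFold implementation. It assumes the commonly used definition of the z-score, in which the minimum free energy (MFE) of the native window is compared with the MFEs of shuffled versions of the same window, and it takes those MFE values as plain numbers rather than calling a folding program. The function names, the window size, and the choice of population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def thermodynamic_z_score(native_mfe, shuffled_mfes):
    """Return (native MFE - mean shuffled MFE) / std of shuffled MFEs.

    A negative value means the native sequence folds into a more stable
    structure than its shuffled counterparts do on average.
    Population standard deviation is an assumption of this sketch.
    """
    mu = mean(shuffled_mfes)
    sigma = pstdev(shuffled_mfes)
    if sigma == 0:
        raise ValueError("shuffled MFEs have zero variance; z-score undefined")
    return (native_mfe - mu) / sigma


def sliding_windows(sequence, window, step=1):
    """Yield (start, subsequence) pairs; step=1 gives single-nt resolution.

    The window size is a hypothetical parameter here, not a value
    taken from the abstract.
    """
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


if __name__ == "__main__":
    # Hypothetical MFE values in kcal/mol, for illustration only.
    z = thermodynamic_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0])
    print(f"z-score: {z:.3f}")
    for start, sub in sliding_windows("ACGUAC", window=4):
        print(start, sub)
```

In a real scan, each window's MFE and the MFEs of its shuffled versions would come from an RNA folding program; here they are supplied by hand so that the arithmetic is easy to follow.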

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Violin plots of various average ScanFold metrics across the transcriptome. (a) The average windowed z-score is shifted to a slightly negative overall value of −0.43, with outliers at −3.59 and +1.65. (b) The average windowed MFE (ΔG) is centered around −31 kcal/mol, with outliers at −66.68 kcal/mol and −3.68 kcal/mol. (c) The number of motifs per transcript with z-score ≤ −2 averaged 3.92 and ranged from 0 to 181.
Figure 2
Box and whisker plot of the regional average per nucleotide z-score analysis across the transcriptome. The plot shows an overall decrease from the 5′UTR, to the CDS, to the 3′UTR with values of −0.71, −0.82, and −0.92, respectively.
Figure 3
Example of the MAT2A transcript (ENST00000306434.8) data populated in the updated RNAStructuromeDB IGV window. From top to bottom the tracks have been organized into the annotation or sequence, significant bps or arc diagram, extracted structures with z-scores ≤ −2, ensemble diversity (ED), z-score, and MFE or ΔG. Additional in vivo DMS and SHAPE biochemical probing data (displayed as a heat map), microRNA sites, and RNA binding protein sites were generated and added to the window after ScanFold data acquisition. All track colors except significant bps were changed from their default color of gray to green for ED, blue for positive z-score, red for negative z-score, and red for MFE. The Rfam stem loop A-F structures of the 3′UTR have been annotated by boxed regions for ease of viewing.
Figure 4
ScanFold-predicted structural models of the MAT2A 3′UTR. All novel structures are annotated as M# (Motif #) and all known structures are annotated as in the Rfam database (Stem Loop A–E). Each nucleotide of these structures has been annotated with the per-nucleotide z-score from the ScanFold final partners file, with red indicating the lowest z-scores (typically ≤ −2), yellow indicating z-scores ≤ −1, blue indicating a z-score of 0, and blends of these colors indicating intermediate z-scores. All base pairs with statistically significant covariation have been annotated with green bars, and the top 20% of in vivo DMS and SHAPE reactivities have been annotated by squares and stars, respectively.
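The regional averages described for Figure 2 and the z-score coloring described for Figure 4 can be sketched as follows. This is a simplified illustration, not the authors' code: the function names, the half-open 0-based region coordinates, and the input values are hypothetical, and the coloring uses three discrete bins, whereas the figure blends colors for intermediate z-scores.

```python
from statistics import mean


def regional_means(z_scores, regions):
    """Average per-nucleotide z-scores within each named region.

    `regions` maps a region name to a (start, end) pair, 0-based and
    half-open. Empty regions are skipped. The coordinate convention
    is an assumption of this sketch.
    """
    return {
        name: mean(z_scores[start:end])
        for name, (start, end) in regions.items()
        if end > start
    }


def colour_class(z):
    """Bin a per-nucleotide z-score into a discrete color label.

    Thresholds follow the Figure 4 caption (red for <= -2, yellow for
    <= -1); everything above -1 is labeled blue as a simplification.
    """
    if z <= -2:
        return "red"
    if z <= -1:
        return "yellow"
    return "blue"


if __name__ == "__main__":
    # Hypothetical per-nucleotide z-scores for a six-nucleotide toy transcript.
    z = [-0.5, -1.5, -2.5, -3.0, 0.0, -1.0]
    regions = {"5'UTR": (0, 2), "CDS": (2, 4), "3'UTR": (4, 6)}
    print(regional_means(z, regions))
    print([colour_class(v) for v in z])
```

Keeping the regions as explicit coordinate pairs makes it straightforward to substitute real UTR and CDS boundaries from a transcript annotation.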
