BMC Genomics. 2017 Mar 27;18(1):259.
doi: 10.1186/s12864-017-3645-2.

Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach

Manjula Algama et al. BMC Genomics. 2017.

Abstract

Background: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences.

Results: We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focussed elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors.

Conclusions: This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.

Keywords: Bayesian modelling; Conserved non-coding sequences; Genome segmentation; Putative functional elements; ncRNA.

Figures

Fig. 1
a Most conserved segment classes of the lrba gene. Two BED files uploaded to the UCSC genome browser correspond to Class 0 (conservation - 71%) and Class 9 (conservation - 75%) segments of zebrafish chromosome 1. The segments in each of Class 0 and Class 9 overlap annotated exons (wide bars) of lrba (ENSDARG00000031108). b An intronic region more conserved than exons. The annotated exon (wide bars) of dachc (ENSDARG00000003142) coincides with the segment in Class 0. The 261 nt long segment at the right end belongs to Class 9 and hence is more conserved than the marked exon.
Fig. 2
Number of intronic PFEs identified in each zebrafish chromosome. In total, 655 intronic PFEs were identified across 25 zebrafish chromosomes. The highest number of PFEs (98) was detected in zebrafish chromosome 17. The gene with the most PFEs was foxp2 (ENSDARG00000005453, chromosome 4) with 34, followed by npas3 (ENSDARG00000079182, chromosome 17) with 28.
Fig. 3
Venn diagram showing the number of genome-wide intronic PFEs supported by other methods. 94% of the PFEs found in the genome-wide analysis overlapped with the functional elements (predicted or experimentally validated) identified in 4 other databases, EvoFold, fRNAdb, RNAz and DNase I footprints. Most of the PFEs overlapped with entries in EvoFold and there were 47 matches with experimentally identified ncRNA transcripts in fRNAdb
Fig. 4
WIG profile of eya1. The top three profiles show, for each sequence position in the human eya1 DNA sequence (UCSC genomic coordinates chr8: 72,127,000 - 72,130,000), the probability that the base at that position belongs to Class 0 (50% conservation), Class 1 (65% conservation) and Class 2 (45% conservation) respectively. At any position, the sum of the three profiles is 1. The two rows below the Class 2 profile display the exons (wide bars) and the introns (thin lines) of eya1 recorded in the UCSC and RefSeq collections respectively. Exon boundaries are indicated with red vertical lines. Class 1 corresponds mainly to the mapped exons of eya1, and covers regions of high conservation between human, mouse and zebrafish.
Fig. 5
WIG profile of eya1 PFE 4. This PFE is located within intron 2 of human eya1 (UCSC genomic coordinates chr8: 72,267,549 - 72,267,850). The third bar from the top contains single letter amino acid codes corresponding to the actual protein translation phase. At the bottom, the light grey bar indicates a DNase-seq peak track and the green bar shows an EvoFold prediction within the PFE, which also suggests that this region is functional.
Fig. 6
Venn diagram showing the number of pathway-focussed PFEs supported by other methods. 88% of the PFEs found in the pathway-focussed analysis overlapped with the functional elements (predicted or experimentally validated) identified in 4 other databases, EvoFold, fRNAdb, RNAz and DNase I footprints. Most of the PFEs overlapped with entries in either EvoFold or DNase I footprints and there were 3 matches with experimentally identified ncRNA transcripts in fRNAdb
Fig. 7
RT-PCR of intronic putative functional elements (PFEs) showing their presence or absence in 24 hpf zebrafish cDNA pools. Each gene has between 1 and 7 PFEs. The Exon lane contains an exonic region, spanning an intron, of the gene of interest. The Intron lane represents a randomly selected intronic region that was not identified as a PFE. Primers were designed to amplify products ranging in size from 57 to 274 bp. The ladder bands shown are 100, 200 and 300 bp; on gels where only two ladder bands are visible, they are the 100 and 200 bp bands. The panel insert is a cDNA control. The β-actin (exonic, spanning an intron) and RNA (RNA used as a template) lanes demonstrate that there is no genomic contamination. The No template lane rules out contamination of other PCR reagents.
Fig. 8
RT-PCR of muscle-expressed genes not containing PFEs. Exonic sequence amplification is evident for 15 of the genes but only 1 (wnt7aa) shows amplification of an intronic sequence. Primers were designed to amplify products ranging in size from 100 to 638 bp. Lane 1 of each gel contains a 100 bp ladder. The negative lanes are no-template controls to rule out genomic DNA contamination.
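
The class profiles described in the Fig. 4 caption give, at each sequence position, one probability per segment class, and these probabilities sum to 1. The Python sketch below illustrates that data layout on toy numbers. It is not the authors' segmentation method: the function names, the toy values and the 0.5 probability threshold are assumptions made purely for illustration.

```python
from typing import List, Tuple


def check_profiles(profiles: List[List[float]], tol: float = 1e-6) -> bool:
    """Return True if the class probabilities sum to 1 at every position."""
    return all(abs(sum(column) - 1.0) <= tol for column in zip(*profiles))


def class_runs(
    profiles: List[List[float]], target: int, min_prob: float = 0.5
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of positions where class `target`
    has probability >= min_prob (the threshold is an illustrative assumption)."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(profiles[target]):
        if p >= min_prob:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous position
            start = None
    if start is not None:
        runs.append((start, len(profiles[target])))  # run reaches the last position
    return runs


# Toy profiles for six positions (made-up numbers, not data from the paper).
toy = [
    [0.7, 0.6, 0.1, 0.1, 0.2, 0.8],  # Class 0
    [0.2, 0.3, 0.8, 0.9, 0.7, 0.1],  # Class 1
    [0.1, 0.1, 0.1, 0.0, 0.1, 0.1],  # Class 2
]

print(check_profiles(toy))
print(class_runs(toy, target=1))
```

On real data the per-class values would be read from the WIG tracks rather than typed in, and the runs could be written out as BED intervals for display in a genome browser.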
