Genome Res. 2007 Jun;17(6):940-6.
doi: 10.1101/gr.5602807.

Detection of DNA structural motifs in functional genomic elements

Jason A Greenbaum et al. Genome Res. 2007 Jun.

Abstract

The completion of the human genome project has fueled the search for regulatory elements by a variety of different approaches. Many successful analyses have focused on examining primary DNA sequence and/or chromatin structure. However, it has been difficult to detect common sequence motifs within the feature of chromatin structure most closely associated with regulatory elements, DNase I hypersensitive sites (DHSs). Considering just the nucleotide sequence and/or the chromatin structure of regulatory elements may neglect a critical feature of what is recognized by the regulatory machinery: DNA structure. We introduce a new computational method to detect common DNA structural motifs in a large collection of DHSs that are found in the ENCODE regions of the human genome. We show that DHSs have common DNA structural motifs that show no apparent sequence consensus. One such structural motif is much more highly enriched in experimentally identified DHSs that are in CpG islands and near transcription start sites (TSSs), compared to DHSs not in CpG islands and farther from TSSs, suggesting that DNA structural motifs may participate in the formation of functional regulatory elements. We propose that studies of the conservation of DNA structure, independent of sequence conservation, will provide new information about the link between the nucleotide sequence of a DNA molecule and its experimentally demonstrated function.

Figures

Figure 1.
Histogram of alignment scores of shuffled versus DHS sequences. The CORCSScrU program was run 3204 times until convergence, using either real or shuffled sequences from the MPSS DHS data set. The window size was preset to 12. The resulting alignment scores were binned and are represented here as two histograms. The alignment scores for the real sequences are generally higher than those from the shuffled sequences. A Kolmogorov-Smirnov test indicates that these two distributions are significantly different (p = 10^-17).
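The comparison of real and shuffled alignment scores described in this caption can be sketched as a two-sample Kolmogorov-Smirnov test. The following is a minimal illustration in pure Python, not the authors' code: the function names `ks_statistic` and `ks_asymptotic_p` and the toy scores are hypothetical, and the p-value uses the standard large-sample approximation, which is only rough for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The gap can only change at an observed value, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_asymptotic_p(d, n, m, terms=100):
    """Large-sample approximation of the two-sided p-value (an assumption here)."""
    if d <= 0:
        return 1.0
    lam = d * math.sqrt(n * m / (n + m))
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Toy alignment scores, invented purely for illustration.
real_scores = [0.62, 0.71, 0.75, 0.80]
shuffled_scores = [0.40, 0.48, 0.55, 0.66]
d = ks_statistic(real_scores, shuffled_scores)
p = ks_asymptotic_p(d, len(real_scores), len(shuffled_scores))
```

With thousands of scores per group, as in the figure, a library routine for the two-sample test would be the more practical choice.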
Figure 2.
Analysis of representative high-scoring CORCS. (A–C) Heat maps of CORCS found in (A) the MPSS data set and (B) the Union data set using the discrete sampler, and in (C) the Union data set using the continuous sampler. The X-axis represents each position in the CORCS and the Y-axis represents cleavage value bins. Dark blue cells in the heat map indicate no cleavage values for bin Y at position X are present, whereas red cells indicate a large proportion of the cleavage values for that column. (D–F) Mean predicted hydroxyl radical cleavage patterns of CORCS found in (D) the MPSS data set and (E) the Union data set using the discrete sampler, and in (F) the Union data set using the continuous sampler. (G–I) Sequence logos of CORCS found in (G) the MPSS data set and in (H) the Union data set using the discrete sampler, and in (I) the Union data set using the continuous sampler.
Figure 3.
Conservation of nucleotide sequence versus structure in CORCS2. Plotted here is the normalized information present at each nucleotide position in CORCS2 for the hydroxyl radical cleavage pattern alignment (dark gray) and the nucleotide sequence alignment (light gray).
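One common way to compute per-position information for an alignment is to subtract the Shannon entropy of each column from its maximum and divide by that maximum. The sketch below assumes this definition; the caption does not state the exact normalization used, and the function names and toy alignments are hypothetical. Cleavage values are assumed to have been discretized into bins beforehand, so both alignments can be treated as columns of symbols.

```python
import math
from collections import Counter


def normalized_information(column, alphabet_size):
    """Return (max entropy - observed entropy) / max entropy for one column."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    counts = Counter(column)
    total = len(column)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    max_entropy = math.log2(alphabet_size)
    return (max_entropy - entropy) / max_entropy


def information_profile(alignment, alphabet_size):
    """Apply normalized_information to every column of an ungapped alignment."""
    return [
        normalized_information(column, alphabet_size)
        for column in zip(*alignment)
    ]


# Toy nucleotide alignment (alphabet of 4 symbols), invented for illustration.
sequence_alignment = ["ACGA", "ACTC", "AGCG", "AGAT"]
sequence_profile = information_profile(sequence_alignment, 4)

# Toy cleavage alignment: each value is a bin index out of 4 hypothetical bins.
cleavage_alignment = [(0, 3, 1, 2), (0, 3, 1, 2), (0, 3, 2, 2), (0, 3, 1, 1)]
cleavage_profile = information_profile(cleavage_alignment, 4)
```

A value of 1 means every aligned site shares the same symbol at that position, and 0 means all symbols are equally frequent, so the two profiles can be compared position by position on the same scale.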
Figure 4.
Location of CORCS1 sites relative to experimental annotations. (A) UCSC Genome Browser shot of CORCS1 in ENCODE region ENm002. Data types are indicated by labels above each track. For the NHGRI DHSs, the top, middle, and bottom tracks correspond to GM06990 (DNase-Chip method), CD4+ T cells (DNase-chip method), and CD4+ T cells (MPSS method), respectively. The latter data set was the training set for discovering CORCS1. For the UW/Regulome DHSs, the upper and lower tracks contain data from the GM06990 cell line and the SKNSH cell line, respectively. (Below) A segment of the browser shot is expanded to highlight a few examples. The oval on the right indicates a CORCS1 site that aligns with a DHS that was discovered by three different methods in three different cell lines. The oval on the left indicates a CORCS1 site that aligns with a DHS that is not in the training set. (B) Clustering of CORCS1 near experimental annotations. The distance (in base pairs) of each of the 588 CORCS1 sites to the nearest experimental annotation was measured. The three histograms show that CORCS1 clusters near annotated DHSs, TSSs, and CpG islands.
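The distance measurement described in panel B can be sketched as a nearest-neighbor lookup on sorted coordinates. This is a minimal illustration that treats each site and annotation as a single base-pair position on one chromosome; real annotations are intervals, and the function name and coordinates below are hypothetical.

```python
from bisect import bisect_left


def distance_to_nearest(site, sorted_annotations):
    """Distance in base pairs from site to the closest annotation position."""
    if not sorted_annotations:
        raise ValueError("at least one annotation is required")
    i = bisect_left(sorted_annotations, site)
    candidates = []
    if i < len(sorted_annotations):
        # Closest annotation at or to the right of the site.
        candidates.append(sorted_annotations[i] - site)
    if i > 0:
        # Closest annotation to the left of the site.
        candidates.append(site - sorted_annotations[i - 1])
    return min(candidates)


# Toy coordinates, invented for illustration.
annotation_positions = sorted([900, 100, 500])
motif_sites = [480, 100, 50, 1000]
distances = [distance_to_nearest(s, annotation_positions) for s in motif_sites]
```

Binning the resulting distances gives a histogram of the kind shown in the figure, with one histogram per annotation type.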
Figure 5.
Enrichment of CORCS1 in DHSs, CpG islands, and TSSs. Numbers above or below ovals represent fold enrichment; the corresponding Z-score is appended in parentheses.
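The caption does not say how fold enrichment and Z-scores were computed. One common approach, assumed here purely for illustration, compares the observed number of motif sites overlapping an annotation with the counts obtained from a set of randomized placements. The function name `enrichment` and all numbers below are hypothetical.

```python
import statistics


def enrichment(observed_count, null_counts):
    """Return (fold enrichment, Z-score) of observed_count against null_counts."""
    if len(null_counts) < 2:
        raise ValueError("need at least two null counts")
    null_mean = statistics.mean(null_counts)
    null_sd = statistics.stdev(null_counts)  # sample standard deviation
    if null_mean == 0 or null_sd == 0:
        raise ValueError("null counts must have non-zero mean and spread")
    fold = observed_count / null_mean
    z_score = (observed_count - null_mean) / null_sd
    return fold, z_score


# Toy counts, invented for illustration: overlaps seen with the real motif
# sites versus overlaps seen in three randomized placements.
fold, z_score = enrichment(30, [8, 10, 12])
```

In practice the null distribution would come from many randomizations rather than three, and the choice of randomization scheme strongly affects both numbers.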
