Detection of DNA structural motifs in functional genomic elements

Jason A Greenbaum¹, Stephen C J Parker, Thomas D Tullius

PMID: 17568009
PMCID: PMC1891352
DOI: 10.1101/gr.5602807

Jason A Greenbaum et al. Genome Res. 2007 Jun;17(6):940-6. doi: 10.1101/gr.5602807.

Affiliation

¹ Program in Bioinformatics, Boston University, Boston, Massachusetts 02215, USA.

Abstract

The completion of the human genome project has fueled the search for regulatory elements by a variety of different approaches. Many successful analyses have focused on examining primary DNA sequence and/or chromatin structure. However, it has been difficult to detect common sequence motifs within the feature of chromatin structure most closely associated with regulatory elements, DNase I hypersensitive sites (DHSs). Considering just the nucleotide sequence and/or the chromatin structure of regulatory elements may neglect a critical feature of what is recognized by the regulatory machinery: DNA structure. We introduce a new computational method to detect common DNA structural motifs in a large collection of DHSs that are found in the ENCODE regions of the human genome. We show that DHSs have common DNA structural motifs that show no apparent sequence consensus. One such structural motif is much more highly enriched in experimentally identified DHSs that are in CpG islands and near transcription start sites (TSSs), compared to DHSs not in CpG islands and farther from TSSs, suggesting that DNA structural motifs may participate in the formation of functional regulatory elements. We propose that studies of the conservation of DNA structure, independent of sequence conservation, will provide new information about the link between the nucleotide sequence of a DNA molecule and its experimentally demonstrated function.

Figures

**Figure 1.**
Histogram of alignment scores of shuffled versus DHS sequences. The CORCSScrU program was run 3204 times until convergence, using either real or shuffled sequences from the MPSS DHS data set. The window size was preset to 12. The resulting alignment scores were binned and are represented here as two histograms. The alignment scores for the real sequences are generally higher than those from the shuffled sequences. A Kolmogorov-Smirnov test indicates that these two distributions are significantly different (p = 10⁻¹⁷).
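
The comparison in this caption (scores from real versus shuffled sequences, followed by a Kolmogorov-Smirnov test) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and shuffling by simple letter permutation is an assumption, since the caption does not say how the shuffled sequences were generated. Only the test statistic is computed here; a library routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
import random
from bisect import bisect_right


def shuffle_sequence(seq, rng):
    """Return seq with its letters randomly permuted (base composition preserved).

    Assumption: a plain permutation; the source does not state the shuffling scheme.
    """
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at one of the data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    print(shuffle_sequence("ACGTACGTTTGA", rng))
    # Made-up alignment scores, for illustration only.
    real_scores = [5.1, 5.4, 6.0, 6.2]
    shuffled_scores = [4.0, 4.3, 4.9, 5.2]
    print(ks_statistic(real_scores, shuffled_scores))
```

A statistic near 0 means the two score distributions overlap almost completely; a value near 1 means they are almost fully separated.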

**Figure 2.**
Analysis of representative high-scoring CORCS. (*A–C*) Heat maps of CORCS found in (A) the MPSS data set and (B) the Union data set using the discrete sampler, and in (C) the Union data set using the continuous sampler. The X-axis represents each position in the CORCS and the Y-axis represents cleavage value bins. Dark blue cells in the heat map indicate no cleavage values for bin Y at position X are present, whereas red cells indicate a large proportion of the cleavage values for that column. (*D–F*) Mean predicted hydroxyl radical cleavage patterns of CORCS found in (D) the MPSS data set and (E) the Union data set using the discrete sampler, and in (F) the Union data set using the continuous sampler. (*G–I*) Sequence logos of CORCS found in (G) the MPSS data set and in (H) the Union data set using the discrete sampler, and in (I) the Union data set using the continuous sampler.

**Figure 3.**
Conservation of nucleotide sequence versus structure in CORCS2. Plotted here is the normalized information present at each nucleotide position in CORCS2 for the hydroxyl radical cleavage pattern alignment (dark gray) and the nucleotide sequence alignment (light gray).
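
One common way to quantify per-position conservation in an alignment is Shannon information content, sketched below. This is an assumption about what "normalized information" means here: the caption does not give the formula, and the function names are hypothetical. The same calculation can be applied to nucleotides (alphabet size 4) or to cleavage values that have been assigned to discrete bins (alphabet size equal to the number of bins). No small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """Information (bits) of one alignment column: log2(alphabet_size) minus entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def normalized_information(rows, alphabet_size):
    """Per-position information divided by its maximum, so each value lies in [0, 1]."""
    max_bits = math.log2(alphabet_size)
    # zip(*rows) iterates over alignment columns; rows must have equal length.
    return [column_information(col, alphabet_size) / max_bits for col in zip(*rows)]


if __name__ == "__main__":
    # Toy nucleotide alignment: column 0 is fully conserved, column 1 is uniform.
    print(normalized_information(["AC", "AG", "AT", "AA"], 4))
    # Toy binned cleavage values (bin indices 0..3) for the same four sites.
    print(normalized_information([(2, 1), (2, 1), (2, 0), (2, 1)], 4))
```

Dividing by the maximum puts sequence and structure on the same 0 to 1 scale even when the two alphabets differ in size.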

**Figure 4.**
Location of CORCS1 sites relative to experimental annotations. (A) UCSC Genome Browser shot of CORCS1 in ENCODE region ENm002. Data types are indicated by labels *above* each track. For the NHGRI DHSs, the *top*, *middle*, and *bottom* tracks correspond to GM06990 (DNase-Chip method), CD4+ T cells (DNase-chip method), and CD4+ T cells (MPSS method), respectively. The latter data set was the training set for discovering CORCS1. For the UW/Regulome DHSs, the *upper* and *lower* tracks contain data from the GM06990 cell line and the SKNSH cell line, respectively. (*Below*) A segment of the browser shot is expanded to highlight a few examples. The oval on the *right* indicates a CORCS1 site that aligns with a DHS that was discovered by three different methods in three different cell lines. The oval on the *left* indicates a CORCS1 site that aligns with a DHS that is not in the training set. (B) Clustering of CORCS1 near experimental annotations. The distance (in base pairs) of each of the 588 CORCS1 sites to the nearest experimental annotation was measured. The three histograms show that CORCS1 clusters near annotated DHSs, TSSs, and CpG islands.
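
The distance measurement described for panel B can be sketched as a nearest-neighbor lookup in a sorted coordinate list. This is a simplified illustration, not the authors' procedure: it assumes each annotation is reduced to a single coordinate on one chromosome, and the function name and coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_annotations):
    """Distance in bp from position to the closest coordinate in a sorted list.

    Raises ValueError if sorted_annotations is empty.
    """
    i = bisect_left(sorted_annotations, position)
    # Only the neighbors just left and just right of the insertion point can be closest.
    candidates = sorted_annotations[max(i - 1, 0):i + 1]
    return min(abs(position - a) for a in candidates)


if __name__ == "__main__":
    tss_coordinates = sorted([900, 100, 500])  # made-up annotation coordinates
    motif_sites = [50, 450, 500, 950]          # made-up motif site coordinates
    print([nearest_distance(p, tss_coordinates) for p in motif_sites])
```

Binning the resulting distances gives histograms of the kind described in the caption.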

**Figure 5.**
Enrichment of CORCS1 in DHSs, CpG islands, and TSSs. Numbers *above* or *below* ovals represent fold enrichment; the corresponding Z-score is appended in parentheses.
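
Fold enrichment and a Z-score are often computed by comparing an observed overlap count with counts obtained from randomized placements. The sketch below shows that calculation under stated assumptions: the caption does not describe how the background was constructed, so the null counts here are made up and the function name is hypothetical.

```python
import statistics


def fold_enrichment_and_z(observed, null_counts):
    """Compare an observed overlap count with counts from a randomized background.

    Returns (observed / null mean, (observed - null mean) / null sample std dev).
    Assumes the null mean and standard deviation are both nonzero.
    """
    mean = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)  # sample standard deviation
    return observed / mean, (observed - mean) / sd


if __name__ == "__main__":
    # Made-up numbers: 30 observed overlaps versus three randomized runs.
    fold, z = fold_enrichment_and_z(30, [8, 10, 12])
    print(fold, z)
```

In practice many more randomized runs would be needed for a stable estimate of the null standard deviation.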

Grants and funding

R01 HG003541/HG/NHGRI NIH HHS/United States
