Genome Res. 2010 Mar;20(3):381-92.
doi: 10.1101/gr.098657.109. Epub 2010 Jan 14.

Genome-wide discovery of human heart enhancers

Leelavati Narlikar et al. Genome Res. 2010 Mar.

Abstract

The various organogenic programs deployed during embryonic development rely on the precise expression of a multitude of genes in time and space. Identifying the cis-regulatory elements responsible for this tightly orchestrated regulation of gene expression is an essential step in understanding the genetic pathways involved in development. We describe a strategy to systematically identify tissue-specific cis-regulatory elements that share combinations of sequence motifs. Using heart development as an experimental framework, we employed a combination of Gibbs sampling and linear regression to build a classifier that identifies heart enhancers based on the presence and/or absence of various sequence features, including known and putative transcription factor (TF) binding specificities. In distinguishing heart enhancers from a large pool of random noncoding sequences, the performance of our classifier is vastly superior to four commonly used methods, with an accuracy reaching 92% in cross-validation. Furthermore, most of the binding specificities learned by our method resemble the specificities of TFs widely recognized as key players in heart development and differentiation, such as SRF, MEF2, ETS1, SMAD, and GATA. Using our classifier as a predictor, a genome-wide scan identified over 40,000 novel human heart enhancers. Although the classifier used no gene expression information, these novel enhancers are strongly associated with genes expressed in the heart. Finally, in vivo tests of our predictions in mouse and zebrafish achieved a validation rate of 62%, significantly higher than what is expected by chance. These results support the existence of underlying cis-regulatory codes dictating tissue-specific transcription in mammalian genomes and validate our enhancer classifier strategy as a method to uncover these regulatory codes.
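The abstract describes a classifier that scores sequences by the presence or absence of sequence features, with weights fitted by linear regression and performance judged by cross-validated accuracy. The Python below is a minimal sketch of that idea under assumptions that are not taken from the paper: each sequence is already reduced to a 0/1 vector of motif hits (the Gibbs-sampling motif discovery step is not shown), labels are +1 for enhancers and -1 for controls, a small ridge penalty (the hypothetical `ridge` parameter) keeps the least-squares system solvable, and a sequence is called an enhancer when its score exceeds zero. The function names `solve`, `fit`, `score` and `cv_accuracy` are illustrative only.

```python
# Sketch only: a least-squares (linear regression) classifier over binary
# motif-presence features, evaluated by k-fold cross-validation.
# Assumptions (not from the paper): features are precomputed 0/1 motif hits,
# labels are +1 (enhancer) / -1 (control), and a small ridge term is added.


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs, ys, ridge=1e-3):
    """Ridge-regularised least squares; the last weight is the intercept."""
    rows = [list(x) + [1.0] for x in xs]  # constant column for the intercept
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # penalise feature weights, not the intercept
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def score(w, x):
    """Weighted sum of the features plus the intercept w[-1]."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def cv_accuracy(xs, ys, k=5):
    """k-fold cross-validated accuracy; score > 0 is called an enhancer.
    Folds are strided (every k-th sequence), so shuffle real data first."""
    n = len(xs)
    correct = 0
    for f in range(k):
        held = set(range(f, n, k))
        train = [i for i in range(n) if i not in held]
        w = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum((score(w, xs[i]) > 0) == (ys[i] > 0) for i in held)
    return correct / n


if __name__ == "__main__":
    # Synthetic toy data: motif 0 marks enhancers, motifs 1 and 2 are noise.
    patterns = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1],
                [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
    labels = [1, 1, 1, 1, -1, -1, -1, -1]
    xs, ys = patterns * 5, labels * 5
    print(cv_accuracy(xs, ys))  # 1.0
```

On this toy data motif 0 alone separates the classes, so the fitted weight for it is close to 2, the intercept close to -1, and every held-out sequence is classified correctly; real motif features overlap and would give a lower, more informative accuracy.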

Figures

Figure 1.
(A) Overview of the methodology. The yellow box shows the main classifier, which takes two sets of sequences as input: enhancers and controls. The classifier is used first to select a homogeneous set of enhancers and then again to distinguish the selected set from control sequences. (B) Distribution of the fraction of times each positive sequence was predicted correctly. Almost one-third of the sequences are consistently (>50% of the time) predicted as positives (red dotted line); sequences to the right of this line were considered homogeneous. (C) ROC curves for five methods on the selected homogeneous sets, comparing our method with four state-of-the-art methods. Our method achieves the largest area under the ROC curve (0.92; shaded in gray).
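Panel (B) keeps positives that were predicted correctly more than 50% of the time, and panel (C) compares methods by area under the ROC curve. A minimal Python sketch of both computations follows. It assumes that per-sequence counts of correct predictions over repeated classifier runs are already available (how those runs are generated is not specified in the caption), and it computes AUC with the standard rank formulation. The names `homogeneous` and `roc_auc` are hypothetical.

```python
from itertools import product


def homogeneous(correct_counts, runs, threshold=0.5):
    """Indices of positives predicted correctly in more than `threshold` of `runs`."""
    return [i for i, c in enumerate(correct_counts) if c / runs > threshold]


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count one half)."""
    wins = 0.0
    for p, q in product(pos_scores, neg_scores):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(homogeneous([9, 3, 6, 5], runs=10))  # [0, 2]
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 0.8333333333333334
```

The pairwise loop is quadratic in the number of sequences; for large sets a sort-based rank computation gives the same value.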
Figure 2.
(A) Feature weights learned by the classifier. (Green) Positive weights; (brown) negative weights. Motif features of the same TF are grouped together, and feature names are listed along the baseline of the graph. (*) Features previously known to be implicated in heart activity or heart development; (?) de novo motifs A–E. (B) The five de novo motifs with positive weights. STAMP (Mahony and Benos 2007) was used to match each de novo motif to binding specificities of TF families from TRANSFAC and JASPAR; the top match and its P-value are shown. The last column gives the fraction of sequences in the enhancer set and in the control set that contain a match to each de novo motif.
Figure 3.
(A) Distribution of heart scores of CNEs. Scores assigned by the classifier to all tested CNEs are shown; zero is used as the cutoff for putative enhancers (dotted line; see Methods). (Red) Scores of the training enhancer set. (B) Mean fraction of high-scoring CNEs in the loci of genes highly expressed in each tissue. Tissues are sorted by the mean fraction of putative heart enhancers in their loci. P-values were computed using a rank sum test; heart tissue had the most significant P-value (1.6 × 10⁻⁹). (C) (Black peaks) Snapshots of the genome-wide predictions near genes. The classifier score is transformed to lie between 0 and 1, with values >0.5 indicating a putative heart enhancer. The color and shade of each gene transcript depict the type and level of gene expression, respectively: (red) genes highly expressed in the heart; (green) repressed genes. Genes highly expressed in the heart typically have more predicted enhancers in their loci (top three genomic regions), whereas genes repressed or not expressed in the heart have fewer. (All elements in the training set are excluded from these panels.)
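Panel (A) uses a score cutoff of zero, panel (C) rescales scores to the interval 0 to 1 with 0.5 as the threshold, and panel (B) compares tissues with a rank sum test. The caption does not say which transform was used; the logistic function in the sketch below is an assumption, chosen because it maps a score of 0 to exactly 0.5 so the two cutoffs agree. The test is the two-sided Wilcoxon rank-sum test with a normal approximation and average ranks for ties, without a tie correction to the variance. Which samples were compared for each tissue is not stated here, so `rank_sum_test` (a hypothetical name, like `to_unit_interval`) simply takes two generic lists.

```python
import math


def to_unit_interval(score):
    """Map a raw classifier score into (0, 1) with a logistic curve (assumed).
    A score of 0 maps to 0.5; both branches avoid overflow in math.exp."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation, average ranks
    for ties, no tie correction to the variance). Returns (z, p_value)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


print(to_unit_interval(0.0))  # 0.5
print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

For small samples or heavy ties an exact test is preferable to the normal approximation used here.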
Figure 4.
Experimental validation of predicted heart enhancers. (A) Four predicted heart enhancers driving expression of the reporter genes GFP in transgenic zebrafish (first column) and lacZ in transgenic mouse embryos (second column). (Red arrows) Reporter gene expression in the heart. (Third column) Reporter expression in dissected hearts for each construct. Coordinates are given in hg18. (B) Examples of positive and negative results for predicted heart enhancers tested in zebrafish transgenics. (Top image) Transgenic zebrafish displaying GFP expression in the heart driven by a predicted heart enhancer (positive). (Bottom image) Another predicted heart enhancer that did not drive reporter gene expression in the heart (negative).
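The abstract reports a 62% in vivo validation rate, "significantly higher than what is expected by chance." One standard way to check a claim of this kind is a one-sided binomial test against a background rate of positive results. The sketch below assumes independent transgenic tests; the number of tested elements and the background rate are inputs, not values taken from this page, and `binomial_upper_tail` is a hypothetical helper name.

```python
import math


def binomial_upper_tail(successes, trials, p_background):
    """One-sided P-value: P(X >= successes) for X ~ Binomial(trials, p_background)."""
    return sum(
        math.comb(trials, k) * p_background**k * (1 - p_background) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts for illustration only (not the study's data).
print(binomial_upper_tail(successes=8, trials=13, p_background=0.2))
```

`math.comb` requires Python 3.8 or later; the exact sum is fine for the few dozen constructs typical of transgenic assays.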

References

1. Aitola M, Carlsson P, Mahlapuu M, Enerback S, Pelto-Huikko M. Forkhead transcription factor FoxF2 is expressed in mesodermal tissues involved in epithelio-mesenchymal interactions. Dev Dyn. 2000;218:136–149.
2. Alkema WB, Johansson O, Lagergren J, Wasserman WW. MSCAN: Identification of functional clusters of transcription factor binding sites. Nucleic Acids Res. 2004;32:W195–W198.
3. Andres V, Cervera M, Mahdavi V. Determination of the consensus binding site for MEF2 expressed in muscle and brain reveals tissue-specific sequence constraints. J Biol Chem. 1995;270:23246–23249.
4. Bailey TL, Gribskov M. Combining evidence using p-values: Application to sequence homology searches. Bioinformatics. 1998;14:48–54.
5. Bi W, Drake CJ, Schwarz JJ. The transcription factor MEF2C-null mouse exhibits complex vascular malformations and reduced cardiac expression of angiopoietin 1 and VEGF. Dev Biol. 1999;211:255–267.
