Dev Dyn. 2012 Jan;241(1):169-89.
doi: 10.1002/dvdy.22728. Epub 2011 Aug 30.

Use of a Drosophila genome-wide conserved sequence database to identify functionally related cis-regulatory enhancers

Free PMC article

Thomas Brody et al. Dev Dyn. 2012 Jan.

Abstract

Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs.

Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization.
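
The two-step search described above (find repeats in the input, then look for database CSCs that share them) can be illustrated with a minimal sketch. This is not the cis-Decoder implementation: it assumes that elements are exact, forward-strand 6-mers, that a "repeat" is any 6-mer seen at least twice in the input CSC, and that each CSC is given as a list of conserved sequence block (CSB) strings. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(csbs, k=6):
    """Count every k-mer across the conserved sequence blocks of one CSC."""
    counts = Counter()
    for block in csbs:
        block = block.upper()
        for i in range(len(block) - k + 1):
            counts[block[i:i + k]] += 1
    return counts


def repeat_elements(csbs, k=6):
    """Return the k-mers that occur at least twice within one CSC."""
    return {kmer: n for kmer, n in kmer_counts(csbs, k).items() if n >= 2}


def shared_elements(input_csbs, candidate_csbs, k=6):
    """Split k-mers common to both CSCs into input repeats and unique matches.

    Assumption: an element is 'repeat' if it occurs twice or more in the
    input CSC, and 'uniquely shared' if it occurs there exactly once.
    """
    input_counts = kmer_counts(input_csbs, k)
    candidate_counts = kmer_counts(candidate_csbs, k)
    common = set(input_counts) & set(candidate_counts)
    shared_repeats = {m for m in common if input_counts[m] >= 2}
    shared_unique = common - shared_repeats
    return shared_repeats, shared_unique


# Toy sequences for illustration only; they are not real enhancer data.
toy_input = ["ATGCAAATGCAAA", "GGATGCAAAC"]
toy_candidate = ["TTATGCAAATT", "CCGGATGC"]
repeats, unique = shared_elements(toy_input, toy_candidate)
print(sorted(repeats), sorted(unique))
```

A real search would also need to handle reverse-orientation matches and elements longer than the minimum length, both of which the figure legends mention.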

Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process.


Figures

Fig. 1
The Drosophila genome can be parsed into clusters of conserved sequence blocks that are flanked by less conserved DNA. A: Shown is a UCSC Genome Browser conservation histogram of a D. melanogaster chromosome 3L region that spans 66 kb of the vvl transcribed sequence and 3′ flanking DNA. Highly conserved DNA sequences that align with the orthologous regions of other Drosophilids are indicated as peaks in the histogram. The rectangle identified as “Your Seq” corresponds to the EvoPrinted region shown in B and the 4 vertical red-colored arrows correspond to the CSC parsing boundaries shown in B. B: A D. melanogaster (ref. sequence) relaxed 12 species EvoPrint of the Your Seq region in A (6,355 bp) identifies three conserved sequence clusters designated vvl-41, -42, and -43. Capital letters represent conserved bases in the D. melanogaster sequence that are present in all, or in all but one, of the orthologous regions within 11 additional species. Intra-cluster cis-Decoder CSB alignments reveal that over 60% of the conserved sequences within each CSC span ≥ 6-bp repeat sequence elements (yellow highlight) that are either separate, adjacent, and/or overlapping each other. High copy number RPS elements within each CSC are noted with different colored highlights. Red-colored arrows indicate parsing boundaries for the cis-Decoder CSC database.
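
Because an EvoPrint marks conserved bases with capital letters, conserved sequence blocks can be read out of the text as runs of uppercase bases. The sketch below illustrates that convention only. The function name `extract_csbs` and the 6-bp minimum block length are assumptions for the example, not parameters taken from the article.

```python
import re


def extract_csbs(evoprint, min_len=6):
    """Return (start, sequence) for each run of capital (conserved) bases.

    Assumptions: the EvoPrint is plain text in which uppercase A/C/G/T are
    conserved and lowercase bases are not; whitespace and line breaks are
    ignored; runs shorter than min_len are discarded.
    """
    text = "".join(evoprint.split())
    return [
        (match.start(), match.group())
        for match in re.finditer(r"[ACGT]+", text)
        if len(match.group()) >= min_len
    ]


# Toy EvoPrint fragment for illustration only.
toy_evoprint = "acgtATGCAAATggc\nTTAtcaCAGCTGAAtt"
print(extract_csbs(toy_evoprint))
```

Start positions are 0-based offsets into the whitespace-stripped sequence. The short `TTA` run in the toy fragment is dropped because it falls below the assumed minimum length.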
Fig. 2
The cas-6 CSC functions as an NB enhancer that regulates gene expression during late embryonic CNS sub-lineage development. A: cas-6 CSC enhancer-reporter transgene activates expression in a subset of NBs during late sub-lineage development. Shown are dissected fillets of whole-mount stained embryos, stages 10 through 13 (s10–s13; anterior up). B: An EvoPrint of the cas-6 enhancer (same EvoPrint conditions as in Fig. 1B). CSB sequences that span repeat elements are highlighted in yellow (identified from cis-Decoder CSB alignments, see C). Colored underlined bases correspond to the core transcription factor DNA-binding sites (homeodomain, ATTA-red; POU domain, ATGCAAAT-green; bHLH, CANNTG-brown; Hunchback/Castor, TTTTT/AT-blue; Tramtrack, TCCT-gold; and PBX sites, TGAT-teal). C: cis-Decoder self-alignment of the cas-6 enhancer CSC identified 50 distinct repeat or palindromic elements. The total element count in the table refers to the number of times a repeat appears in the CSC database. Colored asterisks indicate repeats that contain core known transcription factor DNA-binding motifs highlighted in B. The green-colored underlined repeat indicates the sequence (ATGCAAA) that was used to identify other late sub-lineage NB enhancers that share sequence elements with cas-6 (see Figs. 3–5).
Fig. 3
Late sub-lineage NB enhancers share conserved repeat elements that are balanced in their frequency of occurrence. A: A one-on-one CSB cis-Decoder alignment of three consecutive cg7229-5 CSBs (nos. 2–4) with cas-6 CSB sequences. Color-coded bases: Green, the required cas-6 repeat element used to identify other CSCs in the database search; Blue, sequences that are present just once in the cas-6 enhancer; Red, cas-6 repeats; Orange, shorter (≥ 6 bp) repeat sequences that are part of larger cas-6 repeats. The cas-6 CSB number and alignment orientation (forward or reverse) are indicated following each aligning sequence. B: cas-6 and cg7229-5 share conserved elements that are unique (blue), repeat (red), or sub-repeat (gold) elements within the cas-6 CSC (green underlined sequence indicates the mandatory element used to initiate the CSC database search). C: A cg7229-5 CSC 12 species relaxed EvoPrint. Sequences that are present within cas-6 CSBs are highlighted in the cg7229-5 CSBs and color-coded to indicate their relative frequencies (see Fig. 4 for color code). D: cg7229-5 CSC enhancer-reporter transgene expression analysis (Gal4-reporter mRNA in situ hybridization) reveals that, similar to the cas-6 enhancer, the cg7229-5 CSC functions as a late temporal window NB enhancer (embryo preparations as in Fig. 2A). Inset: Co-expression analysis reveals partial overlap between cells expressing cas mRNA (green) and those expressing the cg7229-5 enhancer-reporter transgene mRNA (red; stage 11, dorsal whole-mount view).
Fig. 4
cis-Decoder analysis reveals that cas-6 and grh-15 CSCs share many sequence elements that are balanced in their copy number. Shown are a pie chart and a repeat balance map, both of which illustrate the relative copy number balance of shared elements between cas-6 and grh-15. The repeat balance map of a relaxed grh-15 CSC EvoPrint was highlighted to show comparative frequency of elements that are shared with the cas-6 CSC. Green indicates balanced repeat element numbers between the two CSCs; yellow highlights repeats that are unbalanced by just one copy; purple, two copies; and red, three or more copies. Gray highlighted sequences are present just once in the cas-6 CSBs. When uniquely shared sequences overlap repeat sequences, the repeat-ratio highlight color indicator is shown. When repeat elements overlap one another, the balance-ratio highlight of the longer repeat is shown, and when two repeats of equal size overlap, the more balanced repeat is highlighted.
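
The color code in this legend amounts to a small classification rule on copy-number differences. The sketch below restates it, under the assumption that "unbalanced by N copies" means the absolute difference between an element's copy numbers in the two CSCs. The function names and the toy counts are hypothetical, and the overlap-resolution rules from the legend are not modeled.

```python
from collections import Counter


def balance_color(ref_count, other_count):
    """Map an element's copy numbers in two CSCs to the legend's color code."""
    if ref_count == 1:
        return "gray"      # present just once in the reference CSC
    diff = abs(ref_count - other_count)
    if diff == 0:
        return "green"     # balanced
    if diff == 1:
        return "yellow"    # unbalanced by one copy
    if diff == 2:
        return "purple"    # unbalanced by two copies
    return "red"           # unbalanced by three or more copies


def balance_summary(ref_counts, other_counts):
    """Tally colors over the elements shared by both CSCs (pie-chart input)."""
    shared = set(ref_counts) & set(other_counts)
    return Counter(balance_color(ref_counts[e], other_counts[e]) for e in shared)


# Toy copy numbers for illustration only; they are not taken from the article.
toy_ref = {"ATGCAA": 3, "TGCAAA": 2, "CAGCTG": 1, "TTTTTA": 2}
toy_other = {"ATGCAA": 3, "TGCAAA": 5, "CAGCTG": 2, "GGGGGG": 1}
print(balance_summary(toy_ref, toy_other))
```

Elements found in only one of the two CSCs are left out of the tally, since the legend describes the balance of shared elements.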
Fig. 5
Identification of novel embryonic neural precursor cell enhancers based on their shared repeat sequences with other known neural enhancers. A: Like the cg7229-5 enhancer (Fig. 3), additional database CSCs (Table 1) were identified that share balanced repeat sequences with the cas-6 enhancer, and they also function as late NB sub-lineage enhancers. Many identified CSCs are adjacent to known NB expressed genes (vvl, nab, cas, tkr, and grh). B: Additional late sub-lineage neural precursor cell enhancers were also identified in cis-Decoder CSC database searches using CSBs from different NB enhancer CSCs as input (vvl-41 and sqz-11, identified via the cas-8 CSBs; ct-3, using the pdm-2 gene NB enhancer CSBs (Berman et al., 2004); and ct-14, using the cg6559-28 CSBs (Fig. 5A)). Shown are dissected fillets of whole-mount-stained embryos, stages 10–12 (left to right, respectively, anterior up).
Fig. 6
Enhancers that share unbalanced repeat elements between their CSCs carry out distinct regulatory functions. A: cis-Decoder alignments between the cas-6 enhancer and vvl-43 CSBs identified 93 different unique (blue), repeat (red), and shorter truncated-repeat (orange) sequence elements that were common to each CSC (green underline indicates the cas-6 repeat that was used to initiate the cis-Decoder CSC database search). B: The vvl-43 CSC relaxed EvoPrint was highlighted to show repeat element frequencies relative to the cas-6 enhancer (see color coding in Fig. 4). C: vvl-43 CSC enhancer-reporter transgene expression analysis (Gal4-reporter mRNA in situ hybridization) reveals that, unlike the cas-6 enhancer (Fig. 2A), vvl-43 activates reporter expression in a subset of ectodermal cells during stage 11 and no reporter expression was detected in CNS NBs. Shown are filleted-flattened preparations of whole-mount-stained embryos, embryonic stages 11–14 (anterior up).
Fig. 7
vvl-41 and vvl-43 enhancers exhibit an imbalance in copy number of their shared elements as evidenced by the low level of perfectly matched sequences. A: Shown are a pie chart and a vvl-43 CSC repeat balance map that illustrate the relative copy number balance of shared elements between vvl-41 and vvl-43 (see Fig. 4 for ratio map color code). B: vvl-41 and vvl-43 CSCs function as larval neural enhancers that drive the expression of a membrane-bound GFP-CD8 reporter in different sets of CNS neurons. Shown are dissected cephalic lobes and ventral cords from wandering third-instar larvae (dorsal views, anterior up).
Fig. 8
cis-Decoder CSC database searches identify shared conserved sequence elements among cellular blastoderm gap enhancers. A: CSB alignments between the Krüppel and giant gap enhancers (Kr_CD1, Hoch et al., ; gt_10, Schroeder et al., 2004) identify 42 distinct conserved sequence elements of 6 bp or greater that represent 55.62% of the conserved bases within the gt_10 enhancer. The red-colored boxed 14-bp sequence corresponds to the characterized overlapping Knirps and Bicoid transcription factor binding sites that are required for the wild-type Kr_CD1 enhancer regulatory behavior (Hoch et al., 1992). B: The knirps gap enhancer CSC (kni_(+1); Schroeder et al., 2004) relaxed EvoPrint was highlighted to show shared RPS and unique element frequencies present in the Kr_CD1 gap enhancer (see color coding in Fig. 4 for RPS balance index).
Fig. 9
Many NB enhancers that regulate gene expression during embryonic CNS development also activate gene expression during adult development and in the adult nervous system. A: During third-instar larval development, the cg6559-28, grh-15, and tkr-15 CSC enhancer-Gal4 driver transgenes activate membrane-bound GFP-CD8 tagged transgene expression in sub-regions of the cephalic lobes and in thoracic ventral cord neural precursor cells. Shown are dorsal views of dissected CNS preparations from wandering third-instar larvae (anterior up). B: In the adult brain, the cg6559-28, vvl-14, and nab-1 enhancers drive GFP-CD8 reporter expression in neurons whose cell bodies reside in the mushroom body calyxes and in different regions of the sub-esophageal ganglion. Shown are confocal optical sections of GFP immunostained adult brains (frontal views) at the level of the mushroom bodies and the sub-esophageal ganglion.
