Science. 2010 Dec 24;330(6012):1787-97.
doi: 10.1126/science.1198374. Epub 2010 Dec 22.

Identification of functional elements and regulatory circuits by Drosophila modENCODE

modENCODE Consortium; Sushmita Roy, Jason Ernst, Peter V Kharchenko, Pouya Kheradpour, Nicolas Negre, Matthew L Eaton, Jane M Landolin, Christopher A Bristow, Lijia Ma, Michael F Lin, Stefan Washietl, Bradley I Arshinoff, Ferhat Ay, Patrick E Meyer, Nicolas Robine, Nicole L Washington, Luisa Di Stefano, Eugene Berezikov, Christopher D Brown, Rogerio Candeias, Joseph W Carlson, Adrian Carr, Irwin Jungreis, Daniel Marbach, Rachel Sealfon, Michael Y Tolstorukov, Sebastian Will, Artyom A Alekseyenko, Carlo Artieri, Benjamin W Booth, Angela N Brooks, Qi Dai, Carrie A Davis, Michael O Duff, Xin Feng, Andrey A Gorchakov, Tingting Gu, Jorja G Henikoff, Philipp Kapranov, Renhua Li, Heather K MacAlpine, John Malone, Aki Minoda, Jared Nordman, Katsutomo Okamura, Marc Perry, Sara K Powell, Nicole C Riddle, Akiko Sakai, Anastasia Samsonova, Jeremy E Sandler, Yuri B Schwartz, Noa Sher, Rebecca Spokony, David Sturgill, Marijke van Baren, Kenneth H Wan, Li Yang, Charles Yu, Elise Feingold, Peter Good, Mark Guyer, Rebecca Lowdon, Kami Ahmad, Justen Andrews, Bonnie Berger, Steven E Brenner, Michael R Brent, Lucy Cherbas, Sarah C R Elgin, Thomas R Gingeras, Robert Grossman, Roger A Hoskins, Thomas C Kaufman, William Kent, Mitzi I Kuroda, Terry Orr-Weaver, Norbert Perrimon, Vincenzo Pirrotta, James W Posakony, Bing Ren, Steven Russell, Peter Cherbas, Brenton R Graveley, Suzanna Lewis, Gos Micklem, Brian Oliver, Peter J Park, Susan E Celniker, Steven Henikoff, Gary H Karpen, Eric C Lai, David M MacAlpine, Lincoln D Stein, Kevin P White, Manolis Kellis

Abstract

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Figures

Fig. 1
Overview of Drosophila modENCODE data sets. Range of genomic elements and trans factors studied, with relevant techniques and resulting genome annotations. hnRNA, heterogeneous nuclear RNA.
Fig. 2
Coding and noncoding genes and structures. (A) Extended region of male-specific expression in chromosome 2R including new protein-coding and noncoding transcripts. MIP03715 contains two short ORFs of 23 and 21 codons, respectively. ORF multispecies alignments (color coded) show abundant synonymous (bright green) and conservative (dark green) substitutions and a depletion of nonsynonymous substitutions (red), indicative of protein-coding selection [ratio of nonsynonymous to synonymous substitutions (dN/dS) < 1 for both, P < 10^−7 and P < 10^−11, respectively, likelihood ratio test]. Surrounding regions show abundant stop codons (blue, magenta, yellow) and frame-shifted positions (orange). (B) A transcribed region in chromosome 3R (26,572,290 to 26,573,456), identified by RNA-seq and supported by promoter-specific and transcription-associated chromatin marks, shows RNA secondary-structure conservation in eight Drosophila species. (C) Example of a new miRNA derived from a protein-coding exon of CG6700, with 21- to 23-nt RNAs indicative of Drosha/Dicer-1 processing and also recovered in AGO1-immunoprecipitate libraries from S2 cells and adult heads indicative of Argonaute loading. Evolutionary evidence suggests protein-coding constraint, no conservation for the mature arm, and conservation of the star arm. Red boxes indicate 8-mer “seed” sequence potentially mediating 3′ UTR targeting.
Fig. 3
Chromatin-based annotation of functional elements. (A) Average enrichment profiles of histone marks, chromosomal proteins, and physical chromatin properties at genes, origins of replication, insulator proteins, and TF binding positions. Each panel shows 4 kb centered at a specified location, either proximal to TSS (prox.) or distal (dist.). (B) Example of a transcript predicted by chromatin signatures associated with promoter (red trace) and gene bodies (blue box) and supported by cDNA evidence. Strong RNA Pol II and H3K4me3 peaks in the promoter region and strong H2B ubiquitination extending toward the previously annotated luna gene are confirmed by RNA-seq junction reads that were not used in the prediction. (C) Intergenic H3K36me1 chromatin signatures predict replication activity. Enrichment of multiple chromatin marks was used to identify putative large (>10 kbp) intergenic H3K36me1/H3K18ac domains located outside of annotated genes. Although these marks generally correspond to long introns within transcripts, their intergenic domains were enriched for replication activity (fig. S5). In this example from BG3 cells, such a domain was found upstream of the bi locus and is associated with early replication, contains an early origin, is enriched for ORC binding, and is further supported by NippedB binding.
Fig. 4
Discovery and characterization of chromatin states and their functional enrichments. Combinatorial patterns of chromatin marks in S2 and BG3 cells reveal chromatin states associated with different classes of functional elements. A discrete model (states d1 to d30) captures the presence/absence information, and a continuous model (states c1 to c9) also incorporates mark intensity information (22). States were learned solely from mapped locations of marks (left) and were associated with modENCODE-defined elements (right) with most pronounced patterns in euchromatin (green) and heterochromatin (blue) shown here (additional variations shown in fig. S6).
Fig. 5
High-occupancy TF binding regions and their relation to motifs, ORC, and chromatin. (A) Enrichment of known motifs for regions bound by corresponding TF, sorted by average complexity, denoting the number of distinct TFs bound in the same region. For eight TFs, motifs are depleted (blue) for higher-complexity regions, suggesting non–sequence-specific recruitment. In seven of eight cases, known motifs were enriched in bound regions (Enrich), suggesting sequence-specific recruitment in lower-complexity regions. For each factor, binding sites were highly reproducible between replicates (Reprod). (B) ORC versus TF complexity. The relation between HOT spot complexity (x axis) and enrichment in ORC binding (y axis). (C) Discovered motifs in high- or low-complexity regions (boxed range) and their enrichment in regions of higher (red) or lower (blue) complexity. M1 to M5 are candidate “drivers” of HOT region establishment.
Fig. 6
Genome coverage by modENCODE data sets. (A) Unique (bars) and cumulative (lines) coverage of nonrepetitive (blue line) and conserved (red line) genomes. (B) Multiple coverage for data sets grouped into transcribed elements (red), bound regulators (blue), and chromatin domains (green) (17). Across all three classes (black), 10.8% of the genome is covered 15 or more times, and 69.5% is covered at least twice. (C) Increased coverage in a Chr2R region with no prior annotation (left half), now showing multiple overlapping data sets. Coverage by different tracks is highly clustered (fig. S11), with some regions showing little coverage and others densely covered by many types of data.
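The multiple-coverage figures in panel (B) can be illustrated with a small sweep over interval endpoints. This is only a sketch, not the consortium's pipeline: the function names, the half-open interval convention, and the toy tracks are assumptions made for illustration, and each data set is assumed to count at most once per position.

```python
# Hypothetical sketch: fraction of a genome covered by at least N data sets.
# Intervals are half-open [start, end) and assumed to lie within the genome.

def merge(intervals):
    """Merge overlapping intervals so one track counts once per position."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def fraction_covered_at_least(tracks, genome_length, min_depth):
    """Fraction of positions covered by at least min_depth tracks."""
    events = []
    for track in tracks:
        for start, end in merge(track):
            events.append((start, 1))   # a track begins covering here
            events.append((end, -1))    # a track stops covering here
    events.sort()

    depth = 0      # number of tracks covering the current segment
    covered = 0    # positions seen so far with depth >= min_depth
    prev = 0       # left edge of the current segment
    for pos, delta in events:
        if depth >= min_depth:
            covered += pos - prev
        depth += delta
        prev = pos
    return covered / genome_length

# Toy data: three made-up tracks on a 40-position genome.
tracks = [[(0, 10), (5, 20)], [(5, 15)], [(12, 30)]]
print(fraction_covered_at_least(tracks, 40, 2))
```

Calling the function with different `min_depth` values gives the kind of summary reported in the caption (for example, the share of the genome covered at least twice or 15 or more times).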
Fig. 7
Properties of the physical regulatory network. (A) Hierarchical view of mixed ChIP-based/miRNA physical regulatory network that combines transcriptional regulation by 76 TFs (green) from ChIP experiments and posttranscriptional regulation by 52 miRNAs (red). TFs are organized in a five-level hierarchy on the basis of their relative proportion of TF targets versus TF regulators. miRNAs are separated into two groups: the ones that are regulated by TFs (left) and the ones that only regulate TFs (right). The horizontal position of the TFs in each level shows whether they regulate miRNAs (left), have no regulation to or from miRNAs (middle), or do not regulate but are targeted by miRNAs (right). Different shades of green and red represent the total number of target genes for TFs and miRNAs, respectively (darker nodes indicate more targets). Ninety-two percent of TF regulatory connections are downstream connections from higher levels to lower levels (green), and only 8% are upstream (blue). miRNA regulatory connections are red. (B) Highly enriched network motifs in a mixed physical regulatory network including TFs (green), miRNAs (red), and target genes (black). For each motif, five examples are shown. Known activators, blue; known repressors, red; other TFs, black.
Fig. 8
Gene function prediction from coexpression and co-regulation patterns. Receiver operating characteristic curves for GO terms with predicted new members and area-under-the-curve statistics. False negatives for each GO term are predictions for genes previously annotated for “incompatible” GO terms, defined as pairs of GO terms that have less than 10% common genes relative to the union of their gene sets.
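The "incompatible" criterion in this caption amounts to a shared-gene fraction (intersection over union) below 10%. A minimal sketch, assuming GO terms are represented as plain sets of gene identifiers; the function names and the example gene IDs are hypothetical:

```python
# Hypothetical sketch of the "incompatible GO terms" criterion:
# two terms are incompatible when their shared genes make up less than
# 10% of the union of their gene sets.

def overlap_fraction(genes_a, genes_b):
    """Shared genes divided by the union of both gene sets."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return len(genes_a & genes_b) / len(union)

def are_incompatible(genes_a, genes_b, threshold=0.10):
    """True when the overlap fraction falls below the threshold."""
    return overlap_fraction(genes_a, genes_b) < threshold

# Made-up gene sets for illustration only.
term_a = {f"g{i}" for i in range(1, 11)}    # g1..g10
term_b = {f"g{i}" for i in range(10, 20)}   # g10..g19, shares only g10
term_c = {f"g{i}" for i in range(1, 6)}     # g1..g5, subset of term_a

print(are_incompatible(term_a, term_b))  # 1 shared gene out of 19
print(are_incompatible(term_a, term_c))  # 5 shared genes out of 10
```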
Fig. 9
Predictive models of regulator, region, and gene activity. (A) Dynamic regulatory map produced by DREM predicts stage-specific regulators associated with expression changes (y axis, log space relative to first time point) across developmental stages (x axis) (17). Each path (colored lines) indicates the average expression of a group of genes (solid circles) and its standard deviation (size of circle). Predicted bifurcation events, or splits, (open circles) are numbered 1 through 19. The colored insets show the expression level of each individual gene going through the split and ranked regulators from the physical (black) or functional (blue) regulatory network associated with the higher (H), lower (L), or middle (M) path. The uncolored inset shows the expression of repressor SU(HW), whose expression decrease coincides with an expression increase of its targets (red asterisk). (B) Predicted S2 activators (top group) or repressors (bottom group), based on the coherence between relative expression of the TF in S2 (yellow) versus BG3 (green) and the relative motif enrichment (red) or depletion (blue) in S2 versus BG3 for activating (left columns) or repressive marks (right columns). (C) True (top of shaded area) and predicted (dotted blue line) expression levels for target genes, from the expression levels of inferred activators (red) and repressors (green). Only the top five positive and negative regulators are shown, ranked by their contribution to the expression prediction (weight of linear-regression model). Examples are shown from 8 of 1487 predictable genes, ranked by prediction quality scores (rank in upper right corner), evaluated as the averaged squared error between predicted and true expression levels across the time course. An expanded set of examples is shown in fig. S23.
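Panel (C) describes scoring a linear model by the averaged squared error over the time course and listing the top positive and negative regulators by weight. The sketch below illustrates only that scoring and ranking step with made-up weights. Fitting the weights is omitted, and all names (`predict`, `top_regulators`, `tfA`, and so on) are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: score a linear expression model and rank regulators.
# Weights are assumed to be already fitted; values here are made up.

def predict(weights, regulator_expr, intercept=0.0):
    """Predicted target expression at each time point as a weighted sum."""
    n_points = len(next(iter(regulator_expr.values())))
    return [
        intercept + sum(w * regulator_expr[r][t] for r, w in weights.items())
        for t in range(n_points)
    ]

def mean_squared_error(true_expr, predicted_expr):
    """Averaged squared error between true and predicted expression."""
    pairs = list(zip(true_expr, predicted_expr))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

def top_regulators(weights, k=5):
    """Top-k activators (largest positive weights) and repressors (most negative)."""
    activators = sorted((r for r, w in weights.items() if w > 0),
                        key=lambda r: -weights[r])[:k]
    repressors = sorted((r for r, w in weights.items() if w < 0),
                        key=lambda r: weights[r])[:k]
    return activators, repressors

weights = {"tfA": 2.0, "tfB": -1.0, "tfC": 0.5}
regulator_expr = {"tfA": [1, 2, 3], "tfB": [1, 1, 1], "tfC": [0, 2, 4]}
true_expr = [1, 4, 8]

predicted = predict(weights, regulator_expr)
print(predicted, mean_squared_error(true_expr, predicted))
print(top_regulators(weights))
```

A lower averaged squared error corresponds to a better-predicted gene, which is the basis for the quality ranking mentioned in the caption.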

References

    1. www.genome.gov/10005107.
    2. Celniker SE, et al. Nature. 2009;459:927.
    3. Hoskins RA, et al. Science. 2007;316:1625.
    4. Compared to FlyBase release 5.12 (October 2008), available at http://fb2008_09.flybase.org/
    5. Stapleton M, et al. Genome Biol. 2002;3:RESEARCH0080.
