PeerJ. 2017 May 30;5:e3389.
doi: 10.7717/peerj.3389. eCollection 2017.

Modeling the cis-regulatory modules of genes expressed in developmental stages of Drosophila melanogaster

Yosvany López et al. PeerJ.

Abstract

Because transcription is the first step in the regulation of gene expression, understanding how transcription factors bind to their DNA binding motifs is essential. Promoters of genes with similar expression profiles have been shown to share common structural patterns. This paper presents an extensive study of the regulatory regions of genes expressed in 24 developmental stages of Drosophila melanogaster. It proposes predicting gene expression from a combination of structural features of promoter sequences: the positioning of individual motifs relative to the transcription start site, motif orientation, pairwise distance between motifs, and the presence of motifs anywhere in the promoter. RNA-sequencing (RNA-seq) data were used to create and validate the 24 models. When genes with high-scoring promoters were compared with those identified from the RNA-seq samples, 19 of the 24 models (79.2%) were statistically significant, more than reported in previous studies. Each model yielded a set of highly informative features, which were used to search for genes with similar biological functions.
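The four feature types named in the abstract can be made concrete with a small sketch. This is not the authors' implementation: the `MotifHit` record, the `promoter_features` function, the coordinate convention (offsets from the transcription start site, negative meaning upstream), and the example positions are all assumptions chosen for illustration. Only the motif names are taken from the Figure 3 caption.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MotifHit:
    """One motif occurrence in a promoter (hypothetical record layout)."""
    motif: str      # motif name, e.g. a transcription factor symbol
    position: int   # offset from the TSS in bp; negative = upstream (assumed)
    strand: str     # "+" or "-"


def promoter_features(hits):
    """Collect presence, orientation, TSS position and pairwise distance."""
    features = {
        # Presence: which motifs occur anywhere in the promoter.
        "presence": sorted({h.motif for h in hits}),
        # Orientation: which strand each motif occurs on.
        "orientation": sorted({(h.motif, h.strand) for h in hits}),
        # Positioning of each occurrence relative to the TSS.
        "tss_position": sorted((h.motif, h.position) for h in hits),
        "pairwise_distance": [],
    }
    # Pairwise distance between occurrences of different motifs,
    # reported in 5'-to-3' order along the promoter.
    ordered = sorted(hits, key=lambda h: h.position)
    for a, b in combinations(ordered, 2):
        if a.motif != b.motif:
            features["pairwise_distance"].append(
                (a.motif, b.motif, b.position - a.position)
            )
    return features


# Illustrative hits only; the positions and strands are made up.
hits = [
    MotifHit("dl", -120, "+"),
    MotifHit("sna", -45, "-"),
    MotifHit("Kr", -300, "+"),
]
features = promoter_features(hits)
print(features["pairwise_distance"])
```

In a sketch like this, each promoter yields one feature dictionary, and a stage-specific model would then weight or select among such features. How the paper scores and selects them is not described in this record.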

Keywords: Co-expression; Developmental stage; Genetic algorithm; Genome-wide analysis; Promoter architecture; Transcription factor binding sites.

Conflict of interest statement

Kenta Nakai is an Academic Editor for PeerJ.

Figures

Figure 1. Workflow of our computational approach.
Figure 2. Schematic representation of the computed features in promoter regions of stage-expressed genes.
Geometrical forms above/below the horizontal line represent the orientation of sequence motifs towards plus/minus strands.
Figure 3. Heatmap of the expression level of transcription factor genes in specific developmental stages.
dl (dorsal), sna (snail), Hsf (Heat shock factor), nub (nubbin), Cf2 (Chorion factor 2), croc (crocodile), Pph13 (PvuII-PstI homology 13), lbe (ladybird early), al (aristaless), hkb (huckebein), Lim1 (LIM homeobox 1), Kr (Kruppel), zen (zerknullt), ftz (fushi tarazu).
Figure 4. Structural features of three models with the highest performance, (A) embryo 12–14 h, (B) white prepupae +24 h and (C) pupae.
Squares above/below the horizontal line indicate the DNA strand where the motif is located. Arrows represent features related to order of motifs.
Figure 5. Performance of the models with different types of features. (A) Boxplots of Fscores. (B) Bar plots of the number of significant models. (C) Frequency of informative features in all the models.
The M’s stand for simple to more complex models: M-1 (presence of motifs), M-2 (presence and orientation of motifs), M-3 (presence, orientation and positioning of motifs relative to the TSS), M-4 (presence, orientation, positioning of motifs relative to the TSS and pairwise distance of motifs), M-5 (presence, orientation, positioning of motifs relative to the TSS, pairwise distance and order of motifs), M-6 (all the features), and M-7 (presence, positioning of motifs relative to the TSS and pairwise distance of motifs). The “0%” means that no features related to presence of motifs regardless of orientation were obtained.
Figure 6. Heatmap of p-values indicating the ability of one model to characterize the promoter regions of genes expressed in the other stages.
For each stage, all the genes were scored by the model of another stage (the rows indicate the models and the columns are the score sets). The scoring of expressed and non-expressed genes was evaluated with the Student’s t-test.
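The comparison described in this caption, promoter scores of expressed versus non-expressed genes, can be sketched as a two-sample Student's t statistic. The version below assumes the classic equal-variance (pooled) form; the caption does not say which variant the authors used, and the function name `student_t` and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(expressed, non_expressed):
    """Return (t statistic, degrees of freedom) for two score samples.

    Uses the pooled-variance form of Student's t-test (an assumption).
    """
    n1, n2 = len(expressed), len(non_expressed)
    df = n1 + n2 - 2
    # Pooled sample variance across both groups.
    pooled = ((n1 - 1) * variance(expressed)
              + (n2 - 1) * variance(non_expressed)) / df
    t = (mean(expressed) - mean(non_expressed)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Made-up promoter scores for one model applied to one stage.
expressed_scores = [4.0, 5.0, 6.0]
non_expressed_scores = [1.0, 2.0, 3.0]
t, df = student_t(expressed_scores, non_expressed_scores)
print(round(t, 3), df)
```

The p-value shown in each heatmap cell would follow from the t distribution with `df` degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_ind`) computes both the statistic and the p-value directly.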
Figure 7. Schematic representation of the promoter regions of three genes involved in (A) motor neuron axon guidance, signal peptide and developmental protein, (B) immunoglobulin-like domain/fold, immunoglobulin subtype 2 and immunoglobulin subtype/domain, and (C) neurogenesis.
Squares above/below the horizontal line indicate the DNA strand where the motif is located. For more detailed descriptions of these promoter regions, please refer to Data S2.
