Nucleic Acids Res. 2016 Jul 27;44(13):e120.
doi: 10.1093/nar/gkw446. Epub 2016 Jun 1.

Quantitative modeling of gene expression using DNA shape features of binding sites

Pei-Chen Peng et al. Nucleic Acids Res. 2016.

Abstract

Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in the Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested whether combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.

Figures

Figure 1.
DNA shape-based model of gene expression. A TF binding site is described by four shape feature vectors: MGW, ProT, Roll and HelT. Each vector includes the corresponding shape feature at every position of the site, along with the mean and standard deviation over all positions. For a given TF, a Random Forest classifier is trained on a sample of binding sites from the Fly Factor Survey database to predict shape scores for putative binding sites.
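As an illustration of the feature layout described in this caption, the following is a minimal Python sketch, not the authors' implementation. It assumes the per-position shape values have already been computed by some external tool. The function name `shape_feature_vector`, the dictionary layout, the use of the population standard deviation, and all numeric values are hypothetical choices made for the example.

```python
from statistics import mean, pstdev

# The four shape features named in the caption.
SHAPE_FEATURES = ("MGW", "ProT", "Roll", "HelT")


def shape_feature_vector(per_position):
    """Build one flat vector for a binding site.

    For each shape feature, append the value at every position of the
    site, followed by the mean and standard deviation over those
    positions. `per_position` maps a feature name to its list of
    per-position values (an assumed input format).
    """
    vector = []
    for name in SHAPE_FEATURES:
        values = list(per_position[name])
        vector.extend(values)
        vector.append(mean(values))
        # Assumption: population SD; the caption does not say which SD is used.
        vector.append(pstdev(values))
    return vector


# Made-up shape values for a hypothetical short site (not real data).
site = {
    "MGW": [5.0, 4.0, 5.0, 6.0],
    "ProT": [-8.0, -6.0, -8.0, -10.0],
    "Roll": [1.0, 3.0, 2.0],
    "HelT": [34.0, 36.0, 35.0],
}
vector = shape_feature_vector(site)
```

Vectors of this kind, built for known binding sites and for background sequences, could then serve as input rows for a Random Forest classifier, whose predicted class probability would play the role of a shape score.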
Figure 2.
Performance of DNA shape-based model compared to PWM-based model on 37 Drosophila enhancers. The goodness of fit between predicted and real expression for each enhancer was assessed by wPGP scores. Dotted lines delineate regions where the difference in wPGP score between the two models is <0.05.
Figure 3.
Fits between model and data. Predicted expression profiles of the DNA shape-based model (orange lines) and the PWM-based model (purple lines) are compared to experimentally determined expression profiles (black lines) for six selected Drosophila enhancers. Each expression profile is on a relative scale of 0 to 1 (y-axis) and is shown for the region between 20% and 80% of the A/P axis of the embryo. The title of each panel is in the format “enhancer name, wPGP by DNA shape-based model (‘S’), wPGP by PWM-based model (‘P’).” See more enhancer fits in Supplementary Figure S1.
Figure 4.
DNA shape is characterized differently from PWM. (A) Change in goodness of fit (avg. wPGP) of DNA shape-based model predictions when binding sites of a specific TF were forced to use LLR rather than shape scores. (B) Visualization of the correlation between shape scores and LLR for kni binding sites. (C) Pearson correlations between shape scores and LLR of binding sites for each of the nine TFs in this study and for all TFs.
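The Pearson correlation mentioned in this caption can be sketched in a few lines of Python. This is only an illustration of the statistic itself; the function name `pearson` and the score lists are hypothetical, and the numbers are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up scores for five hypothetical binding sites of one TF.
shape_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
llr_scores = [8.1, 5.0, 4.2, 3.9, 7.3]
r = pearson(shape_scores, llr_scores)
```

A coefficient well below 1 for a given TF would indicate that the two scoring schemes rank its binding sites differently, which is the sense in which shape scores and LLR can carry different information.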
Figure 5.
Performance of integrative models compared to (A) PWM-based model and (B) DNA shape-based model on 37 Drosophila enhancers assessed by wPGP scores. Dotted lines delineate regions where the difference in wPGP between the two models is greater than 0.05.
