BMC Bioinformatics. 2017 Mar 1;18(1):141.
doi: 10.1186/s12859-017-1495-1.

Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies

Martin Nettling et al. BMC Bioinformatics.

Abstract

Background: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. Approaches for de-novo motif discovery can be subdivided into phylogenetic footprinting, which takes into account phylogenetic dependencies in aligned sequences of more than one species, and non-phylogenetic approaches, which are based on sequences from only one species and typically take into account intra-motif dependencies. It has been shown that modeling (i) phylogenetic dependencies and (ii) intra-motif dependencies separately improves de-novo motif discovery, but there is no approach capable of modeling both (i) and (ii) simultaneously.

Results: Here, we present an approach for de-novo motif discovery that combines phylogenetic footprinting with motif models capable of taking into account intra-motif dependencies. We study the degree of intra-motif dependencies inferred by this approach from ChIP-seq data of 35 transcription factors. We find that significant intra-motif dependencies of orders 1 and 2 are present in all 35 datasets and that intra-motif dependencies of order 2 are typically stronger than those of order 1. We also find that the presented approach improves the classification performance of phylogenetic footprinting in all 35 datasets and that incorporating intra-motif dependencies of order 2 yields a higher classification performance than incorporating such dependencies of only order 1.

Conclusion: Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies leads to an improved performance in the classification of transcription factor binding sites. This may advance our understanding of transcriptional gene regulation and its evolution.

Keywords: ChIP-Seq; Evolution; Gene regulation; Phylogenetic footprinting; Transcription factor binding sites.

Figures

Fig. 1
Sequence logos and intra-motif dependencies for the TFs (a) CJUN and (b) Nrf. For both TFs, we depict (i) the sequence logo inferred by the PFM(2) from all species in the first row and (ii) the MI profiles of orders 1 and 2 inferred by the PFM(2) in the second row. The MI profiles of order 2 are larger than the MI profiles of order 1. Please see Additional file 3 for the MI profiles of all 35 TFs and Additional file 5 for the sequence logos of all 35 TFs for the PFMs of orders 0, 1, and 2.
Fig. 2
Maximum and average MIs of MI profiles inferred by the PFM(2) for all 35 TFs. (a) The maximum MI of the MI profiles of orders 1 and 2. (b) The average MI of the MI profiles of orders 1 and 2. The dashed lines indicate the mean of the maximum MIs and the mean of the average MIs, respectively, for both MI profiles. The degree of intra-motif dependencies depends on the TF and is always larger for intra-motif dependencies of order 2. Please see Additional file 3 for the MI profiles of all 35 TFs.
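As a rough illustration of what an MI profile measures, the sketch below estimates, for each motif position, the mutual information between the base at that position and the preceding `order` bases, using plain frequency counts over a set of aligned binding sites. This definition, the function name `mi_profile`, and the toy sequences are assumptions made for illustration; the paper infers its MI profiles from the PFM(2), which is not reproduced here.

```python
from collections import Counter
from math import log2

def mi_profile(sites, order):
    """Mutual information (in bits) between each position and its `order` predecessors.

    `sites` is a list of equal-length strings (aligned binding sites).
    Returns one value per position from index `order` to the last position.
    Assumption: plain maximum-likelihood frequency estimates, no pseudocounts.
    """
    n = len(sites)
    width = len(sites[0])
    profile = []
    for pos in range(order, width):
        # Count contexts, bases, and their joint occurrences at this position.
        joint = Counter((s[pos - order:pos], s[pos]) for s in sites)
        context = Counter(s[pos - order:pos] for s in sites)
        base = Counter(s[pos] for s in sites)
        mi = 0.0
        for (ctx, b), k in joint.items():
            # p(ctx, b) / (p(ctx) * p(b)) simplifies to k * n / (count(ctx) * count(b)).
            mi += (k / n) * log2(k * n / (context[ctx] * base[b]))
        profile.append(mi)
    return profile

# Toy data (hypothetical): the second base fully determines or ignores the first.
dependent = ["AA", "CC", "GG", "TT"]
independent = ["AA", "AC", "CA", "CC"]
print(mi_profile(dependent, 1))    # perfectly coupled positions
print(mi_profile(independent, 1))  # statistically independent positions
```

The maximum and the average of such a profile correspond to the two summary statistics shown in panels a and b.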
Fig. 3
Classification performance for PFMs with base dependencies of orders 0, 1, and 2. (a) The mean and standard error of the ROC AUC for PFMs of orders 0, 1, and 2, averaged over 25-fold stratified repeated random subsampling. (b) The mean and standard error of the relative increase of the ROC AUC for the PFMs of orders 1 and 2 relative to the PFM of order 0 for each of the 35 TFs. Taking into account base dependencies of order 1 increases the classification performance for 31 TFs. Taking into account base dependencies of order 2 increases the classification performance in all cases, and this increase is larger than the one obtained with base dependencies of order 1 in all cases. See Additional file 6 for detailed ROC and PR curves for the PFMs of order 2.
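The evaluation scheme named in this caption can be sketched as follows: positives and negatives are shuffled and split separately (so each split keeps the class ratio), a model is trained on the larger part, the held-out part is scored, and the ROC AUC is averaged over 25 repeats. The test fraction, the seed, and the `train_and_score` callback are hypothetical placeholders, not details taken from the paper.

```python
import random
from math import sqrt
from statistics import mean, stdev

def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def subsampled_auc(pos, neg, train_and_score, repeats=25, test_frac=0.2, seed=0):
    """Mean and standard error of the ROC AUC over stratified random subsampling.

    `train_and_score(train_pos, train_neg)` is a hypothetical callback that
    returns a function mapping one example to a real-valued score.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        # Stratification: shuffle and split each class on its own.
        p, q = pos[:], neg[:]
        rng.shuffle(p)
        rng.shuffle(q)
        kp = max(1, int(len(p) * test_frac))
        kq = max(1, int(len(q) * test_frac))
        score = train_and_score(p[kp:], q[kq:])
        aucs.append(roc_auc([score(x) for x in p[:kp]],
                            [score(x) for x in q[:kq]]))
    return mean(aucs), stdev(aucs) / sqrt(len(aucs))

# Toy usage: examples are numbers and the "model" scores an example by its value.
positives = list(range(10, 20))
negatives = list(range(0, 10))
print(subsampled_auc(positives, negatives, lambda tp, tn: (lambda x: x)))
```

The pairwise AUC computation is quadratic in the number of test examples; a rank-based formulation would be preferable for large test sets.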
Fig. 4
Classification performance averaged over all 35 TFs. (a) The ROC AUC in percent for PFMs of orders 0, 1, and 2, averaged over 25-fold stratified repeated random subsampling and over all 35 TFs. The overall classification performance increases with the order of the PFM. (b) The improvement of the ROC AUC for the PFMs of orders 1 and 2 relative to the PFM of order 0, averaged over 25-fold stratified repeated random subsampling and over all 35 TFs.
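One plausible reading of the relative improvement in panel b is the quantity (AUC of order d minus AUC of order 0) divided by the AUC of order 0, computed per TF and then averaged. The sketch below assumes that reading; the function names and the AUC values in the usage example are made up for illustration and are not results from the paper.

```python
from statistics import mean

def relative_increase(auc_model, auc_baseline):
    """Relative change of an AUC with respect to a baseline AUC."""
    return (auc_model - auc_baseline) / auc_baseline

def mean_relative_increase(auc_by_tf, order):
    """Average the relative increase over TFs.

    `auc_by_tf` maps a TF name to a dict {order: ROC AUC}; order 0 is the baseline.
    """
    return mean(relative_increase(aucs[order], aucs[0])
                for aucs in auc_by_tf.values())

# Hypothetical AUC values for two placeholder TFs, for illustration only.
example = {
    "TF_A": {0: 0.80, 1: 0.84, 2: 0.88},
    "TF_B": {0: 0.50, 1: 0.50, 2: 0.55},
}
print(mean_relative_increase(example, 1))
print(mean_relative_increase(example, 2))
```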
