Sci Rep. 2016 Sep 1;6:32476.
doi: 10.1038/srep32476.

In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences


Yaping Fang et al. Sci Rep. 2016.

Abstract

Enhancers interact with gene promoters and form chromatin looping structures that serve important functions in various biological processes, such as the regulation of gene transcription and cell differentiation. However, enhancers are difficult to identify because they generally lack fixed positions and consensus sequence features, and experimental identification of enhancers is costly in labor and expense. In this work, we built several enhancer prediction models using various sequence-based feature sets and their combinations. Feature selection by recursive feature elimination showed that the model using a combination of 141 selected transcription factor binding motif occurrence features, derived from 1,422 transcription factor position weight matrices, achieved high prediction accuracy, superior to that of other reported methods. The models also achieved good prediction accuracy on enhancer datasets from different cell lines and tissues. Integrating chromatin state features further improved prediction accuracy. Our method complements wet-lab experiments and provides an additional approach to enhancer identification.
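Recursive feature elimination (RFE) repeatedly fits a model, ranks the features by importance, and discards the weakest until the desired number remain. The sketch below illustrates that loop under stated assumptions: it ranks features by the absolute weights of a small L2-regularized logistic regression on standardized inputs, which is one common RFE variant and not necessarily the classifier or ranking criterion used in this work. The data, `n_keep`, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Scale each column to zero mean and unit variance so weight magnitudes are comparable."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def fit_logistic(X, y, epochs=300, lr=0.1, l2=0.01):
    """L2-regularized logistic regression by batch gradient descent; returns feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the smallest |weight|, repeat."""
    Xs = standardize(X)
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in Xs]
        w = fit_logistic(subset, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining


# Hypothetical toy data: columns 0 and 2 carry the class signal, columns 1 and 3 are noise.
rng = random.Random(0)
X, y = [], []
for i in range(200):
    label = i % 2
    shift = 1.0 if label else -1.0
    X.append([shift + rng.gauss(0, 0.5), rng.gauss(0, 1), shift + rng.gauss(0, 0.5), rng.gauss(0, 1)])
    y.append(label)

print(rfe(X, y, n_keep=2))  # indices of the surviving features
```

Refitting after every removal is what distinguishes RFE from a single ranking pass: once a feature is dropped, the remaining weights can redistribute, so correlated features are re-evaluated in the reduced model.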

PubMed Disclaimer

Figures

Figure 1. Percentage of selected features across different feature groups.
The x-axis represents different feature groups and their combinations. The y-axis represents the percentage of selected features. I represents DNA property features; II represents TF binding motif occurrence features; III represents k-mer features; IV represents chromatin state features.
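The k-mer features (group III) describe a sequence by how often each short DNA word occurs in it. The sketch below is a minimal illustration, assuming overlapping windows, a fixed ACGT ordering of all 4^k words, and normalization to relative frequencies; the value of k and the normalization actually used in this work are not stated here.

```python
from itertools import product


def kmer_frequencies(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers, counted over overlapping windows, in ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


features = kmer_frequencies("ACGTAC", k=2)
print(len(features), features[1])  # 16 0.4 (index 1 is "AC", seen in 2 of 5 windows)
```

Because every sequence maps to a vector of the same length (64 for k = 3), k-mer features can be combined directly with the other feature groups in a single model.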
Figure 2. Receiver operating characteristic (ROC) curves for the selected feature groups.
Area under the ROC curve (AUC): DNA property features (black line), 0.793; TF binding motif occurrence features (red dashed line), 0.9698; k-mer features (orange dashed line), 0.5213; chromatin state features (green dashed line), 0.9159; combined chromatin state and TF binding motif occurrence features (blue dashed line), 0.989; TF RPM features (purple dashed line), 0.9964. TF RPM denotes the ChIP-Seq read densities of 61 TFs, expressed as reads per million mapped reads (RPM) per base pair.
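The AUC summarizes a ROC curve as a single number: it equals the probability that a randomly chosen positive example (here, an enhancer) receives a higher score than a randomly chosen negative one, which is the Mann-Whitney U statistic scaled to [0, 1]. The sketch below computes it directly from that definition; the scores and labels are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = enhancer, 0 = non-enhancer.
print(roc_auc([0.9, 0.35, 0.4, 0.3], [1, 1, 0, 0]))  # 0.75
```

Under this reading, an AUC near 0.5, like the 0.5213 of the k-mer model, means the model ranks enhancers barely better than chance, while values near 1, like the 0.9964 of the TF RPM model, mean almost every enhancer outscores almost every non-enhancer.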
Figure 3. Importance of the model incorporating TF binding motif occurrence features.
The importance of the top 50 selected features in the model with 141 TF binding motif occurrence features is shown. The prefix M denotes a position weight matrix (PWM).
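Each TF binding motif occurrence feature counts how often one PWM matches a candidate sequence. The sketch below shows one simple way to obtain such counts, assuming a log-odds PWM built from base counts with a pseudocount, a uniform background, a scan of both strands, and a fixed score threshold. These choices, the toy "GATA" matrix, and the threshold are illustrative assumptions; the scanning tool and cutoffs used in this work are not given here.

```python
import math

BASES = "ACGT"
BACKGROUND = {b: 0.25 for b in BASES}  # assumed uniform background composition
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(pfm, pseudocount=0.1):
    """Turn per-position base counts into log2(p(base) / background(base)) scores."""
    pwm = []
    for column in pfm:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / BACKGROUND[b]) for b in BASES})
    return pwm


def count_occurrences(seq, pwm, threshold):
    """Count windows on both strands whose summed log-odds score reaches the threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = 0
    # Scan the forward strand and its reverse complement; a palindromic site can be counted twice.
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            if all(base in BASES for base in window):
                score = sum(col[base] for col, base in zip(pwm, window))
                if score >= threshold:
                    hits += 1
    return hits


# Hypothetical 4-bp "GATA" motif built from 10 aligned sites.
gata = log_odds_matrix([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(count_occurrences("CCGATACCGATA", gata, threshold=7.0))  # 2
```

Repeating this count for every PWM yields one occurrence feature per matrix, from which a selection step such as RFE can keep a subset.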
Figure 4. Importance of selected features.
(A) Importance of the top 50 selected features in the model built on the TF binding motif occurrence and chromatin state feature groups in Table 3. (B) Importance of the 28 selected features in the model built on the TF RPM feature group in Table 3. TF RPM denotes the ChIP-Seq read densities of 61 TFs, expressed as reads per million mapped reads (RPM) per base pair. The prefix M denotes a position weight matrix (PWM).
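The TF RPM features normalize raw ChIP-Seq read counts so that libraries of different sequencing depth and regions of different length are comparable: the count is scaled to reads per million mapped reads, then divided by the region length. The sketch below assumes exactly that two-step normalization; the library names, read counts, and region length are hypothetical.

```python
def rpm_per_bp(region_reads, total_mapped_reads, region_length_bp):
    """Reads per million mapped reads (RPM), divided by region length to give a per-bp density."""
    if total_mapped_reads <= 0 or region_length_bp <= 0:
        raise ValueError("total_mapped_reads and region_length_bp must be positive")
    rpm = region_reads * 1_000_000 / total_mapped_reads
    return rpm / region_length_bp


# Hypothetical counts for one 500-bp candidate region across three ChIP-Seq libraries:
# name -> (reads overlapping the region, total mapped reads in the library)
libraries = {"TF_A": (50, 20_000_000), "TF_B": (12, 8_000_000), "TF_C": (0, 15_000_000)}
region_length = 500
feature_vector = [rpm_per_bp(r, total, region_length) for r, total in libraries.values()]
print(feature_vector)
```

With one such value per TF, each candidate region becomes a 61-dimensional vector in the TF RPM feature group.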
Figure 5. Venn diagram of the TF binding motif occurrence features selected in the model built on the TF binding motif occurrence feature group and in the model built on the TF binding motif occurrence and chromatin state feature groups.
A, TF binding motif occurrence features selected in the model built on the TF binding motif occurrence and chromatin state feature groups. B, TF binding motif occurrence features selected in the model built on the TF binding motif occurrence feature group alone.
Figure 6. Overall schematic of this work.
