Bioinformatics. 2017 Jul 15;33(14):i252-i260.
doi: 10.1093/bioinformatics/btx257.

Exploiting sequence-based features for predicting enhancer-promoter interactions

Yang Yang et al. Bioinformatics.

Abstract

Motivation: A large number of distal enhancers and proximal promoters form enhancer-promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer-promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions.

Results: Here, we develop a new computational method (named PEP) to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. The two modules in PEP (PEP-Motif and PEP-Word) use different but complementary feature extraction strategies to exploit sequence-based information. The results across six different cell types demonstrate that our method is effective in predicting enhancer-promoter interactions as compared to the state-of-the-art methods that use functional genomic signals. Our work demonstrates that sequence-based features alone can reliably predict enhancer-promoter interactions genome-wide, which could potentially facilitate the discovery of important sequence determinants for long-range gene regulation.
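As a rough illustration of what a sequence-only feature can look like, the sketch below turns an enhancer/promoter pair into a normalized K-mer frequency vector (K = 6 by default, the value quoted for K-mers in Fig. 2). This is not PEP's implementation: the abstract does not describe how PEP-Motif or PEP-Word compute their features, and the function names `kmer_counts`, `kmer_vector` and `pair_features` are hypothetical. The actual code is in the PEP repository.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k=6):
    """Count overlapping k-mers; windows containing non-ACGT symbols are skipped."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def kmer_vector(sequence, k=6):
    """Return k-mer frequencies in a fixed lexicographic order (length 4**k)."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(sequence, k)
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[word] / total for word in vocab]


def pair_features(enhancer_seq, promoter_seq, k=6):
    """Concatenate enhancer and promoter vectors (an assumed, simple pairing scheme)."""
    return kmer_vector(enhancer_seq, k) + kmer_vector(promoter_seq, k)


if __name__ == "__main__":
    features = pair_features("ACGTACGTTAGC", "TTGACCGGTAAC", k=2)
    print(len(features))  # one entry per k-mer for each of the two sequences
```

A vector like this could be passed to any standard classifier; which features and classifier PEP actually uses should be taken from the paper and its source code, not from this sketch.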

Availability and implementation: The source code of PEP is available at: https://github.com/ma-compbio/PEP .

Contact: jianma@cs.cmu.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1
Method overview of PEP.
Fig. 2
Evaluation of PEP-Motif, PEP-Word and PEP-Integrate (K = 6 for K-mers) on E/P data from six cell lines in comparison with TargetFinder (E/P/W) in terms of AUROC, AUPR, Precision, Recall, F1 and MCC.
Fig. 3
Estimated feature importance of motifs in PEP-Motif that have top 5% importance in at least one cell line. The feature importance is scaled between 0 (low importance) and 1 (high importance). Of the 503 motif representatives (427 single motifs and 76 motif clusters) found by PEP-Motif, 139 in enhancers and 48 in promoters have top 5% feature importance in at least one cell line. Here we display the top 100 of the 139 predictive motif representatives in enhancers and all 48 predictive motif representatives in promoters. Each motif is represented by the name of its corresponding TF. If a TF has multiple associated motifs, alternative motifs are marked according to their identities in the database [e.g. EHF(S) denotes a single-site motif of EHF (Kulakovskiy et al., 2016)]. If a motif represents a motif cluster, the names of all member motifs are shown in combination. We performed hierarchical clustering on both motifs (rows of the feature importance matrix) and cell types (columns) to group the motif features. A cell is highlighted with a white border if the corresponding motif has top 5% feature importance in the respective cell type.
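
The threshold-based metrics named in the Fig. 2 caption (Precision, Recall, F1 and MCC) can all be computed from a confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper. AUROC and AUPR are omitted because they require ranked prediction scores, not a single set of counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1 and MCC from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient: balanced even when classes are skewed.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

MCC is a useful companion to F1 when the classes are imbalanced, because it also takes true negatives into account.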

References

    1. Bailey S.D. et al. (2015) ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun., 2, 6186.
    2. Bonev B., Cavalli G. (2016) Organization and function of the 3D genome. Nat. Rev. Genet., 17, 661–678.
    3. Chen T., Guestrin C. (2016a) XGBoost. https://github.com/dmlc/xgboost.
    4. Chen T., Guestrin C. (2016b) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p.785–794. ACM, New York, NY, USA.
    5. Davis J., Goadrich M. (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, p.233–240. ACM, New York, NY, USA.