BMC Biol. 2019 Jul 30;17(1):62.
doi: 10.1186/s12915-019-0679-8.

Mechanistic interpretation of non-coding variants for discovering transcriptional regulators of drug response

Xiaoman Xie et al. BMC Biol. 2019.

Abstract

Background: Identification of functional non-coding variants and their mechanistic interpretation is a major challenge of modern genomics, especially for precision medicine. Transcription factor (TF) binding profiles and epigenomic landscapes in reference samples allow functional annotation of the genome, but do not provide ready answers regarding the effects of non-coding variants on phenotypes. A promising computational approach is to build models that predict TF-DNA binding from sequence, and use such models to score a variant's impact on TF binding strength. Here, we asked if this mechanistic approach to variant interpretation can be combined with information on genotype-phenotype associations to discover transcription factors regulating phenotypic variation among individuals.

Results: We developed a statistical approach that integrates phenotype, genotype, gene expression, TF ChIP-seq, and Hi-C chromatin interaction data to answer this question. Using drug sensitivity of lymphoblastoid cell lines as the phenotype of interest, we tested whether non-coding variants statistically linked to the phenotype are enriched for strong predicted impact on the DNA binding strength of a TF, and thus identified TFs regulating individual differences in the phenotype. Our approach relies on a new method for predicting variant impact on TF-DNA binding that uses a combination of biophysical modeling and machine learning. We report statistical and literature-based support for many of the TFs discovered here as regulators of drug response variation. We show that the use of mechanistically driven variant impact predictors can identify TF-drug associations that would otherwise be missed. We examined in depth one reported association, that of the transcription factor ELF1 with the drug doxorubicin, and identified several genes that may mediate this regulatory relationship.

Conclusion: Our work represents initial steps in utilizing predictions of variant impact on TF binding sites for discovery of regulatory mechanisms underlying phenotypic variation. Future advances on this topic will be greatly beneficial to the reconstruction of phenotype-associated gene regulatory networks.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Process of scoring TFBS-SNP impact and identifying a TF’s “binding change SNPs.” (a) We build a STAP model to predict TF binding at a DNA segment, separately for every available motif from ENCODE, FactorBook, and HOCOMOCO that represents the TF. For a given sequence, each motif-specific STAP model outputs a score indicating the occupancy of the TF on the sequence. An SVM model then combines STAP scores from all motifs of the TF to compute a combined score of the TF’s binding to the sequence; this is called the “MOP” score. (b) The “Delta-MOP” score of a SNP is defined as the absolute value of the difference between the MOP scores of the major and minor allele sequences, each constructed from the 501-bp sequence centered on the SNP location. In this example, SNP rs6717613 (G->A) is found to have a Delta-MOP score of 0.45 for the TF ATF2, which is the absolute difference between the MOP scores of the major and minor alleles (0.29 and 0.74 respectively). MOP scores were based on combining scores for six different ATF2 motifs (logos shown). The Delta-MOP score in this example can be qualitatively understood in terms of matches of the core binding site (top) to each of the six ATF2 motifs, whose STAP scores are shown separately for the two alleles (bottom). The core site’s match to motifs ATF2-1, ATF2-2, and ATF2-6 changes in strength between the two alleles. For instance, the SNP falls on the 10th position of motif ATF2-1, which prefers an “A,” and the change from “G” (major allele) to “A” (minor allele) is interpreted as a change in strength of motif match. On the other hand, the core site does not have a strong match to ATF2-3 or ATF2-4, in either allelic form, while motif ATF2-5 overlaps the core site but not the SNP position. The Delta-MOP score combines these different pieces of information in a principled manner to compute an overall score of the impact of rs6717613 on ATF2 binding
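The Delta-MOP definition in panel b can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `mop_score` is a hypothetical stand-in for the trained STAP plus SVM scorer, and `toy_mop_score` simply returns the two MOP values quoted in the caption so that the example runs. The function names, the reference sequence, and the SNP index are all invented for illustration.

```python
from typing import Callable

FLANK = 250  # 250 bp on each side of the SNP gives the 501-bp window from the caption


def allele_window(reference: str, snp_index: int, allele: str) -> str:
    """Return the 501-bp sequence centered on the SNP, with `allele` at the center."""
    if snp_index < FLANK or snp_index + FLANK >= len(reference):
        raise ValueError("SNP is too close to the sequence edge for a 501-bp window")
    left = reference[snp_index - FLANK:snp_index]
    right = reference[snp_index + 1:snp_index + FLANK + 1]
    return left + allele + right


def delta_mop(
    mop_score: Callable[[str], float],
    reference: str,
    snp_index: int,
    major: str,
    minor: str,
) -> float:
    """Absolute difference between the MOP scores of the two allele sequences."""
    major_seq = allele_window(reference, snp_index, major)
    minor_seq = allele_window(reference, snp_index, minor)
    return abs(mop_score(major_seq) - mop_score(minor_seq))


def toy_mop_score(sequence: str) -> float:
    """Hypothetical scorer: echoes the caption's MOP values based on the center base."""
    return 0.74 if sequence[FLANK] == "A" else 0.29


# Placeholder reference sequence; a real analysis would use the genome around the SNP.
reference = "C" * 600
score = delta_mop(toy_mop_score, reference, snp_index=300, major="G", minor="A")
print(round(score, 2))
```

Passing the scorer in as a callable keeps the windowing and differencing logic independent of how binding is actually predicted, so any sequence-to-score model could be substituted.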
Fig. 2
(a, b) Comparison of three TF binding predictors. We compared MOP with STAP and gkm-SVM. The performance of each model is measured by the Pearson correlation coefficient (CC) between ChIP score and predicted binding score on a test set of 400 sequences that were not used in model training. Performance evaluation is performed for each of 37 data sets (for different TFs). (a) MOP performs as well as or better than STAP (using the best motif when multiple motifs are available) for 26 of the 37 data sets, with their average CC being 0.39 and 0.36 respectively. (b) MOP performs as well as or better than gkm-SVM for 21 of the 37 TF data sets examined, with the average CC of the two methods being 0.39 and 0.37 respectively. (c-e) Evaluation of TFBS-SNP impact prediction methods. Four different methods of binding change prediction (Delta-MOP, Delta-gkm-SVM, Delta-STAP, and Delta-PWM) were evaluated for their ability to distinguish allele-specific binding (ASB) events from non-ASB events, for each of 16 data sets based on ChIP-seq data for different TFs. Performance was measured using AUROC as well as AUPRC. The ROC curve of RUNX3 using “Delta-MOP” as impact predictor is shown in (c). The last two rows show pairwise comparison of Delta-MOP and each of the other three methods based on AUROC (d) and AUPRC (e) achieved by the methods on the same data set
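The two headline metrics in this caption, the Pearson CC between ChIP score and predicted binding score and the AUROC for separating ASB from non-ASB events, can be computed as below. This is a generic standard-library sketch with made-up numbers, not the evaluation code behind the figure; the AUROC is computed through its pairwise-ranking interpretation, and the function and variable names are chosen here for illustration.

```python
import math
from typing import Sequence


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


def auroc(scores: Sequence[float], is_asb: Sequence[bool]) -> float:
    """Probability that a random ASB SNP outscores a random non-ASB SNP (ties count 0.5)."""
    pos = [s for s, label in zip(scores, is_asb) if label]
    neg = [s for s, label in zip(scores, is_asb) if not label]
    if not pos or not neg:
        raise ValueError("need at least one ASB and one non-ASB example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up impact scores for four SNPs, two labeled ASB and two non-ASB.
print(auroc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))  # → 0.75
```

The pairwise form is quadratic in the number of SNPs, which is acceptable for small evaluation sets; a rank-based implementation would scale better.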
Fig. 3
Process of identifying TFs regulating phenotypic variation. A hypergeometric test is used to test the overlap between a TF’s “binding change SNPs,” based on presence within ChIP peaks from ENCODE and high Delta-MOP score, and “phenotype-associated SNPs,” i.e., eQTLs of genes whose expression correlates with phenotype, located within cis-regulatory regions of the gene identified by Hi-C data. A TF is considered significant to the phenotype if the FDR q value is below 0.05
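The overlap test described in this caption can be illustrated with a one-sided hypergeometric tail probability followed by a multiple-testing correction across TFs. The sketch below is an assumption-laden illustration, not the authors' pipeline: the caption does not say how the SNP universe is defined or which FDR procedure yields the q values, so the `population` argument and the use of Benjamini-Hochberg are assumptions, and all counts are invented.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enrichment_p(
    population: int, n_binding_change: int, n_phenotype: int, overlap: int
) -> float:
    """P(X >= overlap) when n_phenotype SNPs are drawn from `population` SNPs,
    of which n_binding_change are the TF's binding change SNPs."""
    upper = min(n_binding_change, n_phenotype)
    total = comb(population, n_phenotype)
    tail = sum(
        comb(n_binding_change, k) * comb(population - n_binding_change, n_phenotype - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q values, returned in the input order (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Invented counts per hypothetical TF:
# (universe, binding change SNPs, phenotype-associated SNPs, overlap)
counts = {
    "TF_A": (1000, 50, 40, 10),
    "TF_B": (1000, 50, 40, 2),
}
p_values = [hypergeom_enrichment_p(*c) for c in counts.values()]
for tf, q in zip(counts, benjamini_hochberg(p_values)):
    print(tf, "significant" if q < 0.05 else "not significant")
```

Exact integer binomials from `math.comb` avoid a SciPy dependency and floating-point underflow for small examples; for genome-scale counts a library routine such as a survival function on the hypergeometric distribution would be the more practical choice.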
Fig. 4
Predicted mechanisms of ELF1 regulation of doxorubicin-induced apoptosis. Black solid arrows show the skeleton of two major pathways to doxorubicin-induced apoptosis, viz., those mediated by DNA damage and reactive oxygen species (ROS) respectively. Genes directly involved in these pathways are shown as ovals placed on the arrows. Green ovals represent drug response-associated genes that are predicted to be regulated by ELF1 and have been previously shown to have regulatory function on a pathway gene. Such regulatory evidence, presented in previous literature, is represented by gray dashed arrows connecting ELF1-regulated DRGs to pathway genes
