Plant Commun. 2024 Sep 9;5(9):100985.
doi: 10.1016/j.xplc.2024.100985. Epub 2024 Jun 10.

DeepCBA: A deep learning framework for gene expression prediction in maize based on DNA sequences and chromatin interactions


Zhenye Wang et al. Plant Commun.

Abstract

Chromatin interactions create spatial proximity between distal regulatory elements and target genes in the genome and thereby strongly influence gene expression, transcriptional regulation, and phenotypic traits. Several methods have been developed for predicting gene expression, but existing methods do not account for the effect of chromatin interactions on target gene expression, which may reduce the accuracy of both expression prediction and the mining of important regulatory elements. In this study, we developed DeepCBA, a highly accurate deep learning-based gene expression prediction model built on maize chromatin interaction data. Compared with existing models, DeepCBA achieves higher accuracy in both expression classification and expression value prediction. The average Pearson correlation coefficients (PCCs) for predicting gene expression from gene promoter proximal interactions (PPI), proximal-distal interactions (PDI), and both (PPI + PDI) were 0.818, 0.625, and 0.929, respectively, increases of 0.357, 0.16, and 0.469 over the PCCs obtained with traditional methods that use only gene proximal sequences. DeepCBA also identified important motifs; these motifs were enriched in open chromatin regions and expression quantitative trait loci (eQTLs) and showed clear tissue specificity. Importantly, experimental results for the maize flowering-related gene ZmRap2.7 and the tillering-related gene ZmTb1 demonstrated that DeepCBA can be used to explore regulatory elements that affect gene expression. Moreover, promoter editing and verification of two reported genes (ZmCLE7 and ZmVTE4) demonstrated the utility of DeepCBA for precisely designing gene expression and even for future intelligent breeding. DeepCBA is available at http://www.deepcba.com/ or http://124.220.197.196/.
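Model accuracy in the abstract is reported as Pearson correlation coefficients between predicted and measured expression. The following Python sketch is an illustrative, dependency-free PCC implementation, not DeepCBA's own code. It also back-calculates the proximal-sequence-only baseline implied by each reported PCC and its stated gain. The dictionary names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# PCC and gain per input mode, as stated in the abstract.
reported = {"PPI": (0.818, 0.357), "PDI": (0.625, 0.16), "PPI+PDI": (0.929, 0.469)}
# Implied baseline PCC of the proximal-sequence-only method: PCC - gain.
baselines = {mode: round(pcc - gain, 3) for mode, (pcc, gain) in reported.items()}
```

All three implied baselines come out close to 0.46 (0.461, 0.465, and 0.46). This suggests that each gain was measured against roughly the same proximal-sequence baseline.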

Keywords: chromatin interactions; deep learning; gene expression prediction; maize; promoter editing; regulatory elements and motifs.


Figures

Figure 1
The workflow of DeepCBA. (A) Two types of chromatin interactions, PPI and PDI, and the 1.5-kb gene proximal sequences at the TSS and TTS. (B) The five steps of DeepCBA: sequence encoding, feature extraction with a CNN, temporal and distal feature extraction with a BiLSTM, a self-attention mechanism, and gene expression prediction. (C) PCCs obtained with 3 methods for predicting gene expression values. CNN_No_PPI: the CNN model using only gene upstream and downstream sequences. DeepCBA_No_PPI: the DeepCBA model using only gene upstream and downstream sequences. DeepCBA_PPI: the DeepCBA model using interaction sequences. Data augmentation denotes taking gene order into account in PPI mode during model training.
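Step 1 of the workflow, sequence encoding, is commonly implemented as one-hot vectors fed to a CNN. The sketch below rests on several assumptions that the caption does not confirm:

* Channels are ordered A/C/G/T.
* Ambiguous bases become all-zero rows.
* A pair of interacting anchor sequences is concatenated end to end.

The hypothetical `augment_pair` helper shows one way an order-swapping augmentation could look. It is not DeepCBA's documented procedure.

```python
# Assumed channel order; DeepCBA's actual encoding may differ.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Ambiguous bases such as N become all-zero rows (an assumed convention).
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def encode_pair(anchor_a, anchor_b):
    """Hypothetical: concatenate the encodings of two interacting anchors."""
    return one_hot(anchor_a) + one_hot(anchor_b)


def augment_pair(anchor_a, anchor_b):
    """Hypothetical augmentation: present the pair in both orders."""
    return [encode_pair(anchor_a, anchor_b), encode_pair(anchor_b, anchor_a)]
```

A length-L sequence becomes an L x 4 matrix, the input shape a 1D CNN expects. Swapping anchor order doubles the training pairs and keeps the model from depending on which gene is listed first.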
Figure 2
Performance of DeepCBA for prediction of maize gene expression in different modes. From left to right are the results of predictions based on PDI, PPI, and PDI + PPI, respectively. (A–D) The distribution of predicted values and true values of gene expression when the DeepCBA model was used to predict gene expression in shoots on the basis of PDI, PPI, and PDI + PPI. (E–H) The distribution of predicted values and true values of gene expression when the DeepCBA model was used to predict gene expression in ears on the basis of PDI, PPI, and PDI + PPI.
Figure 3
Motifs that influence gene expression can be identified on the basis of PPI sequences. (A) Effect of the 2 interacting sequences input into the DeepCBA model on expression prediction. (B) Venn diagram showing the motifs identified in shoots and ears by DeepCBA in PPI mode. (C) Five distribution patterns of motifs identified in shoots and ears: (1) highly enriched near 250 bp downstream of the TSS, (2) highly enriched at specific positions, (3) poorly enriched near 250 bp downstream of the TSS, (4) poorly enriched near TSSs but highly enriched near TTSs, (5) evenly distributed across the whole sequence. (D and E) Core motif sequences obtained with MetaLogo from the motifs identified in ears and shoots in PPI mode. Six core sequences were obtained in the 2 tissues. (F and G) Changes in the expression of expressed and highly expressed genes containing different numbers of motifs in ears and shoots.
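Panels C, F, and G depend on locating motif occurrences along sequences and counting them per gene. Below is a minimal sketch of that bookkeeping. It scans the forward strand only and ignores reverse-complement matches. The 50-bp bin size and all function names are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def motif_positions(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return positions


def position_histogram(seqs, motif, bin_size=50):
    """Count motif starts per position bin across sequences with a shared layout."""
    hist = Counter()
    for s in seqs:
        for p in motif_positions(s, motif):
            hist[p // bin_size] += 1
    return hist


def genes_by_motif_count(seqs, motif):
    """Group sequence indices by how many times the motif occurs in each."""
    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(len(motif_positions(s, motif)), []).append(i)
    return groups
```

If every input sequence is anchored at the same TSS offset, the histogram bins map directly to distances from the TSS. That is what distinguishes patterns such as "enriched near 250 bp downstream of the TSS" from an even distribution.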
Figure 4
Epigenetic features of motifs identified by DeepCBA and examples of their regulation of gene expression (ears). (A) Number of matches between eQTLs and motifs of different lengths identified in PPI mode. Controls were generated by removing the motif sequences from the PPI sequences and randomly selecting sequences 6–10 bp long from the remaining PPI sequences (∗∗p < 0.05, ∗∗∗p < 0.01, ∗∗∗∗p < 0.0001; t test). (B) Number of matches between eQTLs and motifs of different lengths identified in PPI mode. Controls were generated by removing the PPI interaction sequences from the whole genome and randomly selecting sequences 6–10 bp long from the remaining sequences (∗∗p < 0.05, ∗∗∗p < 0.01, ∗∗∗∗p < 0.0001; t test). (C and D) Number of matches between motifs identified by DeepCBA in PPI mode and open chromatin regions in the NAM population. Controls were selected as described in (A) and (B). (E) The CATGCA motif identified in the Zm00001d042609 sequence in PPI mode and the downstream gene Zm00001d042600 can both be bound by the TF NACTF109. Variation in CATGCA in the maize association mapping panel was associated with differences in Zm00001d042600 expression and thus affected maize drought resistance at the seedling stage.
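The control design in panel A works in three stages. It masks the motif occurrences, draws random 6–10 bp windows from what remains, and counts how many windows contain an eQTL. A hedged sketch of those stages follows. Positions are 0-based within a single sequence, and the retry cap and function names are hypothetical. The repeated draws and the t test used in the figure are left out.

```python
import random


def masked_positions(seq, motifs):
    """Positions covered by any forward-strand occurrence of any motif."""
    covered = set()
    s = seq.upper()
    for m in motifs:
        m = m.upper()
        start = s.find(m)
        while start != -1:
            covered.update(range(start, start + len(m)))
            start = s.find(m, start + 1)
    return covered


def sample_controls(seq, motifs, n, min_len=6, max_len=10, rng=None):
    """Draw up to n random [start, end) windows of 6-10 bp avoiding motif positions."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    covered = masked_positions(seq, motifs)
    controls = []
    attempts = 0
    while len(controls) < n and attempts < 100 * n:  # hypothetical retry cap
        attempts += 1
        k = rng.randint(min_len, max_len)  # inclusive bounds
        if len(seq) < k:
            break
        start = rng.randint(0, len(seq) - k)
        if not covered.intersection(range(start, start + k)):
            controls.append((start, start + k))
    return controls


def count_overlaps(windows, eqtl_positions):
    """Number of windows that contain at least one eQTL position."""
    eqtls = set(eqtl_positions)
    return sum(1 for a, b in windows if eqtls.intersection(range(a, b)))
```

Comparing `count_overlaps` for motif windows against many independent control draws gives the kind of distribution a t test could then be applied to.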
Figure 5
Identification of regulatory elements in ZmRap2.7. (A) Distribution of open chromatin regions in 70-kb bins upstream of ZmRap2.7 in different tissues and the sequence gradient values calculated by DeepCBA. (B) Identified motifs and TFs that can be bound in the 2 regions with the highest gradient values (chr8: 135941716–135942216 and chr8: 135945716–135946216). (C and D) The motifs and TFs that can be bound in the 3-kb region upstream of the TSS of ZmRap2.7. ①②③④ represent DNase-sequencing, assay for transposase-accessible chromatin-sequencing, H3K4me3, and H3K9ac, respectively.
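Panel B singles out the 2 highest-gradient regions, each spanning 500 bp. The following sketch ranks fixed windows by summed absolute per-base importance to find such regions. It assumes the gradient values are already available as a list of numbers aligned to genomic coordinates. Computing those gradients requires the trained model and is not shown. `top_windows` is a hypothetical helper that uses non-overlapping windows with exclusive end coordinates.

```python
def top_windows(scores, region_start, window=500, k=2):
    """Rank non-overlapping windows by summed absolute per-base score.

    scores: per-base importance values (e.g., gradients) for a region whose
    first base sits at genomic coordinate region_start.
    Returns up to k tuples (start, end, total), highest total first;
    end is exclusive.
    """
    totals = []
    for offset in range(0, len(scores), window):
        chunk = scores[offset:offset + window]
        totals.append((region_start + offset,
                       region_start + offset + len(chunk),
                       sum(abs(v) for v in chunk)))
    totals.sort(key=lambda t: t[2], reverse=True)
    return totals[:k]
```

Taking absolute values treats strong positive and strong negative attributions as equally important. A signed sum would instead separate putative enhancing regions from putative repressing ones.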
Figure 6
DeepCBA accurately predicts the expression of the maize genes ZmCLE7 and ZmVTE4 after promoter editing. (A) Distribution of 4 histone modifications (H3K27ac, H3K4me3, H3K27me3, H3K9ac) and open chromatin regions within the 4-kb upstream region of ZmCLE7. (B) Schematic diagram of 6 editing events for ZmCLE7 reported in the published literature (Liu et al., 2021a, 2021b). (C) Expression of the edited sequences in (B) predicted by DeepCBA and compared with quantitative real-time PCR (qPCR) results from the published literature. (D) Processing of the 4-kb sequence of ZmCLE7 with sliding windows (window size = 200 bp, step size = 200 bp). (E) Expression of the edited sequences in (D) predicted by DeepCBA. (F) Distribution of 3 histone modifications (H3K27ac, H3K4me3, and H3K9ac) and open chromatin regions within the 4-kb region upstream of ZmVTE4. (G) Gene editing events in the 4-kb region upstream of ZmVTE4. (H) Comparison of ZmVTE4 expression predicted by DeepCBA with ZmVTE4 expression measured by qPCR in leaves for different gene editing events in the 4-kb upstream region of ZmVTE4.
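Panel D tiles the 4-kb ZmCLE7 sequence with 200-bp windows at a 200-bp step, giving 20 non-overlapping windows. The caption does not say how each window is altered, so the sketch below offers two assumed options. `mode="delete"` removes the window. `mode="mask"` replaces it with N, which keeps the length constant for a fixed-input model. `sliding_window_edits` is a hypothetical helper, not part of DeepCBA.

```python
def sliding_window_edits(seq, window=200, step=200, mode="delete"):
    """Return (start, end, edited_seq) for each full window of the input.

    mode="delete" removes the window; mode="mask" replaces it with N so the
    edited sequence keeps its original length. Both are assumptions about
    the editing scheme, not the authors' documented procedure.
    """
    if mode not in ("delete", "mask"):
        raise ValueError("mode must be 'delete' or 'mask'")
    edits = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        if mode == "delete":
            edited = seq[:start] + seq[end:]
        else:
            edited = seq[:start] + "N" * window + seq[end:]
        edits.append((start, end, edited))
    return edits
```

Each edited sequence would then be scored by a trained expression model. Plotting the predicted change against window start highlights which 200-bp segments the model considers most important for expression.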
Figure 7
The DeepCBA website. (A) Functions of the DeepCBA website. (B) DeepCBA enables high-precision gene expression prediction based on chromatin interactions for 4 crops: maize, rice, cotton, and wheat. Users can freely select relevant models to achieve the prediction tasks. (C) DeepCBA implements a parallel computing algorithm. The prediction results are sent to users via e-mail, and users can view the results according to the Job_id. (D) DeepCBA provides a visualization interface to display the gradient importance of the input sequences that affect gene expression.


