Plant Commun. 2024 Sep 9;5(9):100985.

doi: 10.1016/j.xplc.2024.100985. Epub 2024 Jun 10.

DeepCBA: A deep learning framework for gene expression prediction in maize based on DNA sequences and chromatin interactions

Zhenye Wang¹, Yong Peng², Jie Li¹, Jiying Li³, Hao Yuan¹, Shangpo Yang¹, Xinru Ding¹, Ao Xie¹, Jiangling Zhang⁴, Shouzhe Wang⁵, Keqin Li¹, Jiaqi Shi⁴, Guangjie Xing⁴, Weihan Shi⁴, Jianbing Yan², Jianxiao Liu⁶

Affiliations

¹ National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
² National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China.
³ Microsoft Corporation, Redmond, WA 98052, USA.
⁴ College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
⁵ National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China; WIMI Biotechnology Co., Ltd., Changzhou 213000, China.
⁶ National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China. Electronic address: liujianxiao@mail.hzau.edu.cn.

PMID: 38859587
PMCID: PMC11413363
DOI: 10.1016/j.xplc.2024.100985

Zhenye Wang et al. Plant Commun. 2024.

Abstract

Chromatin interactions create spatial proximity between distal regulatory elements and target genes in the genome, which has an important impact on gene expression, transcriptional regulation, and phenotypic traits. To date, several methods have been developed for predicting gene expression. However, existing methods do not take into consideration the effect of chromatin interactions on target gene expression, thus potentially reducing the accuracy of gene expression prediction and of mining important regulatory elements. In this study, we developed a highly accurate deep learning-based gene expression prediction model (DeepCBA) based on maize chromatin interaction data. Compared with existing models, DeepCBA exhibits higher accuracy in both expression classification and expression value prediction. The average Pearson correlation coefficients (PCCs) for predicting gene expression using gene promoter proximal interactions (PPI), proximal-distal interactions (PDI), and both proximal and distal interactions were 0.818, 0.625, and 0.929, respectively, representing increases of 0.357, 0.160, and 0.469 over the PCCs obtained with traditional methods that use only gene proximal sequences. Several important motifs were identified through DeepCBA; they were enriched in open chromatin regions and expression quantitative trait loci (eQTLs) and showed clear tissue specificity. Importantly, experimental results for the maize flowering-related gene ZmRap2.7 and the tillering-related gene ZmTb1 demonstrated the feasibility of using DeepCBA to explore regulatory elements that affect gene expression. Moreover, promoter editing and verification of two reported genes (ZmCLE7 and ZmVTE4) demonstrated the utility of DeepCBA for the precise design of gene expression and even for future intelligent breeding. DeepCBA is available at http://www.deepcba.com/ or http://124.220.197.196/.

Keywords: chromatin interactions; deep learning; gene expression prediction; maize; promoter editing; regulatory elements and motifs.
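The Pearson correlation coefficient (PCC) reported in the abstract measures linear agreement between predicted and measured expression values. The minimal pure-Python sketch below computes it and back-calculates the baseline PCC implied by each reported gain; all three gains turn out to be measured against roughly the same baseline of about 0.46. The `pearson` helper is illustrative only; the abstract does not say how per-dataset PCCs were computed or averaged.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and measured values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Average PCCs and gains as reported in the abstract.
reported = {"PPI": 0.818, "PDI": 0.625, "PPI + PDI": 0.929}
gain = {"PPI": 0.357, "PDI": 0.160, "PPI + PDI": 0.469}

# Baseline PCC implied for the proximal-sequence-only method: reported - gain.
implied_baseline = {m: round(reported[m] - gain[m], 3) for m in reported}
print(implied_baseline)
```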

Figures

**Figure 1**
The workflow of DeepCBA. **(A)** Two types of chromatin interactions, PPI and PDI, with 1.5-kb gene proximal sequences at the TSS and TTS. **(B)** Five steps of DeepCBA: sequence encoding, feature extraction using a CNN, temporal and distal feature extraction using a BiLSTM, a self-attention mechanism, and gene expression prediction. **(C)** PCCs obtained using 3 methods for predicting gene expression values. CNN_No_PPI: the CNN model using only gene upstream and downstream sequences. DeepCBA_No_PPI: the DeepCBA model using only gene upstream and downstream sequences. DeepCBA_PPI: the DeepCBA model using interacting sequences. Data augmentation refers to accounting for gene order in PPI mode during model training.
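The first step in Figure 1B, sequence encoding, is commonly done by one-hot encoding DNA before it enters a CNN. This sketch assumes an A/C/G/T channel order, all-zero rows for ambiguous bases, and that a PPI input is built by trimming or N-padding the two interacting sequences to a common length and concatenating them. `one_hot`, `encode_pair`, and `augment_pairs` are hypothetical helpers, not DeepCBA's code; the order-swap augmentation is only one possible reading of "considering gene order in PPI mode", and keeping the label unchanged for the swapped pair is likewise an assumption.

```python
from typing import List, Tuple

# Assumed channel order; any other character (e.g. N) becomes an all-zero row.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA sequence as a (length x 4) nested list."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def encode_pair(seq_a: str, seq_b: str, pad_to: int) -> List[List[int]]:
    """Hypothetical PPI-mode input: trim or N-pad each interacting sequence
    to pad_to bases, then concatenate them along the length axis."""
    def fit(s: str) -> str:
        return s[:pad_to].ljust(pad_to, "N")
    return one_hot(fit(seq_a) + fit(seq_b))


def augment_pairs(
    pairs: List[Tuple[str, str, float]]
) -> List[Tuple[str, str, float]]:
    """Hypothetical order-swap augmentation: emit each (a, b, label) pair in
    both orders, keeping the label unchanged."""
    out: List[Tuple[str, str, float]] = []
    for a, b, label in pairs:
        out.append((a, b, label))
        out.append((b, a, label))
    return out


# Usage: two short toy sequences padded/trimmed to 4 bases each.
x = encode_pair("ACG", "TTTTT", pad_to=4)
print(len(x), x[3])
```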

**Figure 2**
Performance of DeepCBA for prediction of maize gene expression in different modes. From left to right are the results of predictions based on PDI, PPI, and PDI + PPI, respectively. **(A–D)** The distribution of predicted values and true values of gene expression when the DeepCBA model was used to predict gene expression in shoots on the basis of PDI, PPI, and PDI + PPI. **(E–H)** The distribution of predicted values and true values of gene expression when the DeepCBA model was used to predict gene expression in ears on the basis of PDI, PPI, and PDI + PPI.

**Figure 3**
Motifs that influence gene expression can be identified on the basis of PPI sequences. **(A)** Effect of the 2 interacting input sequences on DeepCBA expression prediction. **(B)** Venn diagram showing the motifs identified in shoots and ears by DeepCBA in PPI mode. **(C)** Five different distribution patterns of motifs identified in shoots and ears: (1) highly enriched near 250 bp downstream of the TSS, (2) highly enriched at specific positions, (3) poorly enriched near 250 bp downstream of the TSS, (4) poorly enriched near TSSs but highly enriched near TTSs, (5) evenly distributed across the whole sequence. **(D and E)** Core motif sequences obtained using MetaLogo on the basis of motifs identified in ears and shoots in PPI mode. Six core sequences were obtained in the 2 tissues. **(F and G)** Changes in the expression of expressed and highly expressed genes containing different numbers of motifs in ears and shoots.
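Panels C, F, and G rest on two simple operations: locating motif occurrences along a sequence and grouping genes by how many occurrences they carry. The sketch below uses exact string matching with overlapping hits allowed; real motif scanning may instead use position weight matrices, and the helper names, bin size, and toy data are illustrative assumptions. Positions are offsets from the start of each input sequence, so they correspond to distances from the TSS only if the TSS sits at a known offset in the input.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def motif_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of every exact, possibly overlapping, match."""
    pattern = f"(?={re.escape(motif.upper())})"  # zero-width lookahead keeps overlaps
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def position_histogram(seqs: Iterable[str], motif: str, bin_size: int = 50) -> Dict[int, int]:
    """Count motif starts per bin (bin key = bin start offset)."""
    counts: Dict[int, int] = defaultdict(int)
    for seq in seqs:
        for pos in motif_positions(seq, motif):
            counts[pos // bin_size * bin_size] += 1
    return dict(sorted(counts.items()))


def mean_expression_by_count(
    seqs: Dict[str, str], expression: Dict[str, float], motif: str
) -> Dict[int, float]:
    """Group genes by motif count and return the mean expression per group."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for gene_id, seq in seqs.items():
        groups[len(motif_positions(seq, motif))].append(expression[gene_id])
    return {n: mean(vals) for n, vals in sorted(groups.items())}
```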

**Figure 4**
Epigenetic features of motifs identified by DeepCBA (ears) and examples of their regulation of gene expression. **(A)** Number of matches between eQTLs and motifs of different lengths identified in PPI mode. Motif sequences were removed from the PPI sequences, and sequences 6–10 bp long were randomly selected from the remaining PPI sequences as controls (∗∗p < 0.05, ∗∗∗p < 0.01, ∗∗∗∗p < 0.0001; *t* test). **(B)** Number of matches between eQTLs and motifs of different lengths identified in PPI mode. PPI sequences were removed from the whole genome, and sequences 6–10 bp long were randomly selected from the remaining sequences as controls (∗∗p < 0.05, ∗∗∗p < 0.01, ∗∗∗∗p < 0.0001; *t* test). **(C and D)** Number of matches between motifs identified by DeepCBA in PPI mode and open chromatin regions in the NAM population. Controls were selected as described in **(A)** and **(B)**. **(E)** The CATGCA motif identified in the *Zm00001d042609* sequence in PPI mode and the downstream gene *Zm00001d042600* can be bound simultaneously by the TF NACTF109. Variation in CATGCA in the maize association mapping panel was associated with differences in *Zm00001d042600* expression and thus affected maize drought resistance at the seedling stage.
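The control design in panel A draws random 6–10-bp sequences from PPI sequences after the motifs have been removed. The sketch below masks motif bases instead of cutting them out (cutting and rejoining would create artificial k-mers at the junctions); this is an assumption, not the authors' stated procedure. It also uses Welch's unequal-variance t statistic, whereas the caption says only "t test". All function names are hypothetical.

```python
import random
from statistics import mean, variance
from typing import List


def mask_motifs(seq: str, motifs: List[str]) -> List[bool]:
    """True at every base covered by an exact match of any motif."""
    s = seq.upper()
    masked = [False] * len(s)
    for motif in motifs:
        m = motif.upper()
        start = s.find(m)
        while start != -1:
            for i in range(start, start + len(m)):
                masked[i] = True
            start = s.find(m, start + 1)  # step by 1 so overlapping hits are found
    return masked


def sample_controls(
    seq: str, motifs: List[str], n: int, rng: random.Random,
    min_len: int = 6, max_len: int = 10, max_tries: int = 100_000,
) -> List[str]:
    """Draw up to n random substrings of length min_len..max_len that do not
    overlap any masked motif base."""
    masked = mask_motifs(seq, motifs)
    controls: List[str] = []
    for _ in range(max_tries):
        if len(controls) == n:
            break
        k = rng.randint(min_len, max_len)  # inclusive bounds
        if k > len(seq):
            continue
        start = rng.randint(0, len(seq) - k)
        if not any(masked[start:start + k]):
            controls.append(seq[start:start + k].upper())
    return controls


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's two-sample t statistic (unequal variances)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se
```

For a p-value rather than only the statistic, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.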

**Figure 5**
Identification of regulatory elements in *ZmRap2.7*. **(A)** Distribution of open chromatin regions in 70-kb bins upstream of *ZmRap2.7* in different tissues, and the sequence gradient values calculated by DeepCBA. **(B)** Identified motifs and the TFs that can bind them in the 2 regions with the highest gradient values (chr8: 135941716–135942216 and chr8: 135945716–135946216). **(C and D)** Motifs and the TFs that can bind them in the 3-kb region upstream of the *ZmRap2.7* TSS. ①②③④ represent DNase-seq, assay for transposase-accessible chromatin with sequencing (ATAC-seq), H3K4me3, and H3K9ac, respectively.
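Panel B reports the 2 regions with the highest gradient values, and each listed chr8 interval spans 500 bp. The sketch below assumes per-base gradient scores are already available (for example, the gradient of predicted expression with respect to the one-hot input, reduced to one value per base), scores each non-overlapping 500-bp window by the sum of absolute gradients, and returns the top 2. The window width is inferred from the coordinates; the scoring rule, the half-open coordinate convention, and the `top_windows` helper are assumptions.

```python
from typing import List, Tuple


def top_windows(
    scores: List[float], region_start: int, window: int = 500, k: int = 2
) -> List[Tuple[int, int, float]]:
    """Score non-overlapping windows by summed |gradient| and return the k
    highest as (start, end, score), with half-open genomic coordinates."""
    windows = []
    for offset in range(0, len(scores), window):
        chunk = scores[offset:offset + window]  # last chunk may be shorter
        start = region_start + offset
        windows.append((start, start + len(chunk), sum(abs(s) for s in chunk)))
    windows.sort(key=lambda w: w[2], reverse=True)
    return windows[:k]
```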

**Figure 6**
DeepCBA accurately predicts the expression of edited variants of the maize genes *ZmCLE7* and *ZmVTE4*. **(A)** Distribution of 4 histone modifications (H3K27ac, H3K4me3, H3K27me3, and H3K9ac) and open chromatin regions within the 4-kb region upstream of *ZmCLE7*. **(B)** Schematic diagram of 6 editing events for *ZmCLE7* reported in the published literature (Liu et al., 2021a, 2021b). **(C)** Expression of the gene-edited sequences in **(B)** predicted by DeepCBA, compared with quantitative real-time PCR (qPCR) results from the published literature. **(D)** Sliding windows (window size = 200 bp, step size = 200 bp) used to process the 4-kb upstream sequence of *ZmCLE7*. **(E)** Expression of the edited sequences in **(D)** predicted by DeepCBA. **(F)** Distribution of 3 histone modifications (H3K27ac, H3K4me3, and H3K9ac) and open chromatin regions within the 4-kb region upstream of *ZmVTE4*. **(G)** Gene editing events in the 4-kb region upstream of *ZmVTE4*. **(H)** Comparison of *ZmVTE4* expression predicted by DeepCBA with expression measured by qPCR in leaves for different gene editing events in the 4-kb upstream region of *ZmVTE4*.
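With a 200-bp window and a 200-bp step, the windows in panel D tile the 4-kb upstream region without overlap, giving 4000 / 200 = 20 windows. The caption does not say how each window was edited; the sketch below assumes the window is deleted, and `sliding_windows` and `deletion_variants` are hypothetical helpers. Each variant would then be passed to DeepCBA for prediction, which is not shown here.

```python
from typing import List, Tuple


def sliding_windows(length: int, window: int = 200, step: int = 200) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows; with step == window they tile the
    sequence without overlap (a trailing partial window is dropped)."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def deletion_variants(seq: str, window: int = 200, step: int = 200) -> List[Tuple[Tuple[int, int], str]]:
    """Hypothetical edit: one variant per window, with that window deleted."""
    return [((s, e), seq[:s] + seq[e:]) for s, e in sliding_windows(len(seq), window, step)]


upstream = "ACGT" * 1000  # placeholder for a 4-kb upstream sequence
variants = deletion_variants(upstream)
print(len(variants))  # 20 windows for 4000 bp
```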

**Figure 7**
The DeepCBA website. **(A)** Functions of the DeepCBA website. **(B)** DeepCBA enables high-precision gene expression prediction based on chromatin interactions for 4 crops: maize, rice, cotton, and wheat. Users can freely select the relevant model for each prediction task. **(C)** DeepCBA implements a parallel computing algorithm. Prediction results are sent to users by e-mail, and users can view the results using their Job_id. **(D)** DeepCBA provides a visualization interface that displays the gradient-based importance of the input sequences that affect gene expression.
