Int J Mol Sci. 2024 Jan 3;25(1):607.
doi: 10.3390/ijms25010607.

AtSNP_TATAdb: Candidate Molecular Markers of Plant Advantages Related to Single Nucleotide Polymorphisms within Proximal Promoters of Arabidopsis thaliana L.

Anton Bogomolov et al. Int J Mol Sci.

Abstract

The mainstream of post-genome target-assisted breeding in crop plant species includes biofortification, such as high-throughput phenotyping along with genome-based selection. Therefore, in this work, we used the Web service Plant_SNP_TATA_Z-tester, which we developed previously, to run a uniform in silico analysis of the transcriptional alterations of 54,013 protein-coding transcripts from 32,833 Arabidopsis thaliana L. genes caused by 871,707 SNPs located in the proximal promoter region. The analysis identified 54,993 SNPs as significantly decreasing or increasing gene expression through changes in TATA-binding protein (TBP) affinity to the promoters. The existence of these SNPs in highly conserved proximal promoters may be explained as intraspecific diversity maintained by stabilizing natural selection. To support this, we hand-annotated papers on some of the Arabidopsis genes possessing these SNPs, or on their orthologs in other plant species, and demonstrated the effects of changes in the expression of these genes on vital plant traits. We integrated in silico estimates of the TBP-promoter affinity into the AtSNP_TATAdb knowledge base and showed their significant correlations with independent in vivo experimental data. These correlations appeared to be robust to variations in statistical criteria, the genomic environment of TATA box regions, plant species and growing conditions.

Keywords: TATA box; TATA-binding protein; estimates in silico; gene expression; genome-wide analysis; noncoding polymorphism; target-assisted breeding; verification in vivo.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Flowchart of AtSNP_TATAdb knowledge base development by processing genome-wide information from the Ensembl Plant database [31] by first using Plant_SNP_TATA_Z-tester [24] and, after that, selectively annotating some of the SNPs by a search in the PubMed database [39] using information about where, when and under what conditions changes in expression of these genes or their homolog genes were observed in agricultural studies. Finally, we selectively verified estimates in silico using independent experimental data in vivo taken from the PubMed database [39].
Figure 2
A sample AtSNP_TATAdb entry documenting two SNPs, ENSVATH01403825:A and ENSVATH01403824:T, of the ARF1 (Auxin Response Factor 1) gene, which can significantly downregulate and upregulate its expression, as calculated in this work, together with their annotation (see Supplementary Materials, Table S1: first row) using the PubMed database [39] (PMIDs: 28255787, 35291484, 17021043).
Figure 3
A comparison between the distributions of the KD-values calculated in silico using ancestral and minor alleles of the Arabidopsis SNPs. Legend: χ² and D, the scores of Pearson’s chi-square test and Kolmogorov–Smirnov’s test for assessing the difference between two distributions, respectively; t and Z, the scores of Student’s t-test and Fisher’s Z-test for comparing the difference between two estimates of arithmetic means, respectively; F, Fisher’s F-test for comparing the difference between two estimates of the standard deviation; PADJ, statistical significance level with the Bonferroni correction for multiple comparisons calculated using STATISTICA version 10.0 (Statsoft™, Tulsa, OK, USA).
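As a rough illustration of two quantities named in this legend, the sketch below computes a two-sample Kolmogorov–Smirnov D statistic and a Bonferroni-adjusted significance level in plain Python. It is not the authors' STATISTICA workflow; the function names `ks_statistic` and `bonferroni` and all input values are hypothetical and chosen only to show the arithmetic.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the
    empirical cumulative distribution functions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so check each one.
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical K_D-like values (nM) for two allele groups; not real data.
ancestral = [1.0, 2.0, 3.0, 4.0]
minor = [3.0, 4.0, 5.0, 6.0]
print("D =", ks_statistic(ancestral, minor))
print("adjusted p-values:", bonferroni([0.01, 0.04, 0.5]))
```

The D statistic only measures the size of the difference between the two distributions; turning it into a p-value requires the Kolmogorov distribution, which this sketch leaves out.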
Figure 4
The frequencies of occurrence along promoters of candidate SNP markers that increase versus decrease the expression of protein-coding genes in Arabidopsis. Legend: error bars: the 95% confidence interval boundaries with the Bonferroni correction for multiple comparisons calculated using STATISTICA version 10.0 (Statsoft™, Tulsa, OK, USA).
Figure 5
The output of Web service Plant_SNP_TATA_Z-tester [24] regarding assessment of SNPs (a) ENSVATH01403825:A and (b) ENSVATH01403824:T located in the proximal promoter of the Arabidopsis gene ARF1. One can see the results along with their annotations in Figure 2 as an illustration of how to use the knowledge base AtSNP_TATAdb.
Figure 6
Statistically significant correlations between the KD-values of the equilibrium dissociation constant expressed in nanomoles per liter (nM) of the complexes between the TBP and only the ancestral alleles of the promoters of the A. thaliana genes as evaluated in silico and stored in the AtSNP_TATAdb (X-axis) and the log2-value of the promoter strength (Y-axis), which were measured in vivo under the experimental conditions described in the caption to this axis on parts (a–f) of this figure according to article [125]. Solid and dash-and-dot lines denote linear regression and boundaries of its 95% confidence interval, calculated by means of software package STATISTICA version 10.0 (Statsoft™, Tulsa, OK, USA). Statistics: r, R, τ, γ and p are coefficients of Pearson’s linear correlation, Spearman’s rank correlation, Kendall’s rank correlation, Goodman–Kruskal generalized correlation, and their p-values (statistical significance), respectively.
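To make the correlation coefficients in this legend concrete, here is a minimal sketch of Pearson's r, Spearman's R and Kendall's τ in plain Python. It is an illustration only, not the procedure used in the paper (which relied on STATISTICA); the function names are hypothetical, the Kendall variant shown is the simple tau-a, which has no tie correction, and the paired values are invented toy numbers rather than entries from AtSNP_TATAdb.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation R: Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical toy pairs: K_D in nM versus log2 promoter strength.
kd_nm = [1.0, 2.0, 4.0, 8.0, 16.0]
log2_strength = [3.0, 2.5, 2.6, 1.0, 0.5]
print("r   =", round(pearson(kd_nm, log2_strength), 3))
print("R   =", round(spearman(kd_nm, log2_strength), 3))
print("tau =", round(kendall_tau_a(kd_nm, log2_strength), 3))
```

In this toy input a larger dissociation constant (weaker binding) is paired with a lower promoter strength, so all three coefficients come out negative.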
