Front Plant Sci. 2023 Nov 8;14:1279231.
doi: 10.3389/fpls.2023.1279231. eCollection 2023.

Mining the Utricularia gibba genome for insulator-like elements for genetic engineering



Daniel Laspisa et al. Front Plant Sci.

Erratum in

Abstract

Introduction: Gene expression is often controlled by cis-regulatory elements (CREs) that modulate the production of transcripts. For multi-gene genetic engineering and synthetic biology, precise control of transcription is crucial, both to insulate the transgenes from unwanted native regulation and to prevent readthrough or cross-regulation between transgenes within a multi-gene cassette. To prevent such interference, insulator-like elements (more properly referred to as transcriptional blockers) could be inserted between the transgenes so that each is regulated independently. However, only a few validated insulator-like elements are available for plants, and they tend to be larger than ideal.

Methods: To identify additional potential insulator-like sequences, we conducted a genome-wide analysis of Utricularia gibba (humped bladderwort), which has one of the smallest known plant genomes and whose genes are naturally closely spaced. The 10 best insulator-like candidates were then evaluated in vivo for insulator-like activity.

Results: We identified a total of 4,656 intergenic regions with expression profiles suggesting insulator-like activity. Comparisons across 45 other plant species (representing Monocots, Asterids, and Rosids) show low levels of syntenic conservation of these regions. Genome-wide analysis of unmethylated regions (UMRs) indicates that ~87% of the targeted regions are unmethylated; however, this is difficult to interpret because U. gibba has remarkably low levels of methylation across the genome, so large UMRs frequently extend over multiple genes and intergenic spaces. We also could not identify any conserved motifs among our selected intergenic regions, or any shared with existing insulator-like elements for plants. Despite this lack of conservation, testing of 10 selected intergenic regions for insulator-like activity found two elements that perform on par with a previously published element (EXOB) while being significantly smaller.
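The UMR percentage reported above is, at its core, an interval-overlap count. The sketch below is a minimal illustration, not the authors' pipeline: it assumes half-open coordinates on a single chromosome, UMR intervals that are already merged (non-overlapping), and a hypothetical rule that a region counts as unmethylated when at least half of its bases (`min_cover=0.5`) fall inside UMRs. The function names and coordinates are illustrative only.

```python
def covered_fraction(region, umrs):
    """Fraction of a half-open (start, end) region covered by merged, non-overlapping UMRs."""
    start, end = region
    covered = sum(max(0, min(end, u_end) - max(start, u_start)) for u_start, u_end in umrs)
    return covered / (end - start)


def percent_unmethylated(regions, umrs, min_cover=0.5):
    """Percentage of regions whose UMR coverage reaches the hypothetical min_cover threshold."""
    hits = sum(1 for region in regions if covered_fraction(region, umrs) >= min_cover)
    return 100.0 * hits / len(regions)


# Made-up coordinates on one chromosome, for illustration only.
umrs = [(0, 5000), (8000, 9000)]
regions = [(100, 600), (4800, 5300), (6000, 6500), (8500, 9500)]
print(percent_unmethylated(regions, umrs))  # 50.0
```

Because U. gibba UMRs often span several genes, a region-level overlap test like this one will tend to call most short intergenic regions unmethylated, which is consistent with the caution expressed above.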

Discussion: Given the small number of insulator-like elements currently available for plants, our results are a significant addition to the available tools. The high hit rate (2 out of 10) also suggests that more useful sequences are present among our selected intergenic regions; additional validation work will be required to identify which will be most useful for plant genetic engineering.

Keywords: Utricularia; bladderwort; cis-regulatory elements; insulator; transgenics.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Figures

Figure 1
Intergenic distances in assembled Utricularia gibba chromosomes 1-4. (A) The relative intergenic distances normalized to the maximum (20 kb) for pairs of annotated genes on chromosomes 1-4, and the distribution of convergent and divergent potential insulators across the chromosomes. Approximate pericentromeric regions are shaded in blue and were defined by the primary cluster of repeats in the 4 chromosomes ( Supplementary Figure 1 ). (B) Intergenic regions with lengths >1 kb were filtered from the dataset to focus on shorter elements (green shading) that could be more easily used in transgenic constructs. The resulting filtered dataset represents roughly 42.7% of all intergenic regions in U. gibba. The full count for regions >2000 nt is cut off for readability (marked by *), and is 192, 139, 112, and 87 for CHR1-CHR4, respectively (see Supplementary Table ST1 ).
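The distance calculation in panel (A) and the length filter in panel (B) can be sketched as follows. This is a hypothetical illustration rather than the study's code: it assumes genes are given with 1-based inclusive coordinates on one chromosome, and it treats overlapping or abutting genes (gap ≤ 0) as having no intergenic region. The `Gene` class and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_gaps(genes):
    """(left, right, gap_bp) for each pair of neighboring genes separated by at least 1 bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1  # bases strictly between the two genes
        if gap > 0:
            gaps.append((left.name, right.name, gap))
    return gaps


def short_regions(gaps, max_len=1000):
    """Keep regions of at most max_len bp, i.e. drop those >1 kb as in panel (B)."""
    return [g for g in gaps if g[2] <= max_len]


def normalized_distance(gap_bp, max_distance=20_000):
    """Distance scaled to the 20 kb maximum used in panel (A), capped at 1.0."""
    return min(gap_bp, max_distance) / max_distance


# Made-up gene models, for illustration only.
genes = [Gene("A", 1, 100, "+"), Gene("B", 201, 300, "-"),
         Gene("C", 1401, 1500, "+"), Gene("D", 1600, 1700, "+")]
gaps = intergenic_gaps(genes)
kept = short_regions(gaps)
print(kept, len(kept) / len(gaps))  # fraction of intergenic regions retained
```

The retained fraction printed at the end corresponds to the "roughly 42.7% of all intergenic regions" figure, computed here on toy data.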
Figure 2
Transcriptional profiles of potential insulators. U. gibba intergenic regions were classified as potential insulators based on the local orientation and expression levels of the flanking genes. Specifically, we targeted regions where the flanking genes were either (A) diverging or (B) converging and had expression levels differing by at least 1.5-fold. These regions likely also contain other cis-regulatory elements (promoters, terminators, etc.) that are outside the scope of this study. These criteria were chosen to enrich for candidates with possible insulator function rather than to build an exhaustive catalog of all sequences with any potential regulatory function.
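The classification rule above can be written as a small predicate. This is a minimal sketch under stated assumptions, not the authors' implementation: strands are given for the left and right gene in ascending coordinate order, "divergent" means the left gene is on the minus strand and the right gene on the plus strand (5' ends facing the intergenic region), and a zero expression value opposite a nonzero one is treated as an infinite fold difference. Function names are hypothetical.

```python
import math


def orientation(left_strand, right_strand):
    """Classify a neighboring gene pair (left/right in coordinate order) by strand."""
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # 5' ends face the intergenic region
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # 3' ends face the intergenic region
    return "tandem"


def fold_difference(a, b):
    """Ratio of the higher to the lower expression value (assumed handling of zeros)."""
    high, low = max(a, b), min(a, b)
    if low == 0:
        return math.inf if high > 0 else 1.0
    return high / low


def is_candidate(left_strand, right_strand, left_expr, right_expr, min_fold=1.5):
    """Divergent or convergent pair whose flanking genes differ by at least min_fold."""
    kind = orientation(left_strand, right_strand)
    return kind in ("divergent", "convergent") and fold_difference(left_expr, right_expr) >= min_fold


print(is_candidate("+", "-", 30.0, 10.0))  # True: convergent, 3-fold difference
print(is_candidate("-", "+", 12.0, 10.0))  # False: only 1.2-fold
```

Tandem pairs are excluded here because the caption restricts candidates to diverging or converging flanking genes.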
Figure 3
Synteny of CREs across 45 physical maps (including U. gibba). (A) Taxonomic arrangement of the 45 species investigated for syntenic relationships with U. gibba, based on NCBI taxonomy. The 45 species fall into three primary groups: the Asterids (blue, including U. gibba), Rosids (purple), and Monocots (orange). (B) The total number of syntenic gene pairs shared with U. gibba varies within each of the three groups, though the outgroup (Monocots) is generally less syntenic with U. gibba. (C) Circos (Krzywinski et al., 2009) synteny plots for one representative genome from each group (boxed in green in Panel A). U. gibba assembled chromosomes are shown in red and scaffolds >1 Mb in black, with chromosome 1 in each assembly marked with an asterisk. Chromosomes are displayed clockwise in ascending order. Synteny between U. gibba and other genomes is generally low due to the high level of fractionation previously described in U. gibba (Ibarra-Laclette et al., 2013).
Figure 4
Motifs associated with high-confidence convergent and divergent intergenic regions (XSTREME, Bailey et al., 2015). High-confidence regions were those ≤1 kb in length, with a >10-fold expression difference between the flanking genes, and sharing the orientation and order of those genes in at least one other genome. (A) Conserved motifs among 17 divergent and 20 convergent high-confidence intergenic regions, displayed as sequence logos; (B) positional distribution relative to the middle of the intergenic sequence, normalized to the maximum; (C) histogram displaying the number of query sequences containing the motif and the number of instances per sequence; (D) the putative class of transcription factor or binding site associated with the motif.
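The three high-confidence criteria can be combined into a single filter. The sketch below is a hypothetical illustration: it assumes each other genome has already been reduced to an ordered list of (ortholog ID, strand) pairs, with IDs mapped to U. gibba genes, and it also accepts the pair when found on the opposite strand (order reversed, both strands flipped), since an inversion preserves a divergent or convergent arrangement. That inversion rule and all names are assumptions, not details taken from the study.

```python
def flip(strand):
    """Opposite DNA strand."""
    return "-" if strand == "+" else "+"


def pair_conserved(pair, genome):
    """True if the U. gibba pair ((gene_a, strand_a), (gene_b, strand_b)) is adjacent in
    `genome` with the same order and orientation, either as-is or on the opposite strand."""
    (a, sa), (b, sb) = pair
    for (x, sx), (y, sy) in zip(genome, genome[1:]):
        if (x, sx, y, sy) == (a, sa, b, sb):
            return True
        # Same arrangement read from the other strand: order reversed, strands flipped.
        if (x, sx, y, sy) == (b, flip(sb), a, flip(sa)):
            return True
    return False


def high_confidence(length_bp, fold, pair, other_genomes, max_len=1000, min_fold=10.0):
    """<=1 kb, >10-fold expression difference, and conserved in at least one other genome."""
    return (length_bp <= max_len and fold > min_fold
            and any(pair_conserved(pair, g) for g in other_genomes))


# Made-up example: a divergent pair g1(-) / g2(+) and two toy genomes.
pair = (("g1", "-"), ("g2", "+"))
inverted = [("x", "+"), ("g2", "-"), ("g1", "+")]
separated = [("g1", "-"), ("x", "+"), ("g2", "+")]
print(high_confidence(800, 12.0, pair, [separated, inverted]))  # True
```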
Figure 5
Dual reporter assay to evaluate potential insulator activity in Nicotiana benthamiana leaves. (A) Schematic representation of the dual reporter assay. The fluorescent reporters are mCherry driven by the 2x35S promoter and GFP driven by the soybean seed-specific oleosin (GmOle) promoter. (B) When only the GmOle:GFP cassette is transfected into N. benthamiana leaves, there is no GFP fluorescence. (C) However, when the two cassettes are linked by a 21-bp spacer, the enhancer from the 2x35S promoter ectopically activates the GmOle:GFP construct in leaf tissue. (D) Ectopic GFP expression is significantly attenuated when EXOB, a sequence known to have insulator function (Singer et al., 2010), is used to link the two cassettes. (E, F) Similarly, the two bladderwort sequences identified here (Ugi21 and Ugi22) also attenuate ectopic GFP expression.
Figure 6
Evaluation of putative insulator activity. (A) Red and green represent expression of mCherry (driven by the 2x35S promoter) and GFP (driven by either the constitutive GmUbi3 promoter or the seed-specific Glycine max oleosin (GmOle) promoter), respectively. Columns with a gray background represent the controls (expression cassettes containing only mCherry or only GFP, the dual cassette with a 21-bp spacer sequence, Agrobacterium strain LBA4404 with no binary plasmid, and infiltration buffer). Columns with a white background represent expression of the mCherry and GFP cassettes linked together in one construct. Sequences being tested for insulator activity were placed between the two fluorescent-marker cassettes; GFP expression is attenuated if there is an insulator effect. The first set of columns shows sequences previously verified to have an insulator effect [EXOB (Singer et al., 2010), Ugi3, and Ugi4 (Illa-Berenguer et al.)]. The remaining columns show potential insulators from this study, identified either by the fold-expression change of adjacent gene pairs or through shared synteny with multiple genomes. Note that the 21-bp spacer sequence between the two cassettes does not block GFP expression, showing that separating the two cassettes is not enough to stop GFP activation. Three biological replicates were performed for each construct. (B) The results from Figure 6A expressed as the ratio of mCherry expression to GFP expression. Sequences showing a significantly lower ratio are those with insulator activity. Different letters above boxplots indicate significantly different groups as determined by one-way ANOVA followed by Tukey's HSD post-hoc test (α ≤ 0.05). The y-axis is broken between 2 and 100 to better show the higher values.
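The statistical comparison in panel (B) begins with a one-way ANOVA across constructs. The sketch below computes only the F statistic and its degrees of freedom from per-construct replicate values, in pure Python; it is an illustration rather than the authors' analysis code, and it omits the Tukey HSD step. In practice one would likely use `scipy.stats.f_oneway` for the ANOVA p-value and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) for the pairwise post-hoc comparisons. The input values are made up.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for replicate values grouped by construct."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: spread of construct means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of replicates around their construct mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate values for two constructs (not data from the study).
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```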
Figure 7
Insulator strength vs. sequence length. The sequences tested for insulator-like activity are plotted with size (bp) on the x-axis and the fold-change decrease in fluorescence signal relative to the control (dual cassette with a 21-bp spacer sequence) on the y-axis. Fold-change and syntenic insulator candidates are shown in blue and red, respectively. Previously reported sequences with insulator function, EXOB (Singer et al., 2010) and Ugi3 and Ugi4 (Illa-Berenguer et al.), are shown in yellow. The largest, least effective sequences in this test system are in the upper right, while the smallest, most effective are in the lower left.
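The y-axis quantity is a ratio of signals, and the plot is essentially a size-versus-strength trade-off. The sketch below assumes that fold-change decrease is the mean control signal divided by the mean construct signal, and adds a hypothetical `pareto_front` helper that lists sequences for which no other tested sequence is both no longer and at least as strong (and strictly better on one of the two). Both the formula and the helper are illustrative assumptions, not definitions from the study, and the example values are made up.

```python
from statistics import mean


def fold_decrease(control, construct):
    """Mean control signal divided by mean construct signal (assumed definition)."""
    return mean(control) / mean(construct)


def pareto_front(items):
    """Names of (name, length_bp, fold) entries not beaten on both size and strength."""
    front = []
    for name, length, fold in items:
        dominated = any(
            l2 <= length and f2 >= fold and (l2 < length or f2 > fold)
            for n2, l2, f2 in items
            if n2 != name
        )
        if not dominated:
            front.append(name)
    return front


# Made-up signals and sizes, for illustration only.
print(fold_decrease([900, 1000, 1100], [90, 100, 110]))  # 10.0
print(pareto_front([("A", 500, 3.0), ("B", 300, 4.0), ("C", 800, 5.0), ("D", 400, 2.0)]))  # ['B', 'C']
```

Sequences on this front are the ones worth prioritizing when construct size matters, which is the practical motivation stated in the abstract.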
Figure 8
Motifs associated with validated functional and non-functional insulators (XSTREME, Bailey et al., 2015). (A) Conserved motifs among 12 validated sequences with insulator-like activity in plants (top) and 15 U. gibba sequences that showed no insulator-like activity, displayed as sequence logos; (B) positional distribution relative to the middle of the intergenic sequence, normalized to the maximum; (C) histogram displaying the number of query sequences containing the motif and the number of instances per sequence; (D) the putative class of transcription factor or binding site associated with the motif. The C2C2-related 12-nt AAAAGGAABCAA motif is present in all 12 functional insulator sequences, though given the small sample size this should be interpreted with caution. It is also present in some sequences in which we could not detect insulator function.
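The motif above uses IUPAC degenerate nucleotide codes (B stands for C, G, or T). A minimal sketch of counting its occurrences on both strands of a sequence is shown below; it is a plain regular-expression scan for illustration, not the MEME Suite scoring used in the study, and the test sequences are made up.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping sites all be found."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif.upper()) + "))")


def count_sites(seq, motif):
    """Occurrences of the motif on the forward strand plus the reverse complement."""
    pattern = motif_regex(motif)
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    return len(pattern.findall(seq)) + len(pattern.findall(rev_comp))


def fraction_with_motif(seqs, motif):
    """Share of sequences containing at least one site on either strand."""
    return sum(1 for s in seqs if count_sites(s, motif) > 0) / len(seqs)


print(count_sites("TTAAAAGGAACCAATT", "AAAAGGAABCAA"))  # 1 (B matches C)
print(count_sites("TTGGTTCCTTTT", "AAAAGGAABCAA"))      # 1 (site on reverse strand)
```

A simple presence count like `fraction_with_motif` makes the caption's caution concrete: a motif found in every functional sequence may still occur in non-functional ones, so presence alone does not predict activity.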

References

    1. Akasaka K., Nishimura A., Takata K., Mitsunaga K., Mibuka F., Ueda H., et al. (1999). Upstream element of the sea urchin arylsulfatase gene serves as an insulator. Cell. Mol. Biol. Noisy–Gd. Fr. 45, 555–565.
    2. Andrews S. (2015). FastQC: A quality control tool for high throughput sequence data. Available at: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
    3. Arnold C. D., Gerlach D., Stelzer C., Boryń Ł. M., Rath M., Stark A. (2013). Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077. doi: 10.1126/science.1232542
    4. Bailey T. L., Johnson J., Grant C. E., Noble W. S. (2015). The MEME Suite. Nucleic Acids Res. 43, W39–W49. doi: 10.1093/nar/gkv416
    5. Benfey P. N., Ren L., Chua N. H. (1990). Tissue-specific expression from CaMV 35S enhancer subdomains in early stages of plant development. EMBO J. 9, 1677–1684. doi: 10.1002/j.1460-2075.1990.tb08291.x