PLoS Genet. 2009 Jun;5(6):e1000516.

doi: 10.1371/journal.pgen.1000516. Epub 2009 Jun 12.

Change of gene structure and function by non-homologous end-joining, homologous recombination, and transposition of DNA

Wolfgang Goettel¹, Joachim Messing

PMID: 19521498
PMCID: PMC2686159
DOI: 10.1371/journal.pgen.1000516

Wolfgang Goettel et al. PLoS Genet. 2009 Jun.

Affiliation

¹ Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, USA.


Abstract

An important objective in genome research is to relate genome structure to gene function. Sequence comparisons among orthologous and paralogous genes and their allelic variants can reveal sequences of functional significance. Here, we describe a 379-kb region on chromosome 1 of maize that enables us to reconstruct chromosome breakage, transposition, non-homologous end-joining, and homologous recombination events. Such a dense concentration of mechanisms in a small chromosomal interval exemplifies the evolution of gene regulation and allelic diversity in general. It also illustrates the evolutionary pace of changes in plants, where many of the above mechanisms are of somatic origin. In plants, unlike in animals, somatic alterations can easily be transmitted through meiosis because the germline is contiguous with somatic tissue, permitting the recovery of such chromosomal rearrangements. The analyzed region contains the P1-wr allele, a variant of the genetically well-defined p1 gene, which encodes a Myb-like transcriptional activator in maize. The P1-wr allele consists of eleven nearly perfect 12-kb P1-wr repeats arranged in a tandem head-to-tail array. Although sequencing such a structure by a shotgun approach is technically challenging, we overcame this problem by subcloning each repeat and ordering the repeats based on nucleotide variations. These polymorphisms were also critical for recombination and expression analysis in the presence and absence of the trans-acting epigenetic factor Ufo1. Interestingly, chimeras of the p1 and p2 genes, p2/p1 and p1/p2, frame the P1-wr cluster. Reconstruction of the sequence amplification steps at the p locus showed the evolution from a single Myb homolog to the multi-gene P1-wr cluster. It also demonstrates how non-homologous end-joining can create novel gene fusions. Comparisons to orthologous regions in sorghum and rice also indicate a greater instability of the maize genome, probably due to diploidization following allotetraploidization.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. p1 alleles.**
p1 gives rise to phlobaphenes in female floral tissues (pericarp, cob, husks, and silk) and in the tassel glume margins of the male inflorescence. However, pigmentation is most obvious in the pericarp (hence the name of the gene) and in the glumes, palea and lemma of the cob. The pericarp, or seed coat, is the outermost layer of the kernel; it is derived from the ovary wall and is accordingly maternal tissue. Glumes, palea and lemma are bracts enclosing the ovary and are also of maternal origin.

**Figure 2. *P1-wr* BACs bridge a FPC and sequence gap.**
The p cluster sequence of 379 kb is represented by a yellow rectangle. Each *P1-wr* repeat is illustrated by a red triangle pointing in the transcriptional orientation of the copy. Individual BACs are displayed as green and blue rectangles, and grey rectangles stand for fingerprinted contigs (FPCs). BACs shown in green were sequenced for this analysis, while BACs in blue were sequenced by a shotgun approach as part of the public maize-sequencing project. Due to the high similarity and large size of each *P1-wr* copy, gaps remain in the p cluster for the FPC map and the publicly available maize sequence as of November 2008. Our sequencing effort bridged the gaps and resolved the structural arrangement of all *P1-wr* repeats. BAC names, their accession numbers and sizes are given on top of each rectangle. Nucleotide positions written underneath the rectangles refer to the p cluster sequence that is covered by the BACs. Because BACs shown in blue are not fully assembled, their contig numbers are indicated in the rectangles, and the stated BAC size corresponds to the sum of all contigs. The calculated BAC size, which is written under the rectangle when available, can be smaller or larger depending on overlaps or sequence gaps. Information on whether BACs shown in green were fingerprinted and assembled in an FPC map is given underneath the rectangles. The vertical lines in two BACs represent a sequence gap in a CACTA and a retro element.

**Figure 3. Representation of the *P1-wr* cluster and flanking sequences.**
*P1-wr* repeats, as well as flanking p genes, are depicted as red pentagons with the apex pointing in the direction of transcription. Two predicted genes (pink pentagons) that encode a calmodulin binding protein (g1) and an expressed protein (g2) are positioned downstream of *p1/p2*. Regions containing probable pseudogenes are illustrated as pink hexagons. The fragmented genes downstream of the predicted genes are associated with a *Helitron* 3′ terminal sequence. Class I and class II transposable elements (drawn as rectangles and rounded rectangles, respectively) include mostly nested LTR retrotransposons, two CACTA elements (*misfit* and *doppia*), one *hAT* element, one LINE element and several MITEs (not shown). LTRs of retrotransposons are represented as triangles indicating the transcriptional orientation. Notice that the 3′ end of *p1/p2* is separated from the coding region by a large retroelement block. Transposons depicted in white are not well conserved. *P1-wr* repeats are displayed in transcriptional orientation from left to right, while *p2/p1* is proximal and *p1/p2* is distal to the centromere.

**Figure 4. Schematic alignment of p genes.**
Although *P1-wr* is the only described p1 allele with a multi-copy structure, only one copy is shown here. The *P1-wr* 5′ region aligns well with other p1 alleles. Regulatory elements, *i.e.* distal and proximal enhancer and basal promoter, depicted as blue arrows, were only determined for *P1-rr*. In other p genes or alleles, the arrows merely refer to sequence homology to *P1-rr*. Functional homology has not been investigated. A *Heartbreaker* MITE (purple bar) and a Mu-like element (tan bar) are part of the proximal enhancer. The p2 sequences upstream of the transcription start site, depicted as blue rectangles, are nearly identical. Notice that p2 shares the initial promoter sequences (orange rectangle) with p1 alleles. Upstream of p2, maize and teosinte differ in their composition of retrotransposons (not shown). The transcribed component of p1 and p2 genes (with the exception of the *P1-rr* allele) consists of 3 exons (illustrated in red) and 2 introns. The fourth exon of *P1-rr* is not displayed. *P1-rr* differs from *P1-wr* in the 3′ UTR. The p2 genes from maize (dotted) and parviglumis (horizontal lines) are very similar to p1 alleles in their transcribed regions. All other sequences shown are hybrids between p1 and p2. The 5′ region of *p2/p1* containing exons 1 and 2 (horizontal red lines) is of p2 origin, while the 3′ end including exon 3 (full red) is derived from *P1-wr*. *p1/p2* switches from a p1 to a p2 sequence in exon 2. The 3′ UTR of *p1/p2* was separated by retrotransposon insertions, as indicated by two parallel lines. Intron 2 comprises numerous transposable elements of various kinds: a *hAT*-like element (light green) and several MITEs or repeat elements ((S) *Stowaway*, (H) *Heartbreaker*, (P) *Pilgrim*, unnamed MITE).

**Figure 5. Polymorphisms among *P1-wr* repeats.**
This plot displays polymorphisms in individual *P1-wr* repeats compared to a *P1-wr* consensus sequence. Polymorphisms are subdivided into separate classes: SNPs are depicted in red, insertions in yellow, and deletions in green. Exons are shaded in red, a putative basal promoter region in green, and potential enhancer sequences in blue. Notice that exon 1 is split by a *hAT*-like transposable element in *P1-wr* repeats 6 and 11.

**Figure 6. Origin of the chimeric *p2/p1* gene by NHEJ.**
The recombinant *p2/p1* gene is represented as a rectangle above the filler DNA sequence shown in a yellow box. Exons 1 and 2 from *p2/p1* are similar to p2, but the third exon is derived from *P1-wr*. The complex filler DNA (yellow rectangle) originated from two nearby downstream sequences as indicated by two gray balloons. A potential ancestral sequence, consisting of a putative p2 gene (tan rectangle) and a neighboring *P1-wr* gene (gray rectangle), is shown on top. DNA double-strand break, deletion and repair events resulting in the *p2/p1* recombinant are explained in the main text.

**Figure 7. Retrotransposon insertions displaced the *p1/p2* 3′ UTR.**
The structural organization of the *p1/p2* gene (including exons and putative regulatory sequences) is shown at the bottom of this figure. The first retroelement, *Eninu*, inserted in the 3′ UTR of *p1/p2* approximately 1.38 million years ago (mya). *Shadowspawn* transposed “shortly” after (1.31 mya), in a region 4.1 kb downstream of *Eninu*. *Ji* jumped into *Eninu* 0.77 mya. Based on the nested nature of insertions, an *Opie* element that was truncated later must have inserted after *Eninu* but before *Huck*, which entered *Opie* 0.62 mya. Similarly, *Diguus*, now a solo LTR, must have inserted into *Eninu* before *Zeon*, which was integrated into *Diguus* 0.58 mya. Finally, *Opie* and *Diguus* jumped into *Huck* 0.35 mya and 0.19 mya, respectively, pushing the two ends of *p1/p2* a total of 68 kb apart. The order of the transposition events can be inferred from the nested nature of the insertions, which is consistent with all computed insertion dates.

**Figure 8. Synteny is maintained in maize, sorghum, and rice.**
The genomic comparison is centered on the maize p genes and their orthologs and includes five flanking genes. Chromosome segments of maize, rice and sorghum that contain p or orthologous genes are mostly collinear. However, a break in synteny occurred adjacent to the p orthologous gene on maize chr. 9. While rice has only one p ortholog, y and p genes in sorghum and maize chr. 1 are amplified. A gene encoding a conserved hypothetical protein in sorghum is duplicated as well. The expansion of the maize genome compared to rice and sorghum is obvious even in this small genomic region. Genes are displayed as color-coded rectangles, and their transcriptional orientation is indicated by + and −. Undefined gene fragments are depicted in black. A dotted line indicates a sequence that consists of several contigs; the orientation of, and distance between, genes located in these contigs therefore cannot yet be determined. The gene orientation chosen in this figure is based on orthologous rice and sorghum genes. Lightning bolts stand for mutation events that generated pseudogenes.
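
The nesting argument in the Figure 7 caption (an element that sits inside another must have inserted later) can be checked against the quoted dates with a few lines of Python. This is only an illustrative sketch, not the authors' analysis: the labels `Opie-1`, `Opie-2`, `Diguus-1` and `Diguus-2`, and the function names, are hypothetical and are introduced here solely to tell apart the two insertions of each family. The dates are the ones given in the caption, and the two undated insertions are left out of the date table on purpose.

```python
# Insertion dates in million years ago (mya), as quoted in the Figure 7 caption.
# "Opie-2" and "Diguus-2" are hypothetical labels for the later insertions into Huck.
DATES_MYA = {
    "Eninu": 1.38,
    "Shadowspawn": 1.31,
    "Ji": 0.77,
    "Huck": 0.62,
    "Zeon": 0.58,
    "Opie-2": 0.35,
    "Diguus-2": 0.19,
}

# (younger, older) pairs read off the caption: each first element inserted
# after the second. "Opie-1" and "Diguus-1" (hypothetical labels) are the
# earlier, undated insertions.
YOUNGER_THAN = [
    ("Shadowspawn", "Eninu"),
    ("Ji", "Eninu"),
    ("Opie-1", "Eninu"),
    ("Huck", "Opie-1"),
    ("Diguus-1", "Eninu"),
    ("Zeon", "Diguus-1"),
    ("Opie-2", "Huck"),
    ("Diguus-2", "Huck"),
]


def violations(constraints, dates):
    """Return the (younger, older) pairs contradicted by the dated elements."""
    return [
        (young, old)
        for young, old in constraints
        if young in dates and old in dates and not dates[young] < dates[old]
    ]


def age_bounds(element, constraints, dates):
    """Bracket an element's age (mya) between its dated direct neighbours.

    Returns (lower, upper); a bound is None when no dated neighbour exists.
    """
    older = [dates[o] for y, o in constraints if y == element and o in dates]
    younger = [dates[y] for y, o in constraints if o == element and y in dates]
    return (max(younger) if younger else None, min(older) if older else None)


if __name__ == "__main__":
    print("violations:", violations(YOUNGER_THAN, DATES_MYA))
    for name in ("Opie-1", "Diguus-1"):
        print(name, "inserted between", age_bounds(name, YOUNGER_THAN, DATES_MYA), "mya")
```

Under these assumptions the check reports no contradictions, and it brackets the two undated insertions between the dates of the elements they sit in and the elements that later landed inside them.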
