Plant Physiol. 2009 Oct;151(2):483-95.
doi: 10.1104/pp.109.143370. Epub 2009 Aug 12.

Computational finishing of large sequence contigs reveals interspersed nested repeats and gene islands in the rf1-associated region of maize

Brent A Kronmiller et al. Plant Physiol. 2009 Oct.

Abstract

The architecture of grass genomes varies on multiple levels. Large long terminal repeat retrotransposon clusters occupy significant portions of the intergenic regions, and islands of protein-encoding genes are interspersed among the repeat clusters. Hence, advanced assembly techniques are required to obtain completely finished genomes as well as to investigate gene and transposable element distributions. To characterize the organization and distribution of repeat clusters and gene islands across large grass genomes, we present 961- and 594-kb contiguous sequence contigs associated with the rf1 (for restorer of fertility1) locus in the near-centromeric region of maize (Zea mays) chromosome 3. We present two methods for computational finishing of highly repetitive bacterial artificial chromosome clones that have proved successful in closing all sequence gaps caused by transposable element insertions. Sixteen repeat clusters were observed, ranging in length from 23 to 155 kb. These repeat clusters are almost exclusively long terminal repeat retrotransposons, of which the paleontology of insertion varies throughout the cluster. Gene islands contain from one to four predicted genes, resulting in a gene density of one gene per 16 kb in gene islands and one gene per 111 kb over the entire sequenced region. The two sequence contigs, when compared with the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes, retain gene collinearity of 50% and 71%, respectively, and 70% and 100%, respectively, for high-confidence gene models. Collinear genes on single gene islands show that while most expansion of the maize genome has occurred in the repeat clusters, gene islands are not immune and have experienced growth in both intragene and intergene locations.

Figures

Figure 1.
Combined genetic and physical map of maize sequence contigs. GBrowse display of the rf1 BAC contigs of maize chromosome 3 showing BAC path, predicted genes, and annotated TEs. rf1-C1 is 961 kb and contains 11 repeat clusters and eight predicted genes. rf1-C2 is 594 kb and contains five repeat clusters and six predicted genes. The two BAC contigs are separated by approximately 30 Mb. One gap remains, caused by dinucleotide and hexanucleotide polymer repeats; this is shown on BAC ZMMBBb0331I02, found at approximately 607 kb on rf1-C1.
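The contig lengths and counts in this caption can be cross-checked against the abstract's totals (sixteen repeat clusters, one gene per 111 kb overall). The following is only a minimal arithmetic sketch in Python; the dictionary layout and variable names are illustrative assumptions, and the numbers are taken directly from the caption.

```python
# Contig lengths (kb), predicted gene counts, and repeat cluster counts
# as stated in the Figure 1 caption. The data structure is illustrative.
contigs = {
    "rf1-C1": {"length_kb": 961, "genes": 8, "repeat_clusters": 11},
    "rf1-C2": {"length_kb": 594, "genes": 6, "repeat_clusters": 5},
}

# Totals across both sequenced contigs.
total_kb = sum(c["length_kb"] for c in contigs.values())
total_genes = sum(c["genes"] for c in contigs.values())
total_clusters = sum(c["repeat_clusters"] for c in contigs.values())

# Overall gene density, expressed as kb of sequence per predicted gene.
kb_per_gene = total_kb / total_genes

print(f"{total_kb} kb, {total_genes} genes, {total_clusters} repeat clusters")
print(f"about one gene per {kb_per_gene:.0f} kb")
```

The per-island density of one gene per 16 kb cannot be recomputed this way, because the island boundaries are not given in this record.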
Figure 2.
Nested LTR retrotransposons cause sequence assembly gaps. Diagram of the commonly seen type of gap caused by nested LTR retrotransposons. A, Nested TE insertion view of the gap region. The blue TE (labeled 2) is found nested within the LTR of the green LTR retrotransposon (labeled 1). This can cause an assembly gap in one of three locations: at the insertion point of the blue TE on either the left or the right of the insertion, or on the other LTR of the green TE at the corresponding location of the insertion point. B, A sequence view of the three gap locations caused by insertion of the blue TE into the LTR of the green LTR retrotransposon. The blue TE has inserted into the left LTR of the green TE, and an assembly gap can be found on the left LTR to either the left or the right of the blue TE insertion. In either case, the sequence of the left LTR of the green TE has been split apart, and sequences belonging to the right LTR have incorrectly assembled at this split location (shown as the arrow pointing to the red sequence) and cause the assembly gap. The assembly gap can also occur on the right LTR of the green TE. Here, the join sequences between the left LTR and the blue TE, found on both sides of the blue TE insertion, can assemble incorrectly into the sequence of the right LTR and prevent the sequence from aligning. Successful closing of these types of gaps is crucial to characterization of maize nested repeat clusters.
Figure 3.
TEnest graphical display of maize sequence contigs. A, TEnest insertion display output of the rf1-C1 961-kb maize contig, split into two sections. B, TEnest insertion display output of the rf1-C2 594-kb maize contig. TEs are shown as triangles inserted into the black DNA line. The TE families are shown below (for detailed display, see Supplemental Fig. S1).
Figure 4.
Comparative analysis of maize sequence contigs with rice and sorghum. The two sequenced rf1-associated BAC contigs are shown in the center; predicted genes are shown as red rectangles on the black sequence contig lines, with gene identification numbers found in red above. Comparative sequence analysis with rice is shown at the top, and shared sequence regions between maize and rice are shown as green connecting lines. Comparative sequence analysis with sorghum is shown at the bottom, and shared sequence regions between maize and sorghum are shown as blue connecting lines. Collinear regions are seen between maize chromosome 3, rice chromosome 1, and sorghum chromosome 3. Seven out of 14 predicted genes are found in collinear order and in orientation between maize and rice. Ten out of 14 predicted genes are found in collinear order and in orientation between maize and sorghum; three of these genes are found duplicated in a second location on sorghum chromosome 3. One nonpredicted gene region at the left end of rf1-C1 aligns to collinear regions in both rice and sorghum. This is probably a maize pseudogene.
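The gene counts in this caption correspond to the collinearity percentages reported in the abstract (50% for rice, 71% for sorghum). A minimal sketch of that arithmetic follows; the helper name `percent_collinear` is hypothetical and only the counts come from the caption.

```python
def percent_collinear(collinear: int, total: int) -> float:
    """Return the percentage of predicted genes found in collinear order."""
    return 100.0 * collinear / total


# Counts from the Figure 4 caption: 14 predicted genes in total,
# 7 collinear with rice and 10 collinear with sorghum.
TOTAL_PREDICTED_GENES = 14
rice = percent_collinear(7, TOTAL_PREDICTED_GENES)
sorghum = percent_collinear(10, TOTAL_PREDICTED_GENES)

print(f"rice: {rice:.0f}%, sorghum: {sorghum:.0f}%")
```

The 70% and 100% figures for high-confidence gene models are not recomputed here, since the number of high-confidence models is not stated in this record.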
