Structure and architecture of the maize genome

Georg Haberer¹, Sarah Young, Arvind K Bharti, Heidrun Gundlach, Christina Raymond, Galina Fuks, Ed Butler, Rod A Wing, Steve Rounsley, Bruce Birren, Chad Nusbaum, Klaus F X Mayer, Joachim Messing

Affiliation

¹ Munich Information Center for Protein Sequences, Institute for Bioinformatics, Gesellschaft für Strahlenforschung Research Center for Environment and Health, D-85764 Neuherberg, Germany.

PMID: 16339807
PMCID: PMC1310546
DOI: 10.1104/pp.105.068718

Georg Haberer et al. Plant Physiol. 2005 Dec;139(4):1612-24. doi: 10.1104/pp.105.068718.


Abstract

Maize (Zea mays or corn) plays many varied and important roles in society. It is not only an important experimental model plant, but also a major livestock feed crop and a significant source of industrial products such as sweeteners and ethanol. In this study we report the systematic analysis of contiguous sequences of the maize genome. We selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, and generated a high-quality dataset for sequence analysis. This sampling contains 330 annotated genes, 91% of which are supported by expressed sequence tag (EST) data from maize and other cereal species. Genes averaged 4 kb in size with five exons, although the largest was over 59 kb with 31 exons. Gene density varied over a wide range from 0.5 to 10.7 genes per 100 kb, and genes did not appear to cluster significantly. The total repetitive element content we observed (66%) was slightly higher than previous whole-genome estimates (58%–63%) and consisted almost exclusively of retroelements. The vast majority of genes can be aligned to at least one sequence read derived from gene-enrichment procedures, but only about 30% are fully covered. Our results indicate that much of the increase in genome size of maize relative to rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) is attributable to an increase in the number of both repetitive elements and genes.
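The sampling figures quoted in the abstract can be cross-checked with simple arithmetic. The Python sketch below is illustrative only. It assumes every region is exactly the 144 kb average, and the genome size it prints (about 2.4 Gb) is merely what the quoted 0.6% implies, not a value stated in the abstract. The pooled density of roughly 2.3 genes per 100 kb is a sample-wide average and should not be confused with the 0.5 to 10.7 genes per 100 kb range reported across regions. All function names are hypothetical.

```python
# Cross-check of the abstract's sampling figures (all inputs quoted from the abstract).
N_REGIONS = 100         # random regions sampled
MEAN_REGION_KB = 144    # average region size, kb
N_GENES = 330           # annotated genes in the sample
SAMPLED_PERCENT = 0.6   # "about 0.6% of the genome"


def sampled_kb(n_regions: int, mean_kb: float) -> float:
    """Total sampled sequence in kb (assumes the mean size applies to every region)."""
    return n_regions * mean_kb


def implied_genome_mb(total_kb: float, percent: float) -> float:
    """Genome size in Mb implied by the sampled length and its quoted genome share."""
    return total_kb * 100 / percent / 1000


def pooled_density_per_100kb(n_genes: int, total_kb: float) -> float:
    """Genes per 100 kb pooled over the whole sample (not a per-region value)."""
    return n_genes / total_kb * 100


if __name__ == "__main__":
    total = sampled_kb(N_REGIONS, MEAN_REGION_KB)
    print(f"sampled sequence: {total:.0f} kb")  # sampled sequence: 14400 kb
    print(f"implied genome size: {implied_genome_mb(total, SAMPLED_PERCENT):.0f} Mb")  # 2400 Mb
    print(f"pooled gene density: {pooled_density_per_100kb(N_GENES, total):.2f} genes/100 kb")  # 2.29
```

Because region sizes actually vary around the mean, the pooled density is effectively a length-weighted average over regions rather than the mean of the per-region densities plotted in Figure 1C.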


Figures

**Figure 1.**
Gene characteristics in the 100 random regions. Graphs have been plotted to show the number of exons per gene (A), the number of genes per BAC clone (B), and the gene density expressed as number of genes per 100 kb (C).

**Figure 2.**
Comparison of genes to species-specific EST collections. Proteins derived from the gene models were compared to the EST assemblies using TBLASTN. Homologous sequences were binned into four classes: gene models with highly significant EST matches (E value less than 10⁻³⁰), with significant homologies (E value between 10⁻³⁰ and 10⁻²⁰), with weak homologies (E value between 10⁻²⁰ and 10⁻¹⁰), and those exhibiting no or only very weak homologies (E value higher than 10⁻¹⁰).
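The four-class binning in Figure 2 amounts to a threshold lookup on E values. The sketch below assumes each gene model is represented by its single best (lowest) TBLASTN E value, with `None` meaning no hit. The caption does not say which class a value lying exactly on a boundary belongs to, so the boundary handling here is an assumption, as are the function names and the example values.

```python
from collections import Counter
from typing import Iterable, Optional

# Class boundaries quoted in the Figure 2 caption.
HIGHLY_SIGNIFICANT = 1e-30
SIGNIFICANT = 1e-20
WEAK = 1e-10


def evalue_class(evalue: Optional[float]) -> str:
    """Assign a gene model's best TBLASTN E value to one of the four classes.

    Assumed boundary handling (not specified in the caption): exactly 1e-30
    and 1e-20 fall into the next weaker class; exactly 1e-10 stays in "weak".
    None means the gene model had no EST hit at all.
    """
    if evalue is None or evalue > WEAK:
        return "none/very weak"
    if evalue < HIGHLY_SIGNIFICANT:
        return "highly significant"
    if evalue < SIGNIFICANT:
        return "significant"
    return "weak"


def tally(evalues: Iterable[Optional[float]]) -> Counter:
    """Count gene models per class."""
    return Counter(evalue_class(e) for e in evalues)


if __name__ == "__main__":
    # Hypothetical best-hit E values, one per gene model (not data from the study).
    print(tally([1e-45, 3e-25, 5e-12, 0.2, None]))
```

Checking the most significant class first is not required here because the bins do not overlap, but ordering the tests from the "no hit" case downward keeps the `None` handling in one place.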

**Figure 3.**
Graphic representation of a sample of annotated BAC clones. Ten out of 100 annotated BAC clones are arranged as bars depicting genes (blue) and regions containing repeat sequences (red). A straight gray line represents intergenic regions with no predicted gene models. To determine the coverage of our annotations by the collection of methyl- and C₀t-filtered sequence reads, we compared the BAC sequences against the respective collections obtained from TIGR (http://www.tigr.org/tdb/tgi/maize/). All filtered sequence reads were mapped to the 100 BAC sequences by BLASTN sequence comparison and subsequent quality parsing. To anchor a clone to a genomic location, a minimal sequence identity of 98% over the complete alignment length and an alignment length equal to or greater than 90% of the clone length were required. Sequence matches from methyl- and C₀t-filtered sequence reads are depicted in dark green and light green, respectively. Specific features are highlighted with consecutive numbers: (1) examples of low gene coverage by filtered sequences, (2) an example of tandem gene copies representing highly similar hydrolases for which GSS tags could be unequivocally mapped, and (3) a nonrepeat intergenic region well covered by filtered sequences.
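The two anchoring thresholds in the Figure 3 caption can be expressed as a simple filter over parsed BLASTN alignments. This is a minimal sketch under stated assumptions: "clone length" is read as the length of the filtered sequence read being placed, BLASTN output has already been parsed into records, and the unspecified "subsequent quality parsing" step is not reproduced. The `ReadHit` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReadHit:
    """One BLASTN alignment of a filtered read against a BAC (hypothetical record layout)."""
    read_id: str
    read_len: int          # length of the methyl- or C0t-filtered read, bp
    bac_id: str
    pct_identity: float    # identity over the complete alignment, percent
    aln_len: int           # alignment length, bp


MIN_IDENTITY_PCT = 98.0   # "minimal sequence identity of 98%"


def is_anchored(hit: ReadHit) -> bool:
    """Apply the identity and alignment-length thresholds from the caption.

    The 90% length test is done in integers (10 * aln_len >= 9 * read_len)
    so the threshold is exact rather than subject to float rounding.
    """
    return hit.pct_identity >= MIN_IDENTITY_PCT and 10 * hit.aln_len >= 9 * hit.read_len


def anchored(hits: Iterable[ReadHit]) -> List[ReadHit]:
    """Keep only hits that pass both thresholds."""
    return [h for h in hits if is_anchored(h)]


if __name__ == "__main__":
    hits = [
        ReadHit("r1", 600, "bac1", 99.5, 590),   # passes both thresholds
        ReadHit("r2", 600, "bac1", 97.0, 600),   # identity too low
        ReadHit("r3", 600, "bac1", 100.0, 500),  # aligned over less than 90% of the read
    ]
    print([h.read_id for h in anchored(hits)])  # ['r1']
```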

**Figure 4.**
Three examples of genes containing repetitive sequences within their introns. CG:temp172:AC145728.7 represents an ATPase II-like protein, CG:temp394:AC147505.5 an unknown protein containing a conserved PER1 domain, and CG:temp390:AC147505.5 a protein containing two cyclin K domains. Exons are shown as striped bars, introns as black lines, and repetitive sequences as triangles. DNA transposons are represented by black and retroelements by gray triangles. CG:temp172:AC145728.7 contains a retroelement, CG:temp394:AC147505.5 three DNA transposons (tourist-, Castaway-, and MITE-*adh*-like elements) and one retroelement, and CG:temp390:AC147505.5 two copies of Ty/*copia* elements and SINEs, respectively, within their introns.

**Figure 5.**
Coverage of exonic, intronic, and genic sequences by methyl- and C₀t-filtered sequence reads. Coverage was determined as described in Figure 3, and results were sorted into 10%-wide bins of fractional coverage. Fractional coverage of exons (A), introns (B), and complete genes (C) by methyl-filtered, C₀t-filtered, and combined sequences is shown. Medium blue bars show values obtained for methyl-filtered sequence reads, light blue bars show values for C₀t-filtered reads, and dark blue bars depict cumulative values.
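Fractional coverage of a feature, and its 10% bin, can be computed by merging the read alignments that overlap the feature so that no base is counted twice. The sketch below assumes alignments have already been reduced to half-open BAC-coordinate intervals and treats "combined" coverage as the union of both read collections; that reading of "cumulative values", the bin-edge convention, and all names and coordinates are assumptions rather than the study's documented procedure.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in BAC coordinates


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bp(feature: Interval, hits: Iterable[Interval]) -> int:
    """Bases of the feature overlapped by at least one read alignment."""
    fs, fe = feature
    clipped = [(max(s, fs), min(e, fe)) for s, e in hits if s < fe and e > fs]
    return sum(e - s for s, e in merge(clipped))


def coverage_bin(feature: Interval, hits: Iterable[Interval]) -> int:
    """10%-wide bin index 0..9 of fractional coverage; full coverage joins bin 9."""
    length = feature[1] - feature[0]
    return min(10 * covered_bp(feature, hits) // length, 9)


if __name__ == "__main__":
    exon = (1000, 1100)                       # hypothetical 100 bp feature
    methyl = [(1010, 1040), (1090, 1150)]     # hypothetical methyl-filtered alignments
    cot = [(1030, 1060)]                      # hypothetical C0t-filtered alignment
    combined = methyl + cot                   # union, not a sum of the two coverages
    print(covered_bp(exon, methyl), covered_bp(exon, cot), covered_bp(exon, combined))  # 40 30 60
    print(coverage_bin(exon, combined))       # 6
```

Merging before summing matters for the combined bars: here methyl and C₀t reads cover 40 and 30 bp respectively, but only 60 distinct bases together, because their alignments overlap by 10 bp.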


