Plant Physiol. 2004 Oct;136(2):3023-33.
doi: 10.1104/pp.104.043323. Epub 2004 Aug 6.

Utility of different gene enrichment approaches toward identifying and sequencing the maize gene space

Nathan Michael Springer et al. Plant Physiol. 2004 Oct.

Abstract

Maize (Zea mays) possesses a large, highly repetitive genome; consequently, a number of reduced-representation sequencing approaches have been used to enrich for gene space while avoiding the difficulties associated with repetitive DNA. This article documents the ability of publicly available maize expressed sequence tags (ESTs) and Genome Survey Sequences (GSSs; many of which were isolated through the use of reduced-representation techniques) to recognize and provide coverage of 78 maize full-length cDNAs (FLCs). All 78 FLCs in the dataset were identified by at least three GSSs, indicating that the majority of maize genes have been identified by at least one currently available GSS. Both methyl-filtration (MF) and high-Cot (HC) enrichment methods provided a 7- to 8-fold increase in gene discovery rates compared with random sequencing. The available maize GSSs aligned to 75% of the FLC nucleotides used to perform searches, while the ESTs aligned to 73% of the nucleotides. Our data suggest that at least approximately 95% of maize genes have been tagged by at least one GSS. While the GSSs are very effective for gene identification, relatively few (18%) of the FLCs are completely represented by GSSs. Analysis of the overlap of coverage and of the bias due to position within a gene suggests that the RescueMu (RM), MF, and HC methods are at least partially nonredundant.

Figures

Figure 1.
Distribution of the number of GSSs corresponding to our 78 FLCs. This figure describes the distribution and numbers of genes hit by the different types of GSSs but, due to different numbers of reads for the different libraries, is not a measure of relative effectiveness. A, The total number of randomly sequenced GSSs and the breakdown between BAC end and small-insert random libraries is shown. The majority of FLCs had no alignments with BAC end or small-insert random library GSSs, and very few FLCs had multiple randomly sequenced GSSs. B, The number of RM GSSs aligning to the 78 FLCs in the dataset. The majority of FLCs were not identified by an RM GSS. However, there were several FLCs with a relatively high number of RM GSSs, which may reflect insertion site preferences for the Mutator transposable element. C, The number of GSSs per FLC for both the MF- and HC-selected libraries. D, The sorted values for the number of GSSs from the MF- and HC-selected libraries per kilobase of coding region used to perform the searches. There are slightly more MF hits (most likely due to the higher number of MF sequences deposited at GenBank). However, the relatively higher density of MF hits per kilobase of coding region for some FLCs may reflect a propensity for the MF to capture certain FLCs at a higher rate. E, The total number of GSSs per FLC. Every FLC had at least three GSSs with a maximum of 41 GSSs. F, The sorted values for the total number of GSSs per kilobase of coding region used to perform the searches.
Figure 2.
Coverage of the 78 FLC sequences by EST and GSS sequencing. A, The percent of nucleotides within the FLC or genomic sequences used for the BLAST searches that are represented by each subset of sequences. The GSSs represent approximately 75% of the base pairs used to perform the FLC BLAST searches. The overlap between the MF and HC coverage is indicated in B. A total of 101,987 bp of FLC sequence (out of 135,510 bp) was covered. The overlap between the MF and HC GSS coverage is 36.9%. C and D, The distribution of positions that individual GSSs cover within those FLC sequences that they align to. The RM GSSs display a significant bias toward the 5′ end of FLC sequences. Interestingly, the MF GSSs (D) also show a bias toward the 5′ end of FLC sequences.
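The coverage figures in this caption can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' pipeline: it assumes alignments are available as (start, end) coordinates on an FLC, merges overlapping intervals so that each base is counted once, and then recomputes the percentage from the two totals quoted above (101,987 bp covered out of 135,510 bp). The function names and the toy intervals are hypothetical.

```python
def covered_bases(intervals):
    """Count bases covered by at least one alignment.

    `intervals` is an iterable of (start, end) pairs, 0-based and
    half-open. This coordinate convention is an assumption made for
    the sketch, not something taken from the paper.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_covered(covered, length):
    """Percentage of `length` bases that are covered."""
    return 100.0 * covered / length


# Toy example (hypothetical alignments on a 1,000 bp sequence).
toy_hits = [(0, 300), (250, 600), (900, 1000)]
toy_covered = covered_bases(toy_hits)

# Totals quoted in the Figure 2 caption.
reported = percent_covered(101_987, 135_510)
print(f"toy: {toy_covered} bp covered; reported: {reported:.1f}%")
```

With the caption's totals this gives a value just above 75%, consistent with the "approximately 75%" stated for GSS coverage.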
Figure 3.
Utility of GSSs for promoter identification. A subset of 33 FLCs (indicated in Table II) was used to perform searches to identify GSSs that could provide 5′ UTR and promoter sequences. Twenty-seven of the 33 FLCs had at least one GSS that overlapped the ATG start codon. A, The length (in base pairs) of UTR/promoter sequence for each FLC is shown. The average length was 894 bp. The sequence 5′ of the ATG start codon was analyzed for the presence of a putative promoter using Softberry TSSP software. The sequences for which a promoter was predicted are indicated by the presence of an asterisk.

Cited by

  • Towards decoding the conifer giga-genome.
    Mackay J, Dean JF, Plomion C, Peterson DG, Cánovas FM, Pavy N, Ingvarsson PK, Savolainen O, Guevara MÁ, Fluch S, Vinceti B, Abarca D, Díaz-Sala C, Cervera MT. Plant Mol Biol. 2012 Dec;80(6):555-69. doi: 10.1007/s11103-012-9961-7. Epub 2012 Sep 9. PMID: 22960864 Review.
  • Gene enrichment in maize with hypomethylated partial restriction (HMPR) libraries.
    Emberton J, Ma J, Yuan Y, SanMiguel P, Bennetzen JL. Genome Res. 2005 Oct;15(10):1441-6. doi: 10.1101/gr.3362105. PMID: 16204197 Free PMC article.
  • Extension of Lander-Waterman theory for sequencing filtered DNA libraries.
    Wendl MC, Barbazuk WB. BMC Bioinformatics. 2005 Oct 10;6:245. doi: 10.1186/1471-2105-6-245. PMID: 16216129 Free PMC article.
  • Uneven chromosome contraction and expansion in the maize genome.
    Bruggmann R, Bharti AK, Gundlach H, Lai J, Young S, Pontaroli AC, Wei F, Haberer G, Fuks G, Du C, Raymond C, Estep MC, Liu R, Bennetzen JL, Chan AP, Rabinowicz PD, Quackenbush J, Barbazuk WB, Wing RA, Birren B, Nusbaum C, Rounsley S, Mayer KF, Messing J. Genome Res. 2006 Oct;16(10):1241-51. doi: 10.1101/gr.5338906. Epub 2006 Aug 10. PMID: 16902087 Free PMC article.
  • The TIGR Maize Database.
    Chan AP, Pertea G, Cheung F, Lee D, Zheng L, Whitelaw C, Pontaroli AC, SanMiguel P, Yuan Y, Bennetzen J, Barbazuk WB, Quackenbush J, Rabinowicz PD. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D771-6. doi: 10.1093/nar/gkj072. PMID: 16381977 Free PMC article.
