PLoS Biol. 2005 Jan;3(1):e13.
doi: 10.1371/journal.pbio.0030013. Epub 2005 Jan 4.

Sorghum genome sequencing by methylation filtration

Joseph A Bedell et al. PLoS Biol. 2005 Jan.

Abstract

Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

Figures

Figure 1. Genome Reduction
MF reduces the sorghum genome by 66% in sampling a hypomethylated space of approximately 247 Mb (green) and filtering out 488 Mb (red) of the 735-Mb sorghum genome.
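The 66% figure follows directly from the sizes quoted in the caption. As a minimal sketch of that arithmetic (the variable and function names are illustrative and do not come from the paper):

```python
# Sizes in megabases, as quoted in the Figure 1 caption.
GENOME_MB = 735.0        # total sorghum genome
FILTERED_OUT_MB = 488.0  # methylated portion removed by MF

# The hypomethylated space is what remains after filtration.
HYPOMETHYLATED_MB = GENOME_MB - FILTERED_OUT_MB


def percent_of_genome(part_mb: float, genome_mb: float = GENOME_MB) -> float:
    """Return part_mb as a percentage of the whole genome."""
    return 100.0 * part_mb / genome_mb


reduction = percent_of_genome(FILTERED_OUT_MB)
retained = percent_of_genome(HYPOMETHYLATED_MB)

print(f"hypomethylated space: {HYPOMETHYLATED_MB:.0f} Mb")
print(f"genome reduction:     {reduction:.1f}%")
print(f"genome retained:      {retained:.1f}%")
```

The reduction rounds to the 66% stated in the caption, and the remaining space matches the approximately 247 Mb shown in green.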
Figure 2. Gene Discovery Rate
Gene discovery rates for sorghum MF (blue), sorghum ESTs (pink), and an Arabidopsis simulation (dotted black) are shown. The gene discovery rates for the MF and ESTs were calculated based on matches to a set of 137 genes annotated on sorghum BAC clones versus the number of MF and EST reads. The Arabidopsis simulation was calculated based on the fold-coverage of chromosome 1, which contains 7,520 genes. The fold coverage was converted into read numbers as detailed in the Materials and Methods.
Figure 3. CG and CNG Suppression in MF versus UF Sequences
Sequences were analyzed for their mcrBC half-sites, those that overlap CG dinucleotides, and those that overlap CNG trinucleotides. The ratio of observed to expected sites is graphed for filtered (hatched) and unfiltered (white) sequences from retrotransposons (A) and CDSs (B).
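The caption does not say how the expected counts were derived. As a rough illustration only, the sketch below uses the common base-composition expectation for CG dinucleotides (count of C times count of G, divided by sequence length). This is an assumption, not the authors' stated method, and the function name is hypothetical.

```python
def cg_observed_expected(seq: str) -> float:
    """Observed/expected ratio of CG dinucleotides in one sequence.

    Assumption: expected count = (#C * #G) / length, the usual
    base-composition estimate. The paper may have used another model.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if n < 2 or c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined; report 0 for simplicity
    observed = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    expected = c_count * g_count / n
    return observed / expected


# A ratio below 1 indicates suppression of CG sites relative to expectation.
print(cg_observed_expected("CCCCGGGG"))
```

Extending the same idea to CNG trinucleotides or mcrBC half-sites would need a matching expectation model, which the caption does not specify.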
Figure 4. Methylation Status of tb2 and Kafirin Cluster
(A and B) Restriction maps of the tb2 gene (A) and the kafirin consensus sequences (B) are shown. The relevant restriction sites are indicated vertically, and the numbers indicate the distance scale in base pairs. Each CDS is depicted as a blue-shaded arrow, and the region assayed is indicated by a black bar. The circles depict sites that are not present in every kafirin gene, and the color represents the number of genes that do not share the site. The orange circle (5′-most HhaI site) is a site conserved in nine of 11 kafirin genes, and the red circle (3′-most PstI site) is a site present in ten of the 11. (C) Results from a representative methylation analysis of tb2; the inset depicts the template dilution standard curve, run with two replicates at each dilution, that was used to set the detection threshold for the experiment. Each experiment was performed three times with four on-board replicates per assay point. The results for each of the four differentially treated reactions are depicted with different colors: red, mock-treated; blue, mcrBC-digested; orange, HhaI-digested; and green, HhaI + mcrBC double-digest. The specificity of each reaction was confirmed using melt-curve analysis. (D) Results from a representative methylation analysis of the 11 kafirin genes. The colors for the six differentially treated reactions are the same as in (C), with the following additional digests: pink, PstI-digested; light blue, PstI + mcrBC double-digest. Notice that mcrBC digestion with and without PstI yields the same Ct, while HhaI + mcrBC (green) yields a higher Ct on average, suggesting additional cleavage.
Figure 5. Phylogenetic Comparison of Sorghum DREB1 Genes
A phylogenetic tree comparing the AP2 domain of the sorghum DREB1 genes to those of Arabidopsis and rice was constructed using CLUSTALX [61]. The genes encoding proteins from Arabidopsis are DREB1A, DREB1B, and DREB1C. Rice genes are OsDREB1A, OsDREB1B, OsDREB1C (nucleotides 142,337–142,981), and OsDREB1D (nucleotides 1,489–2,250). AP2 domains from other Arabidopsis proteins are also included: APETALA2 (R2 domain), AtERF-1, LEAFY PETIOLE, and TINY.
Figure 6. Annotation of Arabidopsis by Sorghum MF Versus Rice Gene Sequences
Shown are the number of Arabidopsis proteins that are matched in a TBLASTN comparison to the sorghum MF set (blue) versus the rice gene sequences (yellow). The Arabidopsis proteins, after having known repetitive elements removed (see Materials and Methods), have been categorized as either hypothetical or known based on the definition line. Arabidopsis proteins were considered supported if they matched with an E-value less than or equal to 1 × 10⁻⁸. Sb, S. bicolor MF set; Osj:seq, Oryza sativa japonica gene sequences.
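The support criterion in this caption can be applied as a simple filter over search hits. The sketch below assumes the hits are available in the standard 12-column BLAST tabular layout, where the E-value is the 11th field. The caption does not state which output format the authors used, and the function name and sample identifiers are hypothetical.

```python
from typing import Iterable, Set

E_VALUE_CUTOFF = 1e-8  # threshold quoted in the caption (<= 1 x 10^-8)


def supported_queries(lines: Iterable[str],
                      cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Return query IDs that have at least one hit with E-value <= cutoff.

    Assumes tab-separated BLAST tabular rows: query ID in column 1,
    E-value in column 11. Blank lines and '#' comments are skipped.
    """
    supported: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a full tabular row; ignore it
        query_id, e_value = fields[0], float(fields[10])
        if e_value <= cutoff:
            supported.add(query_id)
    return supported


# Made-up rows for illustration; these are not data from the paper.
example_rows = [
    "protA\tread1\t80.0\t100\t20\t0\t1\t100\t1\t300\t1e-20\t90.0",
    "protB\tread2\t40.0\t50\t30\t0\t1\t50\t1\t150\t1e-3\t35.0",
    "protC\tread3\t60.0\t70\t28\t0\t1\t70\t1\t210\t1e-8\t55.0",
]
print(sorted(supported_queries(example_rows)))
```

Counting the returned IDs separately for proteins labeled hypothetical and known would give the kind of tallies plotted in the figure.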
Figure 7. Secondary Structure of Predicted MiRNAs
Predicted hairpin secondary structure of miRNA MIR156a from rice and the newly discovered ortholog from sorghum. The 21-nucleotide MIR156a sequence is highlighted in red.

References

    1. Board on Science and Technology for International Development, National Research Council. Lost crops of Africa. Washington, DC: National Academy Press; 1996. 386 pp.
    2. Bennett MD, Leitch IJ. Plant DNA C-values database (release 2.0, January 2003). 2003. Available: http://www.rbgkew.org.uk/cval/homepage.html. Accessed September 2004.
    3. Bennetzen JL. The evolution of grass genome organisation and function. Symp Soc Exp Biol. 1998;51:123–126.
    4. Bennetzen JL, Schrick K, Springer PS, Brown WE, SanMiguel P. Active maize genes are unmodified and flanked by diverse classes of modified, highly repetitive DNA. Genome. 1994;37:565–576.
    5. SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–45.
