BMC Genomics. 2008 Dec 19;9:621.
doi: 10.1186/1471-2164-9-621.

Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

William Nelson et al. BMC Genomics. 2008.

Abstract

Background: Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends.

Results: A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to approximately 11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary.

Conclusion: MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of epigenetic boundaries are barely understood at this time, MSLL technology flags both approximate boundaries and methylated genes that deserve additional investigation. MSLL and HMPR sequences provide a valuable resource for maize genome annotation, and are a uniquely valuable complement to any plant genome sequencing project. In order to make these results fully accessible to the community, a web display was developed that shows the alignment of MSLL, HMPR, and other gene-rich sequences to the BACs; this display is continually updated with the latest ESTs and BAC sequences.
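The Results state that paired end reads of MSLL and HMPR clones link existing gene-rich sequences into scaffolds. A minimal sketch of that idea follows, assuming each clone has already been reduced to the pair of contigs its two end reads align to. The function name `link_contigs` and the `(clone_id, contig_a, contig_b)` input format are hypothetical and are not taken from the paper; the grouping uses a generic union-find and ignores read orientation, insert size, and conflicting links, all of which a real scaffolder would need to check.

```python
from collections import defaultdict


def link_contigs(paired_end_hits):
    """Group contigs into candidate scaffolds from clone end-read alignments.

    paired_end_hits: iterable of (clone_id, contig_a, contig_b) tuples, one per
    clone whose two end reads aligned to contigs (hypothetical input format).
    Returns a sorted list of sorted contig-name lists, one per scaffold.
    """
    parent = {}

    def find(x):
        # Register unseen contigs as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each clone whose ends hit two contigs places them in the same scaffold.
    for _clone_id, contig_a, contig_b in paired_end_hits:
        union(contig_a, contig_b)

    groups = defaultdict(set)
    for contig in list(parent):
        groups[find(contig)].add(contig)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up clone and contig names, for illustration only.
    hits = [("clone1", "A", "B"), ("clone2", "B", "C"), ("clone3", "E", "F")]
    print(link_contigs(hits))
```

Contigs A, B, and C end up in one group because clone1 and clone2 share contig B; this transitive joining is the only behaviour the sketch is meant to show.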

Figures

Figure 1
Alignment of MSLL-Sal, MSLL-Hpa and HMPR sequences to 105 curated gene sequences, which included at least 1 kb of upstream and downstream intergenic sequence (see text). The horizontal scale in the upstream and downstream regions is basepairs; in the gene interior, it is fractional distance along the gene length. The vertical scale indicates the fraction of gene-aligning sequences which cover the gene region in question.
Figure 2
Starting point location of alignments of sequence types to gene regions. MSLL, HMPR, MF, HC, RM, and UF sequences were aligned to 151 curated gene sequences (see text). Bars indicate the percentage of the alignments for each sequence type for which the initial base was located in the indicated gene region (exon, intron, 5' intergenic, 3' intergenic). The "size" field shows the total amount of sequence of each type.
Figure 3
Example alignments of MSLL BACs with maize genomic assemblies. (a) Routine observation, with the genes shown by arrows and the areas between the genes primarily composed of LTR retrotransposons and a few other repeats. (b) Unusual alignment, where MSLL ends flank apparently methylated genes. Predicted genes are shown by arrows, with the size and orientation indicating the predicted size and transcriptional orientation of the candidate gene. Each vertical line indicates a gap in the sequence assembly, while the triangles indicate sites for the restriction enzyme SalI, which was used to generate the two BACs shown. The genomic sequence scaffolds depicted are from Bruggmann and coworkers [31].
Figure 4
The maize mini-BAC page [28]. The Name entries link to the GenBank record, and the Contig entries link to the WebFPC contig display [48] to show the position of the clone. Clicking on a BAC icon displays a Genome Browser for the BAC, with additional tracks for RM, HC, and MF sequences.
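The Figure 2 caption describes tallying, for each sequence type, the percentage of alignments whose initial base falls in an exon, an intron, or the 5' or 3' intergenic region. A minimal sketch of that bookkeeping follows. The function names, the 0-based half-open coordinates, and the assumption that the gene lies on the forward strand are illustrative choices, not details taken from the paper.

```python
from collections import Counter


def classify_start(pos, gene_start, gene_end, exons):
    """Return the gene region containing an alignment's initial base.

    Coordinates are 0-based and half-open, with the gene on the forward
    strand (assumptions for this sketch). exons is a list of (start, end)
    intervals lying inside [gene_start, gene_end).
    """
    if pos < gene_start:
        return "5' intergenic"
    if pos >= gene_end:
        return "3' intergenic"
    for exon_start, exon_end in exons:
        if exon_start <= pos < exon_end:
            return "exon"
    # Inside the gene but outside every exon.
    return "intron"


def region_percentages(starts, gene_start, gene_end, exons):
    """Percentage of alignment start positions falling in each region."""
    counts = Counter(
        classify_start(pos, gene_start, gene_end, exons) for pos in starts
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: 100.0 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    exons = [(1000, 1200), (1500, 2000)]
    print(region_percentages([500, 1100, 1300, 1600, 2500], 1000, 2000, exons))
```

Running the same tally separately for each sequence type (MSLL, HMPR, MF, HC, RM, UF) would give one set of bars per type, in the manner the caption describes.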

References

    1. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800. doi: 10.1038/nature03895.
    2. Du C, Swigonova Z, Messing J. Retrotranspositions in orthologous regions of closely related grass species. BMC Evol Biol. 2006;6:62. doi: 10.1186/1471-2148-6-62.
    3. Bennetzen JL, Coleman C, Liu R, Ma J, Ramakrishna W. Consistent over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol. 2004;7:732–736. doi: 10.1016/j.pbi.2004.09.003.
    4. Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science. 2003;301:376–379. doi: 10.1126/science.1081288.
    5. Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Carninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K. Functional annotation of a full-length Arabidopsis cDNA collection. Science. 2002;296:141–145. doi: 10.1126/science.1071006.