BMC Genomics. 2007 May 24;8:132.
doi: 10.1186/1471-2164-8-132.

Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

Kankshita Swaminathan et al. BMC Genomics. 2007.

Abstract

Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High-throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA.

Results: We sequenced 78 million base pairs of randomly sheared soybean DNA that passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and to discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis).

Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
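
The copy-number assessment described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that the start coordinate of each aligned survey read on the clone is already known, that reads are counted in fixed 1 kb windows, and that copy number is approximated as observed reads divided by the number expected for a single-copy sequence. The function names and the parameters `genome_size` and `mean_read_length` are hypothetical, and their values must be supplied by the user because they are not given in this abstract.

```python
def reads_per_window(alignment_starts, clone_length, window=1000):
    """Count aligned survey reads whose start falls in each window of a clone.

    alignment_starts: 0-based start positions of reads on the clone (assumed input).
    """
    n_windows = (clone_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for start in alignment_starts:
        if 0 <= start < clone_length:
            counts[start // window] += 1
    return counts


def estimate_copy_number(counts, survey_bases, genome_size,
                         mean_read_length, window=1000):
    """Scale per-window read counts by the count expected for single-copy DNA.

    Assumes reads are sampled uniformly from the genome, so the expected
    number of read starts per single-copy window is
    (survey_bases / genome_size) * window / mean_read_length.
    """
    coverage = survey_bases / genome_size
    expected = coverage * window / mean_read_length
    return [c / expected for c in counts]


if __name__ == "__main__":
    # Toy numbers chosen for illustration only, not taken from the study.
    counts = reads_per_window([0, 999, 1000, 2500, 2999], clone_length=3000)
    print(counts)
    print(estimate_copy_number(counts, survey_bases=100, genome_size=1000,
                               mean_read_length=10, window=10))
```

At low survey coverage the expected count per single-copy window is well below one read, so an estimate of this kind is informative mainly for sequences present in many copies.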

Figures

Figure 1
Comparison of sequence survey data with soybean and other plant repeat databases. A) Distribution of hits to plant repeat databases, by genus. Raw reads were matched using BLAST (blastn) to the TIGR plant repeat databases and the top significant (1E-6) hit was recorded. Percentages represent the percentage of reads with hits to sequences from a particular organism with respect to all reads with hits to the TIGR repeats. B) Distribution of hits to plant repeat databases, by class of repetitive element. Raw reads from the genomic sequence survey were matched to the combined plant repeat databases as for (A), and the class of repetitive element for the top hit was used to show the relative abundance of different classes of repetitive elements. This gives an estimate of the relative frequency of these families in the soybean genome. Retrotransposons and rDNA are the most common classes of repeat. See Additional File 1 for common repeat sequences not included in the TIGR database.
Figure 2
Alignment of sequence survey reads to BAC clones. The figure shows the alignment of survey reads, made with BLASTZ, to three genomic Bacterial Artificial Chromosome (BAC) sequences of soybean DNA, together with the estimated copy number. Copy number was estimated according to the number of sequence survey reads aligning to each 1 kb window of the BACs. The alignment represents the superposition of identical or closely related sequences on the BAC sequence, so that the individual reads reveal regions present in many copies per genome. The BAC sequences were: A) the euchromatic BAC described by Clough et al. (20); B) the euchromatic BAC GM_WBb0098N11; C) the BAC GM_WBb0078A23 from a heterochromatic region.
Figure 3
Annotation of protein ORFs with hits to public databases. A) Proportion of EST clones from the Glycine max Gene Index (GMGI) matched by 454 reads at 95% and 100% sequence identity (using BLAST with e < 1E-6). The total number of sequences matching at 95% or higher identity is 37% of total EST clones. Note that few sequences match at 100% identity due to the error rate of the 454 pyrosequencing used for this study. B) Coding fragments discovered within the short reads (with e values to the GenBank protein (nr) database < 1E-6), and their closest protein-level sequence hit by taxonomy of the source organism of the database sequence.
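
The top-hit tally described in the Figure 1 caption (best significant BLAST hit per read, then percentages by repeat class) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes BLAST results in the standard 12-column tabular format (query id in column 1, subject id in column 2, E-value in column 11, bit score in column 12), takes "top hit" to mean the highest bit score among hits at or below the 1E-6 E-value threshold, and uses a hypothetical `subject_class` dictionary mapping database sequence ids to repeat classes.

```python
from collections import Counter


def top_hits(blast_lines, max_evalue=1e-6):
    """Return {read_id: subject_id} for the best significant hit of each read."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # not significant at the chosen threshold
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)
    return {read: subject for read, (subject, _) in best.items()}


def class_percentages(hits, subject_class):
    """Percentage of reads with hits that fall in each repeat class.

    subject_class is a hypothetical lookup from subject id to class name.
    """
    counts = Counter(subject_class.get(s, "unclassified") for s in hits.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}
```

Percentages computed this way are relative to reads with at least one significant hit, which matches the denominator stated in the caption for panel A.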
