PLoS Comput Biol. 2013 Apr;9(4):e1003010.
doi: 10.1371/journal.pcbi.1003010. Epub 2013 Apr 4.

Combinatorial pooling enables selective sequencing of the barley gene space

Stefano Lonardi et al. PLoS Comput Biol. 2013 Apr.

Abstract

For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps, and are now in a position to develop whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines a combinatorial pooling design with second-generation sequencing technology to approach de novo selective genome sequencing efficiently. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution), so that the assembly can be carried out clone by clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and that the resulting BAC assemblies are of high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and that the BAC assemblies are of good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Proposed sequencing protocol.
(A) obtain a BAC library for the target organism; (B) select gene-enriched BACs from the library (optional); (C) fingerprint BACs and build a physical map; (D) select a minimum tiling path (MTP) from the physical map; (E) pool the MTP BACs according to the shifted transversal design; (F) sequence the DNA in each pool, trim/clean sequenced reads; (G) assign reads to BACs (deconvolution); (H) assemble reads BAC-by-BAC using a short-read assembler.
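Steps (E) and (G) of the protocol can be illustrated with a small sketch. The code below is not the authors' implementation: it builds a polynomial-based transversal design in the spirit of the shifted transversal design, under the assumption of 7 layers of 13 pools each (91 pools in total, matching the pool count quoted in Figure 3; the 7 × 13 split is an assumption). The names `bac_signature` and `decode` are hypothetical. Each BAC is placed in one pool per layer, any two BACs share at most two pools, and a set of observed pools is decoded by finding every BAC whose full signature is contained in it.

```python
# Sketch only: parameters and function names are illustrative assumptions.
Q = 13       # pools per layer; must be prime
LAYERS = 7   # 7 layers * 13 pools = 91 pools
DEGREE = 2   # polynomial degree; supports up to Q**(DEGREE + 1) = 2197 BACs


def bac_signature(bac_id, q=Q, layers=LAYERS, degree=DEGREE):
    """Return the set of pool indices (0 .. layers*q - 1) that hold this BAC.

    The BAC id is written in base q; its digits are the coefficients of a
    polynomial over GF(q). In layer j the BAC goes to pool p(j) mod q.
    Two distinct polynomials of degree <= 2 agree on at most 2 points,
    so two BACs share at most 2 pools.
    """
    digits = [(bac_id // q**c) % q for c in range(degree + 1)]
    return frozenset(
        layer * q + sum(d * pow(layer, c, q) for c, d in enumerate(digits)) % q
        for layer in range(layers)
    )


def decode(observed_pools, signatures):
    """Return the BACs whose whole signature lies inside the observed pools."""
    return sorted(b for b, sig in signatures.items() if sig <= observed_pools)


signatures = {b: bac_signature(b) for b in range(Q ** (DEGREE + 1))}

# A read (or k-mer) seen in exactly the pools of BAC 5 decodes to BAC 5.
print(decode(signatures[5], signatures))
# A read from the overlap of two MTP clones appears in the union of both
# signatures and decodes to both clones.
print(decode(signatures[5] | signatures[100], signatures))
```

With 7 layers and at most 2 shared pools per pair, a union of up to three signatures covers at most 6 of the 7 pools of any other BAC, so reads shared by up to three overlapping clones still decode unambiguously in this toy setting. Real data would also require tolerance to sequencing errors and missing pools, which this sketch omits.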
Figure 2. An illustration of the three cases handled during the deconvolution process (the clones belong to an MTP).
Figure 3. Count distribution for the signatures of all distinct 26-mers [(a) rice synthetic data, (c) barley HV5] and all the reads [(b) rice synthetic data, (d) barley HV5] in the 91 pools of sequencing data.
The x-axis represents the size of the signature and the y-axis is the absolute count.
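The quantity plotted in Figure 3 can be sketched as follows, taking the signature of a k-mer to be the set of pools in which it occurs and its size to be the number of such pools. This is a toy illustration, not the authors' code: `kmer_signatures` and `signature_size_histogram` are hypothetical names, the example uses k = 4 rather than 26, and it ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict


def kmer_signatures(pools, k):
    """Map each distinct k-mer to the set of pool ids in which it occurs.

    `pools` is a dict: pool id -> list of read strings.
    """
    signatures = defaultdict(set)
    for pool_id, reads in pools.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                signatures[read[i:i + k]].add(pool_id)
    return signatures


def signature_size_histogram(signatures):
    """Count how many k-mers have a signature of each size (x-axis of Figure 3)."""
    return Counter(len(pool_set) for pool_set in signatures.values())


# Toy input: three pools with one short read each.
pools = {0: ["ACGTA"], 1: ["CGTAC"], 2: ["ACGTT"]}
sigs = kmer_signatures(pools, k=4)
hist = signature_size_histogram(sigs)
for size in sorted(hist):
    print(size, hist[size])
```

In this toy input, `ACGT` and `CGTA` each occur in two pools while `GTAC` and `CGTT` occur in one, so the histogram has two k-mers at size 1 and two at size 2.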
