Comparative Study
Genome Res. 2009 Oct;19(10):1836-42.
doi: 10.1101/gr.093955.109. Epub 2009 Jul 21.

Quantitative phenotyping via deep barcode sequencing

Andrew M Smith et al. Genome Res. 2009 Oct.

Abstract

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well suited to a multiplexed format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences deviated from their designed sequences, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
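To illustrate why a corrected barcode list matters for counting, the sketch below tallies reads per strain by exact match of the read prefix against a barcode-to-strain table. It is a minimal sketch, not the authors' pipeline: the assumption that each read begins with the barcode, the `barcode_len` parameter, the function name `count_barcodes`, and the toy barcodes and strain names are all hypothetical.

```python
from collections import Counter


def count_barcodes(reads, barcode_to_strain, barcode_len=20):
    """Tally reads per strain by exact match of the read prefix to a known barcode.

    Hypothetical read layout: each read is assumed to begin with the barcode.
    Reads whose prefix matches no listed barcode are tallied under the key None.
    """
    counts = Counter()
    for read in reads:
        prefix = read[:barcode_len]
        counts[barcode_to_strain.get(prefix)] += 1
    return counts


# Toy example with made-up 4-nt barcodes and strain names.
barcodes = {"ACGT": "strain_A", "TTGA": "strain_B"}
reads = ["ACGTCC", "ACGTAA", "TTGAGG", "GGGGTT"]
print(count_barcodes(reads, barcodes, barcode_len=4))
```

Under exact matching, reads from a strain whose real barcode differs from its designed sequence land in the `None` bin; replacing the designed entry with the empirically observed sequence lets those reads be assigned to the strain again.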


Figures

Figure 1.
Comparison of barcode microarray hybridization and Bar-seq data on identical samples. A pool of 953 strains was created, consisting of four subpools of approximately 250 yeast deletion strains each. The pool was designed to contain two well-characterized drug targets and an additional 951 control heterozygote strains. The subpools were mixed together in a constant pool (Pool-constant) at a ratio of 1:1:1:1 and in a variable pool (Pool-variable) at a ratio of 0.25:0.5:1.0:2.0. Log2 signals for each strain were determined, and the relative abundance across subpools was assessed. For tag-array analysis, the signal refers to the raw intensities corrected for saturation effects as described previously (Pierce et al. 2007), whereas for sequencing analysis, the signal refers to the sequencing counts. Data were filtered to remove strains with signal below an arbitrary background level (40 counts for sequencing data, an intensity of 200 for hybridization data). (A) Scatterplot of the log2 ratio of each strain's signal in the variable pool (0.25:0.5:1.0:2.0) to its signal in the constant pool (1:1:1:1). The subpools are shown in different colors: red, green, blue, and yellow correspond to ratios within Pool-variable of 0.25, 0.5, 1.0, and 2.0, respectively. The red, green, blue, and yellow lines indicate the expected log2 ratios. The data for this panel were scale-normalized using the green group, which is at equal concentration in both pools. (B) The distribution of the log2 ratios between the variable (0.25:0.5:1.0:2.0) and constant (1:1:1:1) pools is shown for each subpool. The mean of each distribution is shown, with error bars representing one standard deviation. The y-axis is the log2 ratio of intensity (array) or counts (Bar-seq) for each subpool in the variable pool over the constant pool. The red numbers are the ratio of each subpool's mean over the mean of the 2 subgroup; the expected ratio is shown in brackets. All subgroups differ significantly from one another in both the barcode microarray and Bar-seq data sets (P < 10⁻⁶).
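The per-strain log2 ratios, background filtering, and reference-subpool scaling described for Figure 1 can be sketched as follows. This is a hedged illustration under stated assumptions, not the published analysis: requiring both pools to meet the background threshold, normalizing by the mean log2 ratio of one chosen reference subpool, and the function names are all hypothetical choices. The expected log2 ratios for the 0.25, 0.5, 1.0, and 2.0 subpools are -2, -1, 0, and 1.

```python
import math
from statistics import mean, stdev


def log2_ratios(variable, constant, background=40):
    """Per-strain log2(variable / constant) signal.

    Assumption: a strain is kept only if its signal meets the background
    level (40 counts for Bar-seq in the caption) in BOTH pools.
    """
    ratios = {}
    for strain, v in variable.items():
        c = constant.get(strain, 0)
        if v >= background and c >= background:
            ratios[strain] = math.log2(v / c)
    return ratios


def normalize_to_reference(ratios, subpool_of, reference):
    """Shift all ratios so the reference subpool's mean log2 ratio becomes 0.

    This removes global differences such as unequal sequencing depth between
    the two pools (hypothetical scaling scheme).
    """
    offset = mean(r for s, r in ratios.items() if subpool_of[s] == reference)
    return {s: r - offset for s, r in ratios.items()}


def summarize(ratios, subpool_of):
    """Mean and standard deviation of the log2 ratios within each subpool."""
    groups = {}
    for s, r in ratios.items():
        groups.setdefault(subpool_of[s], []).append(r)
    return {g: (mean(v), stdev(v) if len(v) > 1 else 0.0) for g, v in groups.items()}


# Toy counts: the variable pool was sequenced twice as deeply as the constant pool.
variable = {"a1": 50, "b1": 100, "c1": 200, "d1": 400}
constant = {"a1": 100, "b1": 100, "c1": 100, "d1": 100}
subpool_of = {"a1": 0.25, "b1": 0.5, "c1": 1.0, "d1": 2.0}
normalized = normalize_to_reference(log2_ratios(variable, constant), subpool_of, 1.0)
print(summarize(normalized, subpool_of))
```

After scaling, each subpool's mean should sit near its expected log2 ratio, which is the comparison the colored reference lines in panel A make visually.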
Figure 2.
Results of the yeast deletion pools assayed by array and Bar-seq. Log2 results for both TAG4 barcode microarray hybridization and Illumina sequencing are presented. In each plot, the y-axis shows the log2 ratio of control over treatment, and genes are ordered alphabetically along the x-axis. (A,B) Downtag results for the constant pool treated with (A) cerivastatin and (B) tunicamycin. (C) Results for the heterozygote essential pool treated with doxorubicin. The r-value in the right-hand column is the correlation between the array and sequencing log2 ratios. Arrows mark known drug targets. The sequencing data were collected in a single sequencing reaction containing four independent samples (four-plex). Before the correlations were calculated, the data were filtered to retain strains with more than 10 counts in the Bar-seq DMSO control and an intensity above 200 in the DMSO array control. For details, see Methods.
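The filtered array-versus-sequencing correlation reported in Figure 2 can be sketched as a Pearson correlation of per-strain log2(control/treatment) values. This is a minimal sketch under assumptions: the thresholds (more than 10 Bar-seq counts, intensity above 200) come from the caption, but skipping strains with a zero treatment signal, using Pearson's r, and the function names are hypothetical choices rather than the authors' documented method.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists (needs nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def array_vs_seq_correlation(seq_ctrl, seq_trt, arr_ctrl, arr_trt,
                             min_counts=10, min_intensity=200):
    """Correlate array and Bar-seq log2(control / treatment) ratios.

    Keeps strains with > min_counts in the Bar-seq control and > min_intensity
    in the array control; strains with a zero treatment signal are skipped
    (an assumption, to avoid dividing by zero). Returns (r, strains used).
    """
    xs, ys = [], []
    for strain, sc in seq_ctrl.items():
        ac = arr_ctrl.get(strain, 0)
        st, at = seq_trt.get(strain, 0), arr_trt.get(strain, 0)
        if sc > min_counts and ac > min_intensity and st > 0 and at > 0:
            xs.append(math.log2(ac / at))   # array log2 ratio
            ys.append(math.log2(sc / st))   # sequencing log2 ratio
    return pearson_r(xs, ys), len(xs)
```

A drug target whose heterozygote is depleted by treatment has a large positive log2(control/treatment) in both assays, so agreement on such outliers contributes strongly to r.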
Figure 3.
Schematic showing the sequencing strategy for re-characterization of barcode and common priming sequences. (U1, U2/D1, D2) Common priming sites for uptag/downtag barcodes. (BC) Barcode. (Top panels) We used a paired-end sequencing reaction to identify both the genomic position (from one read) and the barcode and U1/D1 sequences (from the other read). (Bottom panels) In an additional sequencing reaction, we identified the barcodes and U2/D2 sequences in a single Illumina read by using a primer homologous to the region of the KanMX4 cassette flanking the U2/D2 sequences (shown in gray). (Colored circles) The bases being sequenced; (colored arrows) the primers used in the sequencing reaction; (square) the uptag barcode; (light-blue square) the downtag barcode. (Triangles flanking the colored boxes) The common primers; (dark blue triangle) the ligated adaptor sequence used to sequence the genomic DNA flanking the cassette.
Figure 4.
Yeast knockout collection characterization. (Top) An illustration of the yeast deletion cassette; (bottom) the table gives the total number of barcodes found, the percent correct (i.e., sequences that exactly match the designed sequence), and the percent incorrect (i.e., sequences that deviate from the expected sequence). Also shown is a breakdown of the incorrect sequences that were identified into single substitutions, single deletions, single insertions, and other mutations (e.g., multiple deletions). These data were collected in two paired-end sequencing reactions and two single sequencing reactions. For details, see Supplemental Methods.
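The Figure 4 categories (correct, single substitution, single deletion, single insertion, other) can be assigned by comparing each observed sequence with its designed sequence. The sketch below is one straightforward way to do this, assuming observed and designed sequences have already been paired by strain (for example, through the genomic read of the paired-end strategy); the function names are hypothetical and this is not presented as the authors' classification code.

```python
from collections import Counter


def classify_variant(designed, observed):
    """Bin an observed barcode relative to its designed sequence."""
    if observed == designed:
        return "correct"
    ld, lo = len(designed), len(observed)
    if ld == lo:
        # Same length: count mismatched positions.
        diffs = sum(1 for a, b in zip(designed, observed) if a != b)
        return "single substitution" if diffs == 1 else "other"
    if lo == ld - 1:
        # One base shorter: is it the designed sequence with one base removed?
        if any(designed[:i] + designed[i + 1:] == observed for i in range(ld)):
            return "single deletion"
        return "other"
    if lo == ld + 1:
        # One base longer: removing one observed base should restore the design.
        if any(observed[:i] + observed[i + 1:] == designed for i in range(lo)):
            return "single insertion"
        return "other"
    return "other"


def summarize_variants(pairs):
    """Percentage of (designed, observed) pairs falling into each category."""
    counts = Counter(classify_variant(d, o) for d, o in pairs)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


print(classify_variant("ACGT", "ACCGT"))
```

Anything needing more than one edit, such as multiple deletions or mixed substitutions, falls through to "other", mirroring the catch-all row of the table.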

References

    1. Ben-Aroya S, Coombes C, Kwok T, O'Donnell KA, Boeke JD, Hieter P. Toward a comprehensive temperature-sensitive mutant repository of the essential genes of Saccharomyces cerevisiae. Mol Cell. 2008;30:248–258.
    2. Bennett S. Solexa Ltd. Pharmacogenomics. 2004;5:433–438.
    3. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–59.
    4. Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ, Pawlowski TL, Laub T, Nunn G, Stephan DA, et al. Identification of genetic variants using barcoded multiplexed sequencing. Nat Methods. 2008;5:887–893.
    5. Eason RG, Pourmand N, Tongprasit W, Herman ZS, Anthony K, Jejelowo O, Davis RW, Stolc V. Characterization of synthetic DNA bar codes in Saccharomyces cerevisiae gene-deletion strains. Proc Natl Acad Sci. 2004;101:11046–11051.