G3 (Bethesda). 2014 Jan 10;4(1):11-8.
doi: 10.1534/g3.113.008565.

Design and analysis of Bar-seq experiments

David G Robinson et al. G3 (Bethesda).

Abstract

High-throughput quantitative DNA sequencing enables the parallel phenotyping of pools of thousands of mutants. However, the appropriate analytical methods and experimental design that maximize the efficiency of these methods while maintaining statistical power are currently unknown. Here, we have used Bar-seq analysis of the Saccharomyces cerevisiae yeast deletion library to systematically test the effect of experimental design parameters and sequence read depth on experimental results. We present computational methods that efficiently and accurately estimate effect sizes and their statistical significance by adapting existing methods for RNA-seq analysis. Using simulated variation of experimental designs, we found that biological replicates are critical for statistical analysis of Bar-seq data, whereas technical replicates are of less value. By subsampling sequence reads, we found that when using four-fold biological replication, 6 million reads per condition achieved 96% power to detect a two-fold change (or more) at a 5% false discovery rate. Our guidelines for experimental design and computational analysis enable the study of the yeast deletion collection in up to 30 different conditions in a single sequencing lane. These findings are relevant to a variety of pooled genetic screening methods that use high-throughput quantitative DNA sequencing, including Tn-seq.
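The abstract describes subsampling sequence reads to simulate lower read depths, but this page does not say how the subsampling was implemented. As a minimal sketch of one way to do it, the example below keeps each read independently with a fixed probability (binomial thinning). The function name `subsample_counts`, the barcode names, and the counts are all hypothetical and are not taken from the paper.

```python
import random


def subsample_counts(counts, fraction, rng):
    """Keep each read independently with probability `fraction`.

    `counts` maps a barcode (mutant) to its read count. This is binomial
    thinning, an assumed stand-in for the paper's subsampling step.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return {
        barcode: sum(1 for _ in range(n) if rng.random() < fraction)
        for barcode, n in counts.items()
    }


# Hypothetical barcode counts for one sample (not data from the paper).
full_counts = {"mutA": 1200, "mutB": 35, "mutC": 0, "mutD": 410}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
half_depth = subsample_counts(full_counts, 0.5, rng)

print("reads before:", sum(full_counts.values()))
print("reads after: ", sum(half_depth.values()))
```

Repeating the thinning at several fractions and rerunning the downstream analysis each time gives one curve of results against read depth.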

Keywords: Bar-seq; Saccharomyces cerevisiae; functional genomics; galactose; yeast.

Figures

Figure 1
Experimental design and results. (A) Our experimental design entailed two treatments (twenty-four hours of growth in glucose/YPD or galactose/YPGal), four biological replicates, and two technical replicates, along with four samples at time point 0 [not shown in (A)]. (B) Heat map of the Spearman correlation matrix of mutant counts by sample. Samples cluster according to time point and also by treatment (YPD vs. YPGal) and biological replicate.
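The Spearman correlation matrix in panel (B) can be illustrated with a small sketch: rank the mutant counts within each sample (averaging tied ranks), then take the Pearson correlation of the ranks for every pair of samples. The sample names and counts below are hypothetical, and the helper functions are illustrative, not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(samples):
    """Pairwise Spearman correlations between samples, as a nested dict."""
    return {a: {b: spearman(samples[a], samples[b]) for b in samples}
            for a in samples}


# Hypothetical counts for five mutants in three samples (not paper data).
samples = {
    "YPD_rep1": [120, 30, 560, 8, 75],
    "YPD_rep2": [110, 42, 610, 5, 80],
    "YPGal_rep1": [15, 300, 90, 400, 60],
}
matrix = spearman_matrix(samples)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in samples])
```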
Figure 2
Bar-seq quantifies mutant effects across a range of sequence read depths. (A) Volcano plot showing the relationship between the p-value (log-scale) and log-fold change. Genes known to be involved in activation or repression of the galactose utilization pathway are highlighted. The p-value of the rightmost red point is computationally indistinguishable from 0. (B) Plot of reads per mutant in the entire experiment compared with the estimated fold change after treatment.
Figure 3
Simulation analysis of variation in experimental design. The effect of read depth on (A) the number of mutants found significant at FDR = 5%, (B) the mean squared error between the estimate of the log-fold change and the value for the full experiment, (C) the number of significant GO terms identified using a Wilcoxon rank-sum test at FDR = 5%, and (D) the percentage of significant genes that were not found to be significant in the full experiment. Curves are shown for the full experiment, 2 treatments × 4 biological replicates × 2 technical replicates, as well as for subsampled 2 × 3 × 2 and 2 × 2 × 2 experimental designs (solid lines). Subsamplings were also performed to simulate each experimental design using a single technical replicate (dashed lines). Each curve was smoothed using a natural cubic spline.
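Two of the metrics in this caption, panels (B) and (D), are simple to state in code. The sketch below is an illustration under assumptions: log-fold changes are held in dictionaries keyed by mutant, significant calls are held in sets, and all names and numbers are hypothetical.

```python
def mean_squared_error(estimates, reference):
    """Mean squared difference in log-fold change over shared mutants."""
    shared = estimates.keys() & reference.keys()
    if not shared:
        raise ValueError("no mutants in common")
    return sum((estimates[m] - reference[m]) ** 2 for m in shared) / len(shared)


def percent_not_in_full(sub_significant, full_significant):
    """Percentage of subsample calls that are not significant in the full experiment."""
    if not sub_significant:
        return 0.0
    extra = sub_significant - full_significant
    return 100.0 * len(extra) / len(sub_significant)


# Hypothetical log-fold changes and significance calls (not paper data).
full_lfc = {"mutA": 1.0, "mutB": -2.0}
sub_lfc = {"mutA": 1.0, "mutB": 0.0}
full_sig = {"mutA", "mutB", "mutC"}
sub_sig = {"mutA", "mutB", "mutC", "mutD"}

print(mean_squared_error(sub_lfc, full_lfc))
print(percent_not_in_full(sub_sig, full_sig))
```

Here the full experiment serves as the reference, which matches the caption's framing; it is not an independent ground truth.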
Figure 4
Statistical power varies with effect size and sequence read depth. The effect of read depth on the proportion of genes identified as significant (FDR = 5%) at different fold-change thresholds using 4 biological replicates × 1 technical replicate for each condition. The fold change for each mutant determined from the full 4 biological replicate × 2 technical replicate experiment is defined as the gold standard. The solid curve shows the proportion of genes found significant relative to the total experiment, whereas the dotted and dashed curves show the proportion of mutants that had at least a 1.5-fold or a two-fold change, respectively. The horizontal dashed line indicates the 90% power level.
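The power estimate for a fold-change threshold can be sketched as follows: take the mutants whose gold-standard fold change meets the threshold, then compute the fraction of them called significant in the reduced experiment. This sketch assumes log2 fold changes and counts changes in either direction; the caption does not specify either point, and the function name and values are hypothetical.

```python
import math


def power_at_fold_change(gold_lfc, significant, min_fold):
    """Fraction of mutants with gold-standard |log2 FC| >= log2(min_fold)
    that appear in `significant` (the calls from the reduced experiment)."""
    cutoff = math.log2(min_fold)
    eligible = [m for m, lfc in gold_lfc.items() if abs(lfc) >= cutoff]
    if not eligible:
        return float("nan")  # no mutant reaches the threshold
    return sum(1 for m in eligible if m in significant) / len(eligible)


# Hypothetical gold-standard log2 fold changes and subsample calls.
gold_lfc = {"mutA": 1.2, "mutB": -1.5, "mutC": 0.7, "mutD": 0.1}
significant = {"mutA", "mutC"}

print(power_at_fold_change(gold_lfc, significant, 2.0))
print(power_at_fold_change(gold_lfc, significant, 1.5))
```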
