Genome Biol. 2019 Apr 29;20(1):85.
doi: 10.1186/s13059-019-1691-6.

Measuring sequencer size bias using REcount: a novel method for highly accurate Illumina sequencing-based quantification

Daryl M Gohl et al. Genome Biol. 2019.

Abstract

Quantification of DNA sequence tags from engineered constructs such as plasmids, transposons, or other transgenes underlies many functional genomics measurements. Typically, such measurements rely on PCR followed by next-generation sequencing. However, PCR amplification can introduce significant quantitative error. We describe REcount, a novel PCR-free direct counting method. Comparing measurements of defined plasmid pools to droplet digital PCR data demonstrates that REcount is highly accurate and reproducible. We use REcount to provide new insights into clustering biases due to molecule length across different Illumina sequencers and illustrate the impacts on interpretation of next-generation sequencing data and the economics of data generation.

Keywords: ATAC-Seq; DNA library preparation; Genotyping by sequencing; Illumina; Next-generation sequencing; PCR-free; RAD-Seq; RNA-Seq; Size bias.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The REcount PCR-free quantification barcode technology described here is included in US patent application numbers 62/332,879, 62/630,463, and PCT/US17/31271. DMG is the CSO of CoreBiome, Inc. KBB is the COO of CoreBiome, Inc.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
REcount enables accurate and precise measurements of plasmid pools. a Design of REcount constructs. A barcode-containing, Illumina adapter-flanked construct is liberated with a restriction enzyme (MlyI) digest and directly sequenced. b Accuracy and reproducibility of REcount. c Analogous measurements of the same plasmid pool shown in panel b using varying PCR cycle numbers. d Root mean squared deviation from expected values (5% per construct) when the plasmid pool is measured using REcount, and varying cycles of PCR amplification of either the barcode construct (BC) or another variable sequence in these plasmids (V4). e Pearson correlation heatmap comparing REcount measurements with droplet digital PCR data and with conventional PCR amplification of either the BC or V4 amplicons
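The deviation metric in panel d can be illustrated with a short calculation. The following is a minimal sketch, assuming the metric is the root mean squared deviation of each construct's observed percentage from an equal expected share (5% per construct corresponds to a pool of 20 constructs). The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def rmsd_percent(counts, expected_percent=None):
    """Root mean squared deviation (in percentage points) of observed
    per-construct read percentages from an expected percentage.

    counts: read counts, one entry per construct (hypothetical input).
    expected_percent: expected share per construct; defaults to an
    equimolar pool, i.e. 100 / number of constructs.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    n = len(counts)
    if expected_percent is None:
        expected_percent = 100 / n
    # Squared deviation of each construct's observed percentage.
    squared = [(100 * c / total - expected_percent) ** 2 for c in counts]
    return math.sqrt(sum(squared) / n)


# A perfectly even pool of 20 constructs deviates by 0 from 5% each.
even_pool = [100] * 20
print(rmsd_percent(even_pool))
```

A lower value indicates measurements closer to the expected composition, which is how the caption compares REcount against increasing numbers of PCR cycles.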
Fig. 2
Multiplexing of REcount measurements using orthogonal restriction enzymes. a Plasmids containing REcount constructs flanked by orthogonal restriction enzyme cut sites. b–f Total mapped reads identified for each construct type when the plasmid pool is digested with the indicated enzyme. g–k Mapped reads identified for each construct when the plasmid pool is digested with the indicated enzyme
Fig. 3
Illumina size standards allow measurement of sequencer-specific size biases. a Design of REcount-based Illumina size standard constructs. Each standard construct contains a normalization barcode, as well as a barcode associated with a variable size standard that can be liberated by MlyI digestion and directly sequenced. b Raw abundance data for all 30 size standards and normalization barcodes from a MiSeq run. c Run-to-run variability of multiple MiSeq runs (n = 6 flow cells). d Size bias profiles of the iSeq (n = 1 flow cell), MiSeq (n = 6 flow cells), NextSeq (n = 4 flow cells), and NovaSeq (n = 4 flow cells, 4 lanes) sequencers. Note: Size bias data for other Illumina instruments is shown in Additional file 1: Figure S5. e Size bias profiles of the same library either clustered on the MiSeq immediately after denaturation or clustered after freezing and thawing the denatured library. Error bars are ± s.e.m
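The normalization idea in panel a can be sketched in a few lines. This is an illustrative sketch only, assuming that each size standard's read count is divided by the count of the normalization barcode from the same construct, and that the resulting ratios are then rescaled so the best-represented size equals 1. The rescaling step, the function name, and the example numbers are assumptions made for illustration and may differ from the authors' actual analysis.

```python
def size_bias_profile(records):
    """Relative representation of each size standard.

    records: iterable of (size_bp, size_standard_reads, normalization_reads)
    tuples, one per construct (hypothetical input layout).
    Returns a dict mapping size_bp to a ratio rescaled so the maximum is 1.
    """
    ratios = {}
    for size_bp, standard_reads, normalization_reads in records:
        if normalization_reads <= 0:
            raise ValueError(f"no normalization reads for {size_bp} bp")
        # The normalization barcode controls for construct abundance.
        ratios[size_bp] = standard_reads / normalization_reads
    peak = max(ratios.values())
    if peak == 0:
        raise ValueError("no size standard reads observed")
    # Rescale so the most efficiently clustered size is 1 (assumption).
    return {size: ratio / peak for size, ratio in sorted(ratios.items())}


# Made-up counts for three size standards.
example = [(150, 500, 1000), (500, 1000, 1000), (1500, 250, 1000)]
print(size_bias_profile(example))
```

Dividing by the normalization barcode separates differences in plasmid abundance from differences in how well each fragment length clusters on a given instrument.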
Fig. 4
Instrument-specific size biases have minimal effect on RNA-sequencing data. a Fragment size distributions for an RNA-Seq library sequenced on the NovaSeq and the NextSeq. b Correlation of expression values (FPKM) for this library across the two instruments
Fig. 5
Instrument size biases affect genotyping marker observations in RAD-Seq data. a Average read counts for 11 RAD-Seq samples sequenced on the HiSeq or NextSeq. b Number of markers observed in filtered VCF file for the 11 RAD-Seq libraries. c Number of loci observed in filtered VCF file for the 11 RAD-Seq libraries. d Fraction of missing genotype calls for each sample in the unfiltered VCF file. e PCA plot generated using the unfiltered VCF file. f PCA plot using the filtered VCF file. HiSeq data points overlap with NextSeq data points in this plot
Fig. 6
Effect of instrument size bias on ATAC-Seq data. a Average insert size for 6 ATAC-Seq libraries sequenced on the HiSeq or NextSeq. b Percentage of reads at a subsampled depth of 20 million reads per sample classified as non-, mono-, di-, and tri-nucleosomal. n = 6 libraries. ***denotes p < 0.01 using a t-test. n.s. denotes no significant difference. c Distribution of mapped reads at the Fgfr4 locus. IGV plots of mapped reads for each sample, subsampled to a depth of 20 million reads, and either directly mapped (“All reads”) or split into the non-nucleosomal (“Non-nucl.”) subset and mapped. MACS peak calls for PAX3-responsive sites for HiSeq (top) and NextSeq (bottom) are below each set of mapped reads
