Genome Biol. 2022 Mar 31;23(1):88.
doi: 10.1186/s13059-022-02660-8.

Prime-seq, efficient and powerful bulk RNA sequencing

Aleksandar Janjic et al. Genome Biol.

Abstract

Cost-efficient library generation by early barcoding has been central in propelling single-cell RNA sequencing. Here, we optimize and validate prime-seq, an early barcoding bulk RNA-seq method. We show that it performs equivalently to TruSeq, a standard bulk RNA-seq method, but is fourfold more cost-efficient due to almost 50-fold cheaper library costs. We also validate a direct RNA isolation step, show that intronic reads are derived from RNA, and compare cost-efficiencies of available protocols. We conclude that prime-seq is currently one of the best options to set up an early barcoding bulk RNA-seq protocol from which many labs would profit.

Keywords: Genomics; Power analysis; RNA-seq; Transcriptomics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Graphical overview of prime-seq, highlighting its robustness, sensitivity, affordability, and the validation experiments performed. Cells are first lysed, mRNA is then isolated using magnetic beads and reverse transcribed into cDNA. Following cDNA synthesis, all samples are pooled, libraries are made, and the samples are sequenced. The protocol has been validated on 17 organisms, including human, mouse, zebrafish, and Arabidopsis. Additionally, prime-seq is sensitive and works with low inputs, and its affordability allows one to increase sample size to gain more biological insight. To verify prime-seq’s performance, we first compared prime-seq to TruSeq using the publicly available MAQC-III Study data. We then showed robust detection of marker genes in NPC differentiation and high-throughput analysis of AML-PDX patient samples without compromising the archived samples.
Fig. 2
Intronic reads account for a variable but substantial fraction of UMIs and stem from RNA. (A) Fraction of exonic and intronic UMIs from 97 primate and mouse experiments using various tissues (neural, cardiopulmonary, digestive, urinary, immune, cancer, induced pluripotent stem cells). Sequencing depth is indicated by the shading of the individual bars. We observe an average of 21% intronic UMIs, with some tissue-specific deviation; for example, immune cells generally have higher fractions of intronic reads. (B) To determine whether intronic reads stem from genomic DNA or mRNA, we extracted DNA from mouse embryonic stem cells (mESCs) and RNA from human induced pluripotent stem cells (hiPSCs), pooled the two in various ratios (75, 50, 25, and 0% gDNA), and either treated the samples with DNase I (green) or left them untreated (gray). We then counted the percentage of genomic (= mouse-mapped) UMIs. This indicates that DNase I treatment in prime-seq is complete and that observed intronic reads are derived from RNA.
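The percentage readout in panel B can be illustrated with a minimal sketch. It assumes UMIs have already been assigned to the mouse or human genome by an upstream mapping step; the function name and the example counts are hypothetical and are not taken from the study.

```python
def percent_mouse_umis(mouse_umis: int, human_umis: int) -> float:
    """Percentage of UMIs mapping to mouse, used here as a proxy for gDNA.

    In the mixing design described in the caption, DNA comes from mouse
    cells and RNA from human cells, so mouse-mapped UMIs indicate
    genomic DNA carry-over.
    """
    total = mouse_umis + human_umis
    if total == 0:
        raise ValueError("no UMIs counted")
    return 100.0 * mouse_umis / total


# Hypothetical counts, for illustration only (not data from the paper).
untreated = percent_mouse_umis(mouse_umis=250, human_umis=750)
treated = percent_mouse_umis(mouse_umis=0, human_umis=1000)
print(untreated, treated)
```

Under these made-up counts, a DNase I-treated sample with no mouse-mapped UMIs yields 0%, which is the pattern one would expect if digestion were complete.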
Fig. 3
Prime-seq has similar sensitivity and power compared to TruSeq (MAQC-III data). (A) Mapped reads and UMIs (dashed line, prime-seq only) and (B) detected genes (exonic + intronic reads) at varying sequencing depths, comparing TruSeq data from the MAQC-III Study with matched prime-seq data, show that prime-seq and TruSeq are similarly sensitive (filtering parameters: detected UMI ≥ 1, detected gene present in at least 25% of samples and protein coding). (C) Accuracy, measured by spike-in molecules, is similarly high in both methods (R² = 0.94). (D) The distribution of genes across mean expression is similar for both methods, as is the dispersion, which follows a Poisson distribution (dark gray dashed line) for lower expressed genes and then increases as technical variation increases for highly expressed genes. The local polynomial regression fit between mean and dispersion estimates per method is shown in solid lines, with the 95% variability band per gene shown in dashed lines. (E) Power analysis at a sequencing depth of 10 million reads shows almost identical power between prime-seq and TruSeq, and a similar increase with sample size across (F) mean expression and (G) absolute log2 fold change. Data filtering parameters: detected UMI ≥ 1, detected gene present in at least 25% of samples.
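The filtering parameters stated in this caption (detected UMI ≥ 1, gene present in at least 25% of samples) can be sketched as follows. This is an illustrative reading of the stated thresholds, not the authors' pipeline; the function name, the dictionary layout, and the gene labels are assumptions, and the protein-coding filter is omitted because it requires a gene annotation.

```python
def filter_genes(counts, min_umi=1, min_sample_fraction=0.25):
    """Keep genes detected (UMI count >= min_umi) in enough samples.

    counts maps a gene name to a list of UMI counts, one per sample.
    """
    kept = {}
    for gene, per_sample in counts.items():
        if not per_sample:
            continue  # no samples recorded for this gene
        n_detected = sum(1 for c in per_sample if c >= min_umi)
        if n_detected / len(per_sample) >= min_sample_fraction:
            kept[gene] = per_sample
    return kept


# Hypothetical UMI counts for three genes across four samples.
example = {
    "GENE_A": [0, 0, 0, 3],  # detected in 1 of 4 samples (25%): kept
    "GENE_B": [0, 0, 0, 0],  # never detected: dropped
    "GENE_C": [5, 2, 0, 1],  # detected in 3 of 4 samples: kept
}
print(sorted(filter_genes(example)))
```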
Fig. 4
RNA extraction with beads, rather than columns, provides similar sequencing data while increasing throughput capabilities. (A) Feature distributions of RNA isolated with a column-based kit and with magnetic beads show that both RNA extraction protocols produce similar amounts of usable reads from cultured human embryonic kidney 293T (HEK293T) cells, peripheral blood mononuclear cells (PBMCs), and harvested mouse brain tissue. (B) Gene expression is also similar between bead and column extraction in all three tested inputs (R² = 0.86 HEK293T, 0.84 PBMCs, and 0.74 tissue). (C) Detected UMIs and detected genes for column and magnetic beads in HEK293T, PBMCs, and tissue are almost identical, with slightly more detected genes in the bead condition (filtering parameters: detected UMI ≥ 1, detected gene present in at least 25% of samples and protein coding). Comparison of costs (D) and time (E) required for different RNA extractions.
Fig. 5
Two exemplary applications of prime-seq. (A) Experimental design for an acute myeloid leukemia (AML) study, where a biopsy punch was used to collect a small fraction of a frozen patient-derived xenograft (PDX)-AML sample. (B) Prime-seq libraries were generated from 94 PDX samples, derived from 11 different AML-PDX lines (color-coded) from 5 different AML subtypes (symbol-coded); the samples cluster primarily by AML subtype. (C) Experimental design for studying the differentiation of five human induced pluripotent stem cell (iPSC) lines to neural progenitor cells (NPCs). (D) Expression levels of 20 a priori known marker genes cluster iPSCs and NPCs as expected.
Fig. 6
Prime-seq is very cost-efficient. (A) With a set budget of $500, prime-seq allows one to process 198 samples, which is 1.6 times more samples than the next most cost-efficient method. (B) The compared methods were grouped into low-, middle-, and high-cost methods, and the TruSeq MAQC-III data was used as the basis for power analysis for all methods except prime-seq. The increase in sample size due to cost efficiency directly impacts the power to detect differentially expressed genes, as is evident from the increased performance of prime-seq and other low-cost methods (BRB-seq and Decode-seq), even when sequencing costs are included in the comparison (sequencing depth of 10 million reads at a cost of $3.40 per 1 million reads).
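The cost figures in this caption can be worked through with a small sketch. The sequencing cost per sample follows directly from the caption (10 million reads at $3.40 per million). The $2.50 library cost used below is a hypothetical round number chosen for illustration; it is not a value reported here. The function names are likewise assumptions.

```python
import math


def sequencing_cost_per_sample(million_reads: float, cost_per_million: float) -> float:
    """Sequencing cost for one sample at a given depth."""
    return million_reads * cost_per_million


def samples_within_budget(budget: float, cost_per_sample: float) -> int:
    """Number of whole samples that fit into a fixed budget."""
    if cost_per_sample <= 0:
        raise ValueError("cost_per_sample must be positive")
    return math.floor(budget / cost_per_sample)


# From the caption: 10 million reads at $3.40 per million reads.
seq_cost = sequencing_cost_per_sample(10, 3.40)  # about $34 per sample

# Hypothetical library cost per sample (an assumption for illustration).
library_cost = 2.50

# Samples affordable with $500 once sequencing is included.
print(samples_within_budget(500, library_cost + seq_cost))
```

The sketch shows why sequencing dominates the per-sample cost once library preparation becomes cheap: at this depth, each sample carries roughly $34 of sequencing regardless of the library method.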
