Review
Curr Protoc. 2022 Aug;2(8):e498.
doi: 10.1002/cpz1.498.

Practical Considerations for Single-Cell Genomics

Claire Regan et al. Curr Protoc. 2022 Aug.

Abstract

The single-cell revolution in the field of genomics is in full bloom, with clever new molecular biology tricks appearing regularly that allow researchers to explore new modalities or scale up their projects to millions of cells and beyond. Techniques abound to measure RNA expression, DNA alterations, protein abundance, chromatin accessibility, and more, all with single-cell resolution and often in combination. Despite such a rapidly changing technology landscape, there are several fundamental principles that are applicable to the majority of experimental workflows to help users avoid pitfalls and exploit the advantages of the chosen platform. In this overview article, we describe a variety of popular single-cell genomics technologies and address some common questions pertaining to study design, sample preparation, quality control, and sequencing strategy. As the majority of relevant publications currently revolve around single-cell RNA-seq, we will prioritize this genomics modality in our discussion. © 2022 Wiley Periodicals LLC.

Keywords: genomics; scATAC-seq; scRNA-seq; sequencing; single-cell; transcriptomics.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Summary of common single-cell technology platforms.
(A) Plate-based methods involve sorting or manually depositing cells into wells of a standard microplate, which are then processed as individual libraries by hand or with automation. (B) Droplet microfluidic devices such as the 10X Genomics Chromium Controller partition cells into emulsion droplets along with gel beads containing barcoded primers. Enzymatic barcoding by reverse transcription or ligation occurs in the emulsion, and subsequent library steps can be performed as a single pool. (C) Micro- and nanowell approaches allow a dilute cell suspension to settle into picoliter-sized wells along with oligo-conjugated beads under conditions that favor one bead and cell per well. (D) Combinatorial split/pool methods generally start with fixed and permeabilized cells that are distributed across a starting plate. A well-specific DNA barcode is appended, and all cells are then pooled to allow uniform mixing before re-distributing to a new plate, where a second barcode is added serially.
Figure 2. Multiplexing strategies.
For large-scale studies involving multiple individuals or conditions, samples can be multiplexed using methods such as “Cell Hashing” or “Genetic Demultiplexing” to minimize costs and batch-related artifacts. (Top) In Cell Hashing, batches are labeled by coupling DNA-barcoded hashing reagents to the surface of cells. Examples of hashing reagents include antibodies targeting ubiquitous surface proteins that are chemically conjugated to a DNA oligonucleotide, or lipid- and/or cholesterol-modified oligos that can be embedded within the plasma membrane. Following batch labeling, samples can be pooled and co-captured in the same single cell reaction using, for example, the 10X Genomics platform. Batch barcodes (a.k.a. “hash tags”) are extracted and counted from the resulting sequencing library, and then used to demultiplex the sample downstream. (Bottom) Alternatively, genetically diverse samples such as human patients can be demultiplexed based on their unique single nucleotide variant (SNV) profile. Cells from different donors can be mixed, co-captured, and sequenced as a single sample. Donor-specific SNV profiles, if available, can be compared with the read-level data from each cell, and a probability score is assigned for each of the donors. These probability scores are used to assign cells to different donors, and also to identify and reject “doublet” barcodes that likely contain mRNA from two or more cells. If SNV profiles have not been previously generated, read-level variant calls can still be used to assign cells into different genotype bins, though these bins cannot be matched back to the actual identity of the donors.
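The donor-assignment step of genetic demultiplexing described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular tool: the function name `donor_posteriors`, the fixed per-read `error_rate`, the uniform prior over donors, and the toy genotypes are all hypothetical, and doublet detection is omitted.

```python
import math


def donor_posteriors(observations, donor_genotypes, error_rate=0.01):
    """Return a probability score per donor for one cell barcode.

    observations: list of (snv_id, allele) pairs, allele is "ref" or "alt",
        one pair per read overlapping a known SNV site.
    donor_genotypes: dict donor -> dict snv_id -> alt allele count (0, 1, or 2).
    error_rate: assumed probability that a read reports the wrong allele.
    """
    log_likelihoods = {}
    for donor, genotype in donor_genotypes.items():
        total = 0.0
        for snv_id, allele in observations:
            alt_fraction = genotype[snv_id] / 2.0
            # Chance of reading "alt" given the genotype and the error rate.
            p_alt = alt_fraction * (1.0 - error_rate) + (1.0 - alt_fraction) * error_rate
            total += math.log(p_alt if allele == "alt" else 1.0 - p_alt)
        log_likelihoods[donor] = total

    # Normalize under a uniform prior; subtract the maximum for numerical stability.
    max_ll = max(log_likelihoods.values())
    weights = {d: math.exp(ll - max_ll) for d, ll in log_likelihoods.items()}
    norm = sum(weights.values())
    return {d: w / norm for d, w in weights.items()}


# Toy SNV profiles for two hypothetical donors.
donors = {
    "donor_A": {"snv1": 0, "snv2": 2, "snv3": 1},
    "donor_B": {"snv1": 2, "snv2": 0, "snv3": 1},
}
# Reads from one cell barcode that overlap the SNV sites.
cell_reads = [("snv1", "ref"), ("snv2", "alt"), ("snv2", "alt"), ("snv3", "ref")]

posteriors = donor_posteriors(cell_reads, donors)
best_donor = max(posteriors, key=posteriors.get)
```

In this toy case the reads match donor_A at both informative sites, so nearly all of the probability falls on that donor. A barcode whose reads support two donors about equally would be a candidate doublet, which a fuller model would score explicitly.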
Figure 3. Preservation strategies.
Cells or tissue can often be preserved prior to a single cell experiment to allow for samples to be acquired at different times or locations and then processed synchronously. Either fully dissociated cell suspensions or partially intact, minced tissue can be frozen in liquid nitrogen and stored for weeks or months prior to thawing. If minced tissue pieces were frozen initially, single cell suspensions can be prepared after thawing by standard dissociation protocols used for fresh tissue. Alternatively, cells can be chemically preserved with a variety of fixatives that have been demonstrated to be compatible with many single-cell workflows, such as paraformaldehyde (PFA), glyoxal, or methanol (MeOH). Protocols to reconstitute the fixed single cells prior to sequencing vary depending on the fixative and the chemistry of the single-cell application.
Figure 4. Overview of a generalized single-cell workflow: resource investment and key checkpoints.
Most single-cell experimental workflows can be subdivided into three distinct phases: sample prep, barcoding and library prep, and sequencing. Each phase entails a significant investment of resources and presents a critical quality control checkpoint that provides opportunities to abort and retry if the samples appear suboptimal. (A) The sample prep stage encompasses all aspects of study design, sample acquisition, storage, and processing upstream of the “genomics” portion of the workflow, and can comprise the majority of the time investment involved in a project. On the day of the experiment, samples are dissociated into single-cell suspensions, and potentially passed through a flow sorter or magnetic column to obtain an enriched population of interest. The cell suspensions should be assessed (e.g., by microscopy) for relevant parameters including purity, viability, cleanliness of the suspension, and clumping. If the suspension looks unsatisfactory or if there are too few intact cells, this is the best time to abort before large amounts of resources are committed in the downstream steps. (B) Barcoding and library prep involves a series of enzymatic reactions that take place in emulsion droplets, PCR plates, nanowells, or another type of isolated compartment. Depending on the library chemistry, this step can consume roughly half of the costs associated with the experiment. Libraries are generally amplified by PCR with the addition of barcoded adapters, and should be assessed at a second QC checkpoint by electrophoresis (e.g., using a Bioanalyzer). (C) Sequencing of the single-cell libraries also consumes a significant percentage of the overall budget. Depending on the application, libraries are sequenced using either short reads (for gene expression, ATAC, CNV, immune profiling, or other applications) on an Illumina instrument, or long reads (e.g., for isoform-resolved RNA-seq, immune profiling) using an Oxford Nanopore or PacBio instrument.
For large projects, an optional “skim-sequencing” step can be added for quality control. A few million reads per library is often sufficient to tell whether barcoding proceeded properly, and can provide a crude estimate of captured cell numbers and predicted sample quality.
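As one illustration of the crude cell-number estimate mentioned above, the sketch below applies a simple threshold heuristic to per-barcode read counts from a skim-sequencing run. The function name `estimate_cell_count`, the 10% cutoff, and the toy counts are assumptions made for illustration and do not describe any platform's pipeline.

```python
def estimate_cell_count(reads_per_barcode, expected_cells, fraction=0.1):
    """Crudely estimate how many barcodes correspond to captured cells.

    Barcodes are ranked by read count. A robust maximum is taken near the
    99th percentile of the top `expected_cells` barcodes, and every barcode
    with at least `fraction` of that value is counted as a cell.
    """
    ordered = sorted(reads_per_barcode, reverse=True)
    if not ordered:
        return 0
    top = ordered[:expected_cells]
    # Index 1% of the way down the ranked list, to ignore extreme outliers.
    robust_max = top[int(0.01 * (len(top) - 1))]
    cutoff = fraction * robust_max
    return sum(1 for n in ordered if n >= cutoff)


# Toy skim data: 50 cell-containing barcodes plus 500 background barcodes.
toy_counts = [1000 - i for i in range(50)] + [5] * 500
estimated = estimate_cell_count(toy_counts, expected_cells=100)
```

The estimate depends on the cutoff chosen and on how cleanly cell barcodes separate from background, so it is best treated as a rough sanity check before committing to full-depth sequencing.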
Figure 5. Cell suspension quality control.
A well-dissociated single-cell suspension will be largely free of debris, and cells will have a smooth, round appearance (top left). Cells can be stained with acridine orange/propidium iodide (AOPI) to visualize live cells as green and dead cells as red (bottom left). A poorly dissociated suspension (middle) will contain many aggregated clumps of several cells, which is not ideal for single-cell methods. Preparations with excessive non-cellular debris (right) should be cleaned by gradient centrifugation, FACS, or some other method to avoid microfluidic clogs or cross-contamination due to material stuck to cell fragments or other non-cellular debris.
Figure 6. Library quality control.
Libraries should be screened using an electrophoresis instrument such as the Agilent Bioanalyzer prior to sequencing to verify that the DNA fragment size falls within the expected range of the protocol. (A) One common pitfall of scRNA-seq libraries is degraded cDNA from dead or lysed cells, visualized by a shift in the molecular weight towards 1,000 bp and below. (B) Abundant PCR adapter artifacts can also swamp out gene-body reads, resulting in poor quality libraries, as visualized by reduced estimated number of genes detected per cell. (C, right) Likewise, ATAC-seq libraries should be checked for the expected “nucleosome ladder” pattern. Under-tagmented libraries (C, left) can be size-selected to yield an acceptable profile for sequencing.
Figure 7. Sequencing saturation curve.
In any sequencing experiment, the yield of new, non-duplicated molecules follows an asymptotic saturation curve (left). With deeper sequencing, each additional read is less likely to reveal a molecule that has not already been observed, and the return on the investment drops accordingly (right). Saturation curves can be estimated from an initial round of low-yield sequencing, or by comparing with similar sample and library types, which can be used to guide the choice of final targeted depth.
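The shape of such a saturation curve can be sketched with a deliberately simple model. The code below assumes that reads are drawn uniformly with replacement from a fixed pool of equally abundant molecules, which real libraries do not satisfy (molecule abundances are skewed), so it illustrates the trend only. The function names and the library size are hypothetical.

```python
import math


def expected_unique(reads, library_size):
    """Expected number of distinct molecules seen after `reads` draws with
    replacement from `library_size` equally abundant molecules."""
    return library_size * (1.0 - math.exp(-reads / library_size))


def marginal_yield(reads, library_size):
    """Expected new molecules per additional read at a given depth
    (the derivative of expected_unique with respect to reads)."""
    return math.exp(-reads / library_size)


library_size = 50_000  # hypothetical number of unique molecules in the library
for reads in (10_000, 50_000, 100_000, 200_000):
    unique = expected_unique(reads, library_size)
    gain = marginal_yield(reads, library_size)
    print(f"{reads:>7} reads: {unique:8.0f} unique, {gain:.2f} new molecules per extra read")
```

Under this model the number of unique molecules approaches the library size asymptotically while the gain per extra read decays toward zero, which matches the left and right panels described in the caption.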
Figure 8. Depth requirements.
Ideal sequencing depth is dependent upon the project’s goals. For example, clustering and cell type identification tasks in scRNA-seq are relatively robust to shallower sequencing, modeled here by in silico downsampling of a mouse pancreatic tumor dataset (Elyada et al., 2019) to different median numbers of UMIs per cell. Dimensionality reduction by UMAP (A) and clustering confidence (B) demonstrate the impact of sequencing depth on resolving cell types. Distinct clusters (A, colored points) become resolvable with only a few hundred UMIs per cell, with rarer cell types emerging as separate clusters only at higher depths. (B) Unsupervised clustering was run on the downsampled datasets, and inter-cluster silhouette score was calculated as a measure of confidence in cluster assignments. Higher sequencing depth returns only a modest improvement in unsupervised clustering confidence as median UMI counts increase beyond ~2,000/cell. (C) Marker gene detection scales roughly linearly over the full range of subsampled depths. Cell type labels were fixed, and differential expression was performed to detect marker genes across cell types at each subsampled depth. Cell types with high mRNA content, such as epithelial and fibroblast cells, yield more marker genes at low sequencing depth compared with other types (left panel). This typically occurs because a greater proportion of the total UMIs in the dataset comes from mRNA-rich cells. Nonetheless, marker gene detection as a function of depth is roughly linear for all cell types, as visualized by normalizing the trend to the maximum number of recovered markers for each type (right panel). This illustrates how sequencing more deeply can help recover more marker genes in cell types with low mRNA content or which comprise only a small proportion of the total library.
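One way to approximate in silico downsampling to a target median UMI count is binomial thinning of the count matrix, sketched below. The caption does not state how the downsampling was performed, so this is an assumed approach rather than the authors' procedure, and the function names, gene names, and counts are hypothetical.

```python
import random
import statistics


def downsample_cell(counts, keep_fraction, rng):
    """Thin one cell's UMI counts: each UMI is kept independently
    with probability `keep_fraction`."""
    return {
        gene: sum(1 for _ in range(n) if rng.random() < keep_fraction)
        for gene, n in counts.items()
    }


def downsample_to_median(cells, target_median, seed=0):
    """Thin every cell by one shared fraction so that the median total
    UMI count per cell lands near `target_median`."""
    rng = random.Random(seed)
    current = statistics.median(sum(c.values()) for c in cells.values())
    keep_fraction = min(1.0, target_median / current)
    return {
        cell: downsample_cell(counts, keep_fraction, rng)
        for cell, counts in cells.items()
    }


# Toy data: per-cell totals of 2000, 1000, and 4000 UMIs (median 2000).
toy_cells = {
    "cell_1": {"gene_a": 1200, "gene_b": 800},
    "cell_2": {"gene_a": 300, "gene_c": 700},
    "cell_3": {"gene_b": 2500, "gene_c": 1500},
}
shallow = downsample_to_median(toy_cells, target_median=500)
```

Applying one shared fraction to all cells preserves the relative depth differences between cell types, so mRNA-rich cells still contribute a larger share of the remaining UMIs, as the caption notes for panel C.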
Figure 9. Read length requirements.
Single-cell modalities require different read lengths for optimal performance. For example, using the 10X Genomics scRNA-seq platform, longer reads return modestly higher mapping rates above a certain minimum threshold length of the gene body read (A, top). The example shown represents human peripheral blood mononuclear cells prepared using 10X Genomics Single Cell 3’ Gene Expression version 3 chemistry and sequenced on an Illumina NextSeq 500 with a 132 bp gene-body read (unpublished data). Reads were trimmed in silico and remapped to assess overall mappability as a function of gene-body read length. Similarly, commonly used read lengths for scATAC-seq are largely indistinguishable in mapping rate (A, bottom). Here, 10X Genomics Single Cell ATAC libraries were created from dissociated mouse pancreatic tumors and sequenced on an Illumina NextSeq 500 with symmetric paired-end reads, informatically trimmed to various lengths (unpublished data). (B) Required read lengths also differ by technology platform as a consequence of their barcode design. Single-cell protocols use a variety of library design strategies, resulting in different required read lengths to cover key library features. For instance, split/pool methods such as BAG-seq, SHARE-seq, and SPLiT-seq generally require 100 or more bases to cover all barcode regions, while droplet methods employ compact barcodes requiring fewer bases, leaving more sequencing reagents available to dedicate to the gene-body or other features in the library amplicon. Combined barcode and minimum mappable genomic read lengths help determine the most efficient kit size to select for sequencing.
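The kit-size reasoning in the final sentence of this caption can be sketched as simple arithmetic: add up the cycles needed for the barcode read, the index reads, and the gene-body read, then pick the smallest kit that covers the total. All numbers below are illustrative placeholders rather than vendor specifications, and the function names are hypothetical. Actual read structures and kit sizes should be taken from the platform and sequencer documentation.

```python
def cycles_needed(barcode_read, index_reads, gene_body_read):
    """Total sequencing cycles required to cover every read in the library design."""
    return barcode_read + sum(index_reads) + gene_body_read


def smallest_kit(required_cycles, kit_sizes):
    """Return the smallest kit that provides enough cycles, or None if none does."""
    fitting = [k for k in sorted(kit_sizes) if k >= required_cycles]
    return fitting[0] if fitting else None


# Placeholder kit sizes (in cycles); real options depend on the instrument.
kits = (75, 150, 300)

# Compact barcode design, as described for droplet methods (placeholder lengths).
droplet_total = cycles_needed(barcode_read=28, index_reads=(10, 10), gene_body_read=90)

# Longer barcode region, as described for split/pool methods (placeholder lengths).
split_pool_total = cycles_needed(barcode_read=100, index_reads=(8,), gene_body_read=75)

droplet_kit = smallest_kit(droplet_total, kits)
split_pool_kit = smallest_kit(split_pool_total, kits)
```

With these placeholder values the compact design fits the mid-sized kit, while the longer barcode region pushes the split/pool design into the largest one, which is the trade-off the caption describes.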

References

    1. Adam M, Potter AS, & Potter SS (2017). Psychrophilic proteases dramatically reduce single-cell RNA-seq artifacts: A molecular atlas of kidney development. Development (Cambridge, England), 144(19), 3625–3632. doi: 10.1242/dev.151142
    2. Aldridge S, & Teichmann SA (2020). Single cell transcriptomics comes of age. Nature Communications, 11(1), 4307. doi: 10.1038/s41467-020-18158-5
    3. Allaway KC, Gabitto MI, Wapinski O, Saldi G, Wang C-Y, Bandler RC, … Fishell G (2021). Genetic and epigenetic coordination of cortical interneuron development. Nature, 597(7878), 693–697. doi: 10.1038/s41586-021-03933-1
    4. Alles J, Karaiskos N, Praktiknjo SD, Grosswendt S, Wahle P, Ruffault P-L, … Rajewsky N (2017a). Cell fixation and preservation for droplet-based single-cell transcriptomics. BMC Biology, 15(1), 44. doi: 10.1186/s12915-017-0383-5
    5. Alles J, Karaiskos N, Praktiknjo SD, Grosswendt S, Wahle P, Ruffault P-L, … Rajewsky N (2017b). Cell fixation and preservation for droplet-based single-cell transcriptomics. BMC Biology, 15(1), 44. doi: 10.1186/s12915-017-0383-5