Bioinformatics. 2023 Jan 1;39(1):btac768.
doi: 10.1093/bioinformatics/btac768.

SCExecute: custom cell barcode-stratified analyses of scRNA-seq data


Nathan Edwards et al. Bioinformatics.

Abstract

Motivation: In single-cell RNA-sequencing (scRNA-seq) data, stratification of sequencing reads by cellular barcode is necessary to study cell-specific features. However, apart from gene expression, the analyses of cell-specific features are not sufficiently supported by available tools designed for high-throughput sequencing data.

Results: We introduce SCExecute, which executes a user-provided command on barcode-stratified single-cell binary alignment map (scBAM) files that are extracted on the fly. SCExecute extracts the alignments carrying each cell barcode from aligned, pooled single-cell sequencing data. Simple commands, monolithic programs, multi-command shell scripts or complex shell-based pipelines are then executed on each scBAM file. scBAM files can be restricted to specific barcodes and/or genomic regions of interest. We demonstrate SCExecute with two popular variant callers, GATK and Strelka2, executed in shell scripts together with commands for BAM file manipulation and variant filtering, to detect single-cell-specific expressed single nucleotide variants from droplet scRNA-seq data (10X Genomics Chromium System). In conclusion, SCExecute facilitates custom cell-level analyses on barcoded scRNA-seq data using currently available tools and provides an effective solution for studying transcriptome features that occur at low cellular frequency.
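The stratify-then-execute idea described in the Results can be illustrated with a minimal sketch. It is not SCExecute's implementation and does not use its command-line interface: it assumes alignments are available as SAM-format text lines, that the cell barcode is stored in a CB:Z: tag (a common convention, but an assumption here), and that a region filter on the alignment start position is good enough for illustration. The names stratify, execute_per_cell, EXAMPLE_SAM and the "{file}" placeholder are hypothetical.

```python
import subprocess
import sys
import tempfile
from collections import defaultdict
from pathlib import Path

# Tiny made-up SAM excerpt: one header line and four alignments (the last has no barcode).
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1",
    "r2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1",
    "r3\t0\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:GGGT-1",
    "r4\t0\tchr1\t130\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]


def cell_barcode(sam_line, tag="CB"):
    """Return the value of the barcode tag (e.g. CB:Z:AAAC-1), or None if absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:  # optional tags start at column 12
        if field.startswith(tag + ":Z:"):
            return field[len(tag) + 3:]
    return None


def stratify(sam_lines, barcodes=None, region=None):
    """Group alignment lines by cell barcode.

    barcodes: optional set of barcodes to keep.
    region: optional (chrom, start, end), 1-based inclusive, matched on POS only.
    """
    per_cell = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no barcode
            continue
        fields = line.rstrip("\n").split("\t")
        bc = cell_barcode(line)
        if bc is None or (barcodes is not None and bc not in barcodes):
            continue
        if region is not None:
            chrom, start, end = region
            if fields[2] != chrom or not (start <= int(fields[3]) <= end):
                continue
        per_cell[bc].append(line.rstrip("\n"))
    return dict(per_cell)


def execute_per_cell(per_cell, command, workdir):
    """Write one file per barcode and run `command` on it.

    `command` is an argument list; "{file}" is replaced by the per-cell file path.
    Returns {barcode: captured standard output}.
    """
    results = {}
    for bc, lines in per_cell.items():
        path = Path(workdir) / f"{bc}.sam"
        path.write_text("\n".join(lines) + "\n")
        argv = [arg.replace("{file}", str(path)) for arg in command]
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        results[bc] = done.stdout.strip()
    return results


if __name__ == "__main__":
    # Stand-in for a user command: count the alignment lines in each per-cell file.
    count_lines = [sys.executable, "-c",
                   "import sys; print(sum(1 for _ in open(sys.argv[1])))", "{file}"]
    with tempfile.TemporaryDirectory() as tmp:
        print(execute_per_cell(stratify(EXAMPLE_SAM), count_lines, tmp))
```

In this sketch the user command is an arbitrary argument list, so a shell script wrapping a variant caller could be substituted for the line counter. Real BAM input would need a BAM reader such as Pysam, which the paper names as the basis of the actual tool.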

Availability and implementation: SCExecute is implemented in Python3 using the Pysam package and distributed for Linux, MacOS and Python environments from https://horvathlab.github.io/NGS/SCExecute.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
(a) SCExecute data processing. (b) UMAP projections showing neuroblastoma dataset cells classified by type (left), and the cell distribution and cellular expressed variant allele frequency (VAF_RNA) of the missense substitution rs4603 (1:151401549_T>C) in the gene PSMB4 (right). Cells in which the SNV locus is covered by fewer than 5 reads, so that the VAF_RNA cannot be assessed, are indicated as "NA". VAF_RNA = 0 indicates cells in which all the reads (5 or more) covering the SNV locus carried the reference nucleotide (see Supplementary Methods). In cells where at least 5 reads cover the SNV locus and 0 < VAF_RNA < 1, the color intensity shows the relative expression of the sceSNV. The rs4603 VAF_RNA cell distribution is consistent with a germline homozygous variant in sample SAMN12799266, a heterozygous variant in samples SAMN12799264 and SAMN12799263, and absence of the variant (homozygous reference) in sample SAMN12799269. (c) SCExecute runtimes compared with the corresponding samtools-based approach for Chr 2 and Chr 22 of samples SAMN16086828 and SAMN16086829. The samtools-based approach, which extracts each barcode's alignments one at a time, independently and in parallel, requires more runtime than the SCExecute approaches. (d) SCExecute execution times by batch size. The time to construct the first batch of cell-specific scBAMs (First pass) is approximately constant for all Batch Size values. (e) SCExecute memory use by batch size. The memory footprint increases with batch size to accommodate the reads of the scBAM files for the current batch.
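The per-cell rule described for panel (b) can be written out as a short sketch. It assumes that the cellular expressed variant allele frequency is the fraction of reads at the SNV locus that carry the variant nucleotide, counting only reference and variant reads; the caption does not give the formula, so this definition is an assumption. The 5-read threshold comes from the caption, and the function name vaf_rna is hypothetical.

```python
from typing import Optional

MIN_READS = 5  # minimum coverage at the SNV locus, as stated in the Fig. 1b caption


def vaf_rna(ref_reads: int, var_reads: int, min_reads: int = MIN_READS) -> Optional[float]:
    """Return the variant read fraction for one cell, or None ("NA") if coverage is too low.

    Assumes the frequency is var_reads / (ref_reads + var_reads).
    """
    total = ref_reads + var_reads
    if total < min_reads:
        return None  # locus covered by fewer than min_reads reads: reported as "NA"
    return var_reads / total


if __name__ == "__main__":
    # Hypothetical per-cell (reference, variant) read counts at one SNV locus.
    for barcode, (ref, var) in {"cell_a": (3, 1), "cell_b": (5, 0), "cell_c": (2, 6)}.items():
        print(barcode, vaf_rna(ref, var))
```

Under this definition a value of 0 means that every covering read carried the reference nucleotide, 1 means that every read carried the variant, and intermediate values indicate that both alleles are expressed in the cell.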
