[Preprint]. 2023 Sep 15:2023.09.14.543267.
doi: 10.1101/2023.09.14.543267.

Universal preprocessing of single-cell genomics data

A. Sina Booeshaghi et al. bioRxiv.

Abstract

We describe a workflow for preprocessing a wide variety of single-cell genomics data types. The approach is based on parsing of machine-readable seqspec assay specifications to customize inputs for kb-python, which uses kallisto and bustools to catalog reads, error correct barcodes, and count reads. The universal preprocessing method is implemented in the Python package cellatlas that is available for download at: https://github.com/cellatlas/cellatlas/.
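The barcode error correction and counting steps named in the abstract can be illustrated with a minimal sketch. It is not the kallisto, bustools, or cellatlas implementation: it assumes that barcodes are corrected to an onlist entry at Hamming distance 1 when that entry is unique, and that counting means tallying distinct UMIs per (barcode, feature) pair. The function names `hamming1_neighbors`, `correct_barcode`, and `count_umis`, and the record layout, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hamming1_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def correct_barcode(barcode, onlist):
    """Return the onlist barcode for `barcode`, or None if it cannot be rescued.

    Assumption: a barcode is rescued only when exactly one onlist entry
    lies at Hamming distance 1; ambiguous or distant barcodes are dropped.
    """
    if barcode in onlist:
        return barcode
    candidates = {n for n in hamming1_neighbors(barcode) if n in onlist}
    return candidates.pop() if len(candidates) == 1 else None


def count_umis(records, onlist):
    """Count distinct UMIs per (corrected barcode, feature).

    `records` is an iterable of (barcode, umi, feature) tuples, a
    hypothetical stand-in for reads that have already been assigned
    to a feature.
    """
    seen = defaultdict(set)
    for barcode, umi, feature in records:
        corrected = correct_barcode(barcode, onlist)
        if corrected is None:
            continue  # barcode could not be matched to the onlist
        seen[(corrected, feature)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}
```

In this sketch, a read whose barcode is one mismatch away from a single onlist entry contributes to that entry's counts, and duplicate UMIs for the same barcode and feature are collapsed to one count.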

Figures

Figure 1: Single-cell genomics assays are variations on a theme of physical isolation, molecular capture, and library generation. The read structure shown here is provided as an example; read structure may vary depending on assay type.

Figure 2: Efficiency of UMI detection as measured by relating the number of reads uniquely aligned for each cell to the number of UMIs for the (a) RNA, (b) protein, and (c) tag modalities.

Figure 3: Cross-technology comparison of registered ATAC and RNA quantifications from PBMCs assayed with (a) DOGMA-seq and (b) 10x Multiome.
