Nat Methods. 2022 Mar;19(3):316-322.
doi: 10.1038/s41592-022-01408-3. Epub 2022 Mar 11.

Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data

Dongze He et al. Nat Methods. 2022 Mar.

Abstract

The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (scRNA-seq and snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate normal gene expression count matrices.

Conflict of interest statement

RP is a co-founder of Ocean Genomics, Inc. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of the alevin-fry pipeline (operating in unspliced, spliced, ambiguous (USA) quantification mode). The arrows highlight the flow of data through the pipeline, whose output is a matrix specifying the expected counts of each of the considered splicing states of each gene within each quantified cell.
Fig. 2. Comprehensive analysis of the performance of alevin-fry on real and simulated datasets.
(a) The frequency distribution of the presence of genes across all shared cells for STARsolo, kallisto|bustools, and alevin-fry (including multiple index types for alevin-fry) on the simulated data. Differently colored lines represent the quantification methods. Within the variants of alevin-fry, txome stands for transcriptome reference (i.e., indexing only the annotated, spliced transcriptome), while sketch (pseudoalignment with structural constraints) and sla (selective-alignment) label the mapping method used to obtain the result. Because the distributions are so similar, the line for STARsolo is occluded by the line for alevin-fry (splici, sla). (b) A visualization of the velocity estimation derived from alevin-fry counts in a UMAP-based embedding after assigning all ambiguous counts as spliced; the streamlines represent the direction of RNA velocity estimated by scVelo. Points (cells) are colored according to the cell type annotation. (c) The t-SNE embedding plot of an alevin-fry processed mouse placenta snRNA-seq dataset. The color of each nucleus represents the inferred cell-type annotation, which was learned from a reference dataset. (d) and (e) show the run time and peak memory usage, respectively, for all tools (run with 16 threads) on the different datasets evaluated in this manuscript. The x-axis of (d) and (e) represents the evaluated datasets. The y-axis of (d) represents the run time of each tool, measured in seconds. The y-axis of (e) denotes the peak memory usage, measured as the maximum resident set size (max rss), during the execution of each tool. Dashed horizontal lines in (d) denote 15 minutes, 30 minutes, 60 minutes and 90 minutes, respectively. Dashed horizontal lines in (e) denote 4GB, 8GB, 16GB and 32GB, respectively.
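
The post-processing mentioned in panel (b), where ambiguous counts are assigned to the spliced category before RNA velocity analysis, might be sketched roughly as follows. This is only an illustration under stated assumptions: the nested-dictionary layout `usa_counts`, the cell and gene names, the count values, and the helper `split_for_velocity` are all hypothetical and are not part of alevin-fry or scVelo.

```python
# Hypothetical toy data: for each cell and gene, the expected counts of the
# three splicing states (U = unspliced, S = spliced, A = ambiguous).
# All names and values are invented for illustration.
usa_counts = {
    "cell_1": {
        "GeneA": {"U": 2.0, "S": 5.0, "A": 1.0},
        "GeneB": {"U": 0.0, "S": 3.0, "A": 0.5},
    },
    "cell_2": {
        "GeneA": {"U": 4.0, "S": 1.0, "A": 0.0},
        "GeneB": {"U": 1.0, "S": 0.0, "A": 2.0},
    },
}


def split_for_velocity(usa):
    """Return (spliced, unspliced) count tables.

    Ambiguous counts are folded into the spliced table, mirroring the
    choice described in the caption of Fig. 2(b).
    """
    spliced, unspliced = {}, {}
    for cell, genes in usa.items():
        spliced[cell] = {g: c["S"] + c["A"] for g, c in genes.items()}
        unspliced[cell] = {g: c["U"] for g, c in genes.items()}
    return spliced, unspliced


spliced, unspliced = split_for_velocity(usa_counts)
```

The two resulting tables correspond to the spliced and unspliced inputs that an RNA velocity tool expects. Other assignments of the ambiguous counts (for example, discarding them) would be equally easy to express by changing the single line that builds `spliced`.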
