2019 Jul 1;35(13):2335-2337.
doi: 10.1093/bioinformatics/bty950.

pyNVR: investigating factors affecting feature selection from scRNA-seq data for lineage reconstruction

Bob Chen et al. Bioinformatics.

Abstract

Motivation: The emergence of single-cell RNA-sequencing has enabled analyses that leverage transitioning cell states to reconstruct pseudotemporal trajectories. Multidimensional data sparsity, zero inflation and technical variation necessitate the selection of high-quality features that feed downstream analyses. Despite the development of numerous algorithms for the unsupervised selection of biologically relevant features, their differential performance remains largely unaddressed.

Results: We implemented the neighborhood variance ratio (NVR) feature selection approach as a Python package with substantial improvements in performance. In comparing NVR with multiple unsupervised algorithms such as dpFeature, we observed striking differences in features selected. We present evidence that quantifiable dataset properties have observable and predictable effects on the performance of these algorithms.
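One simple way to quantify how much the features selected by two algorithms differ is the Jaccard index of the two gene sets. The sketch below is only an illustration under that assumption: the abstract does not state which similarity measure the authors used, and the function name and gene lists are hypothetical.

```python
def jaccard_similarity(genes_a, genes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of gene names."""
    set_a, set_b = set(genes_a), set(genes_b)
    union = set_a | set_b
    if not union:
        # Two empty selections are treated as identical (a convention chosen here).
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical gene lists standing in for the output of two feature selectors.
nvr_genes = ["Myc", "Lgr5", "Dclk1", "Muc2"]
other_genes = ["Myc", "Lgr5", "Chga", "Lyz1", "Muc2"]

similarity = jaccard_similarity(nvr_genes, other_genes)
print(f"Jaccard similarity: {similarity:.2f}")
```

A value near 1 means the two methods chose nearly the same genes; a value near 0 corresponds to the kind of striking difference described above.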

Availability and implementation: pyNVR is freely available at https://github.com/KenLauLab/NVR.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Evaluation of pyNVR performance. (A) Fold difference in runtime between the Python and R implementations of NVR. (B) Gene set similarity given different datasets. (C) Gene set similarity and its relationship with cell number. (D) Gene set similarity and its relationship with closeness threshold-imposed sampling. (E) Representative p-Creode graphs generated using genes selected from closeness-thresholded samples. Heatmap overlay and gating depict Myc and putative stem-like cell states, respectively.
