[Preprint]. 2023 Apr 20:2023.04.03.535366.
doi: 10.1101/2023.04.03.535366.

Reconstruction Set Test (RESET): a computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error


H Robert Frost. bioRxiv. 2023.


Abstract

We have developed a new and analytically novel single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance, both at the sample level and for the entire dataset, based on the ability of set genes to reconstruct values for all measured genes. RESET addresses four important limitations of current techniques: 1) existing single sample methods are designed to detect mean differences and struggle to identify differential correlation patterns, 2) computationally efficient techniques are self-contained methods and cannot directly detect competitive scenarios where set genes differ from non-set genes in the same sample, 3) the scores generated by current methods can only be accurately compared across samples for a single set and not between sets, and 4) the computational cost of even the fastest existing methods can be significant on very large datasets. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior accuracy at a lower computational cost relative to other single sample approaches.
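The core idea in the abstract, measuring how well the set genes reconstruct all measured genes, can be illustrated with a small sketch. This is not the RESET R package or its API, and the abstract does not specify the algorithm's details. The code below assumes one common form of randomized reduced rank reconstruction: a Gaussian random projection of the set-gene columns, orthonormalized and used to project the full matrix. The function names (`matmul`, `orthonormal_basis`, `reconstruction_errors`) and the toy data are hypothetical, and the step that turns reconstruction errors into RESET scores is not modeled.

```python
import math
import random


def matmul(A, B):
    """Plain-Python product of two row-major nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def orthonormal_basis(Y, tol=1e-10):
    """Modified Gram-Schmidt over the columns of Y; basis vectors are returned as rows."""
    basis = []
    for col in zip(*Y):
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # skip numerically dependent directions
            basis.append([a / norm for a in v])
    return basis


def reconstruction_errors(X, set_idx, rank, seed=0):
    """Per-sample error when X (samples x genes) is rebuilt from the set genes.

    Assumed scheme (illustrative only): sketch the set-gene columns with a
    Gaussian random matrix, orthonormalize the sketch, project X onto it.
    """
    rng = random.Random(seed)
    X_set = [[row[j] for j in set_idx] for row in X]                        # n x |S|
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in set_idx]  # |S| x k
    Q = orthonormal_basis(matmul(X_set, omega))                             # up to k vectors of length n
    coeffs = matmul(Q, X)                                                   # coordinates of each gene in the basis
    errors = []
    for i, row in enumerate(X):
        recon = [sum(q[i] * c[g] for q, c in zip(Q, coeffs)) for g in range(len(row))]
        errors.append(math.sqrt(sum((x - r) ** 2 for x, r in zip(row, recon))))
    return errors


# Toy data: four genes driven by two latent factors across six samples.
f1 = [1.0, 0.0, 2.0, 1.0, 0.0, 3.0]
f2 = [0.0, 1.0, 1.0, 2.0, 3.0, 0.0]
X = [[a, b, a + b, a - b] for a, b in zip(f1, f2)]

informative = reconstruction_errors(X, set_idx=[0, 1], rank=2)  # set spans both factors
partial = reconstruction_errors(X, set_idx=[2], rank=1)         # set captures one direction
print(max(informative), max(partial))
```

In this toy setting, a set whose genes span both latent factors reconstructs every gene almost exactly, while a set that captures only one direction leaves a clear residual in each sample. The sketch costs scale with the chosen rank rather than with the number of set genes, which is the usual motivation for a randomized reduced rank approach.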


Conflict of interest statement

Conflict of Interest: None declared.

Figures

Figure 1:
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the block design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.
Figure 2:
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the pure self-contained design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.
Figure 3:
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the pure competitive design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.
Figure 4:
Overall classification performance of RESET.det and RESET.ran on scRNA-seq data simulated according to Section 2.7. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.
Figure 5:
Average execution time of RESET.det, VAM, GSVA, and ssGSEA relative to RESET.ran. Relative values are plotted on the log10 scale. Execution times were computed on data simulated according to the procedure outlined in Section 2.7 for the block design. Error bars represent the standard error of the mean.
Figure 6:
Heatmap visualization of the RESET cell-specific scores for the top five BioCarta pathways most enriched in each cluster of the PBMC scRNA-seq data according to the log2 fold-change in the mean RESET score of cells in the cluster relative to cells not in the cluster. Note that gene sets only appear once in the heatmap even if they are among the top five sets for multiple clusters.
Figure 7:
Visualization of cell type pathway enrichment as computed using either VAM or RESET scores.
Figure 8:
Projection of mouse brain scRNA-seq data onto the first two UMAP dimensions. Each point in the plot represents one cell; cells are colored and labeled according to the output of unsupervised clustering.
Figure 9:
Visualization of cluster-level Gene Ontology biological process term enrichment as computed using either VAM or RESET scores.


