This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Apr 20:2023.04.03.535366.

doi: 10.1101/2023.04.03.535366.

Reconstruction Set Test (RESET): a computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error

H Robert Frost¹

PMID: 37066315
PMCID: PMC10104009
DOI: 10.1101/2023.04.03.535366

H Robert Frost. bioRxiv. 2023.

Affiliation

¹ Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755.


Update in

Reconstruction Set Test (RESET): A computationally efficient method for single sample gene set testing based on randomized reduced rank reconstruction error.
Frost HR. PLoS Comput Biol. 2024 Apr 29;20(4):e1012084. doi: 10.1371/journal.pcbi.1012084. eCollection 2024 Apr. PMID: 38683883.

Abstract

We have developed a new, and analytically novel, single sample gene set testing method called Reconstruction Set Test (RESET). RESET quantifies gene set importance at both the sample level and the dataset level based on the ability of set genes to reconstruct values for all measured genes. RESET addresses four important limitations of current techniques: 1) existing single sample methods are designed to detect mean differences and struggle to identify differential correlation patterns, 2) computationally efficient techniques are self-contained methods and cannot directly detect competitive scenarios where set genes differ from non-set genes in the same sample, 3) the scores generated by current methods can only be accurately compared across samples for a single set and not between sets, and 4) the computational cost of even the fastest existing methods can be significant on very large datasets. RESET is realized using a computationally efficient randomized reduced rank reconstruction algorithm (available via the RESET R package on CRAN) that can effectively detect patterns of differential abundance and differential correlation for self-contained and competitive scenarios. As demonstrated using real and simulated scRNA-seq data, RESET provides superior accuracy at a lower computational cost relative to other single sample approaches.
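
The core idea in the abstract, scoring a gene set by how well its genes reconstruct all measured genes, can be illustrated with a minimal NumPy sketch. This is not the RESET R package and is not claimed to match the paper's scoring formulas: the function names `reconstruction_scores` and `simulate_cells`, the Gaussian random sketch used to obtain a rank-k basis, and both score definitions are assumptions made for illustration only.

```python
import numpy as np


def reconstruction_scores(X, set_idx, k=2, seed=0):
    """Illustrative reduced rank reconstruction scores (hypothetical helper).

    X       : cells x genes matrix
    set_idx : column indices of the gene set
    k       : target rank of the basis built from the set genes
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # center each gene
    Xs = Xc[:, set_idx]                     # cells x set genes
    k = min(k, Xs.shape[1])

    # Randomized range finder: sketch the set genes with a Gaussian matrix
    # and orthonormalize to get a rank-k basis in cell space.
    omega = rng.standard_normal((Xs.shape[1], k))
    Q, _ = np.linalg.qr(Xs @ omega)         # cells x k, orthonormal columns

    # Reconstruct all genes from that basis and measure what is left over.
    resid = Xc - Q @ (Q.T @ Xc)

    # Assumed per-cell score: drop in row norm after reconstruction.
    sample_scores = np.linalg.norm(Xc, axis=1) - np.linalg.norm(resid, axis=1)
    # Assumed dataset-level score: fraction of total variation reconstructed.
    overall = 1.0 - np.sum(resid ** 2) / np.sum(Xc ** 2)
    return sample_scores, overall


def simulate_cells(n_cells=200, n_genes=50, n_driven=20, noise_sd=0.1, seed=1):
    """Toy data: one latent factor drives the first n_driven genes."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_cells)
    X = noise_sd * rng.standard_normal((n_cells, n_genes))
    X[:, :n_driven] += z[:, None]
    return X


X = simulate_cells()
_, informative = reconstruction_scores(X, np.arange(0, 5))     # factor-driven genes
_, uninformative = reconstruction_scores(X, np.arange(45, 50))  # pure-noise genes
print(f"informative set: {informative:.3f}, noise set: {uninformative:.3f}")
```

In this toy setting, a set drawn from the factor-driven genes should reconstruct far more of the full matrix than a set of pure-noise genes. The random sketch keeps the cost low because only a cells-by-k matrix is orthonormalized, rather than a full decomposition of the data being computed.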

Conflict of interest statement

Conflict of Interest: None declared.

Figures

**Figure 1:**
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the block design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.

**Figure 2:**
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the pure self-contained design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.

**Figure 3:**
Classification performance of RESET.det, RESET.ran, VAM, GSVA, and ssGSEA on scRNA-seq data simulated according to Section 2.7 for the pure competitive design. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.

**Figure 4:**
Overall classification performance of RESET.det and RESET.ran on scRNA-seq data simulated according to Section 2.7. Each panel illustrates the relationship between the area under the receiver operating characteristic curve (AUC) and one of the simulation parameters. The vertical dotted lines mark the default parameter value used in the other panels. Error bars represent the standard error of the mean.

**Figure 5:**
Average execution time of RESET.det, VAM, GSVA, and ssGSEA relative to RESET.ran. Relative values are plotted on the log₁₀ scale. Execution times were computed on data simulated according to the procedure outlined in Section 2.7 for the block design. Error bars represent the standard error of the mean.

**Figure 6:**
Heatmap visualization of the RESET cell-specific scores for the top five BioCarta pathways most enriched in each cluster of the PBMC scRNA-seq data according to the log2 fold-change in the mean RESET score of cells in the cluster relative to cells not in the cluster. Note that gene sets only appear once in the heatmap even if they are among the top five sets for multiple clusters.

**Figure 7:**
Visualization of cell type pathway enrichment as computed using either VAM or RESET scores.

**Figure 8:**
Projection of mouse brain scRNA-seq data onto the first two UMAP dimensions. Each point in the plot represents one cell; cells are colored and labeled according to the output from unsupervised clustering.

**Figure 9:**
Visualization of cluster-level Gene Ontology biological process term enrichment as computed using either VAM or RESET scores.

