This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Nov 8:2024.11.04.621867.

doi: 10.1101/2024.11.04.621867.

Addressing the mean-variance relationship in spatially resolved transcriptomics data with spoon

Kinnary Shah¹, Boyi Guo¹, Stephanie C Hicks¹,²,³,⁴

Affiliations

¹ Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
² Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA.
³ Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
⁴ Malone Center for Engineering in Healthcare, Johns Hopkins University, MD, USA.

PMID: 39574747
PMCID: PMC11580860
DOI: 10.1101/2024.11.04.621867

Kinnary Shah et al. bioRxiv. 2024.

Update in

Addressing the mean-variance relationship in spatially resolved transcriptomics data with spoon.
Shah K, Guo B, Hicks SC. Biostatistics. 2024 Dec 31;26(1):kxaf012. doi: 10.1093/biostatistics/kxaf012. PMID: 40515599.

Abstract

An important task in the analysis of spatially resolved transcriptomics data is to identify spatially variable genes (SVGs), or genes that vary in a 2D space. Current approaches rank SVGs based on either p-values or an effect size, such as the proportion of spatial variance. However, previous work in the analysis of RNA-sequencing identified a technical bias, referred to as the "mean-variance relationship", where highly expressed genes are more likely to have a higher variance. Here, we demonstrate the mean-variance relationship in spatial transcriptomics data. Furthermore, we propose spoon, a statistical framework using Empirical Bayes techniques to remove this bias, leading to more accurate prioritization of SVGs. We demonstrate the performance of spoon in both simulated and real spatial transcriptomics data. A software implementation of our method is available at https://bioconductor.org/packages/spoon.

Keywords: Gaussian process regression; empirical Bayes; mean-variance bias; spatial transcriptomics; spatially variable gene.

Conflict of interest statement

Conflict of Interest: None declared.

Figures

**Figure 1. Calculating precision weights for individual observations.**
These data are from Invasive Ductal Carcinoma breast tissue analyzed with 10x Genomics Visium [39], hereafter referred to as “Ductal Breast”. (**A-C**) The square root of the residual standard deviations estimated using nearest neighbor Gaussian processes ($\sqrt{s_{g}}$, defined in Equation 3) are plotted against average logcount ($\tilde{r}$). (B) Same as A, except a spline curve (purple) is fitted to the data to estimate the gene-wise mean-variance relationship. (C) Using the fitted spline curve, each predicted count value ($\hat{\lambda}_{gi}$) is mapped to its corresponding square root standard deviation value using $\mathrm{spl}(\hat{\lambda}_{gi})^{-4}$.
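
The weighting step in panels B and C can be sketched roughly as follows. This is an illustration only, not the spoon implementation: a binned piecewise-linear trend stands in for the fitted spline, the toy numbers are invented, and the function names (`fit_trend`, `predict`, `precision_weight`) are hypothetical. It also assumes that $\hat{\lambda}_{gi}$ is on the same log scale as the x-axis of the trend.

```python
from bisect import bisect_right
from statistics import mean


def fit_trend(mean_logcount, sqrt_sd, n_bins=4):
    """Binned piecewise-linear trend of sqrt(s_g) against mean logcount.

    Assumption: a simple stand-in for the spline curve in panel B.
    Returns the knot x-values and y-values (one knot per bin).
    """
    pairs = sorted(zip(mean_logcount, sqrt_sd))
    size = max(1, len(pairs) // n_bins)
    chunks = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    xs = [mean(p[0] for p in chunk) for chunk in chunks]
    ys = [mean(p[1] for p in chunk) for chunk in chunks]
    return xs, ys


def predict(xs, ys, x):
    """Linear interpolation between knots, held flat outside their range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def precision_weight(xs, ys, lambda_hat):
    """Weight = trend(lambda_hat)^(-4): the trend predicts sqrt(sd),
    so its fourth power is a variance and the weight is its inverse."""
    return predict(xs, ys, lambda_hat) ** -4


# Toy gene-level values (invented): sqrt(sd) falls as mean logcount rises.
toy_mean = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
toy_sqrt_sd = [1.2, 1.1, 1.0, 0.95, 0.9, 0.85, 0.8, 0.8]
knots_x, knots_y = fit_trend(toy_mean, toy_sqrt_sd)
w_low = precision_weight(knots_x, knots_y, 1.0)
w_high = precision_weight(knots_x, knots_y, 3.0)
```

With a decreasing trend like this toy one, observations with larger predicted values receive larger weights, which is the sense in which the weights are "precision" weights.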

**Figure 2. Mean-variance relationship exists in spatially resolved transcriptomics.**
Using data from different human tissues, in order from top to bottom: DLPFC [15], Ductal Breast cancer [39], HPC [16], LC [42], and Ovarian cancer [43], we quantified the mean-variance relationship. Each point is a gene colored by the likelihood ratio statistic for a test that compares the fitted model against a classical linear model for the spatial component of variance using a NNGP [9]. The likelihood ratio statistics (LR Stat) are scaled by the maximum likelihood ratio statistic for each dataset in order to have more uniform visualization. The x-axis is mean logcounts and the y-axes represent different components of variance, in order from left to right: total variance $\sigma^{2} + \tau^{2}$, spatial variance $\sigma^{2}$, nonspatial variance $\tau^{2}$, and proportion of spatial variance $\sigma^{2} / (\sigma^{2} + \tau^{2})$.

**Figure 3. Mean-rank relationship exists in spatial transcriptomics data.**
Using three datasets, in order from top to bottom (DLPFC [15], Ovarian cancer [43], and Lobular Breast cancer [40]), we quantified the mean-rank relationship. The genes were binned into deciles based on mean logcounts. Decile 1 contains the lowest mean expression values. The x-axis represents the rank. Within each decile, the density of the top 10% ranks is plotted as the signal in blue, while the density of the remaining ranks is plotted as the background in orange. Each subfigure shows the mean-rank relationship that persists after applying each method, from left to right: Moran’s I [47], nnSVG [9], SPARK-X [11], SpaGFT [45], and SpatialDE2 [46].
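
The binning behind these ridge plots can be sketched as below. This is a hedged reading of the caption rather than the authors' code: it assumes that rank 1 is the most spatially variable gene and that the "top 10%" split is taken within each decile. The helper names and toy data are hypothetical.

```python
def decile_of(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    deciles = [0] * len(values)
    for pos, i in enumerate(order):
        deciles[i] = pos * 10 // len(values) + 1
    return deciles


def split_signal_background(mean_logcounts, ranks, top_frac=0.10):
    """For each mean-expression decile, split its ranks into
    signal (best top_frac of ranks) and background (the rest)."""
    deciles = decile_of(mean_logcounts)
    out = {}
    for d in range(1, 11):
        members = sorted(r for r, dd in zip(ranks, deciles) if dd == d)
        k = max(1, round(top_frac * len(members)))
        out[d] = (members[:k], members[k:])
    return out


# Toy data (invented): 100 genes where higher mean expression gets a better rank,
# i.e. an extreme mean-rank relationship.
toy_means = [0.1 * i for i in range(100)]
toy_ranks = [100 - i for i in range(100)]
split = split_signal_background(toy_means, toy_ranks)
```

In this toy example the signal ranks in decile 10 are far better than those in decile 1, which is the pattern the figure describes as a mean-rank relationship; the densities of the two lists per decile are what a ridge plot would draw.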

**Figure 4. *Spoon* removes the mean-variance relationship when detecting spatially variable genes.**
This dataset consists of 1,000 simulated genes across 968 spots using a lengthscale of 100. Separately for unweighted and weighted methods, the genes were binned into deciles based on mean logcounts. Decile 1 contains the lowest mean expression values. Ridge plots for the (A) unweighted ranks and (B) weighted ranks are shown. Within each decile (y-axis), the density of the top 10% of ranks is plotted as the signal, while the density of the remaining ranks is plotted as the background. (C) False discovery rate (FDR) as a function of Type I error ($\alpha$). As a function of FDR, we show the (D) true negative rate (TNR) and (E) true positive rate (TPR). Red represents weighted nnSVG and blue represents unweighted nnSVG. These plots represent the average performance across five iterations of the same simulation, each with a unique random seed.
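
The quantities in panels C to E follow from a standard confusion table. A minimal sketch is given below, assuming that genes are called SVGs when their p-value is at most $\alpha$ (whether and how p-values are adjusted is not stated in the caption). The function name and the toy p-values are hypothetical.

```python
def confusion_rates(is_true_svg, p_values, alpha):
    """FDR, TPR and TNR when genes with p <= alpha are called SVGs.

    Assumption: plain thresholding of p-values at alpha.
    """
    called = [p <= alpha for p in p_values]
    tp = sum(t and c for t, c in zip(is_true_svg, called))
    fp = sum((not t) and c for t, c in zip(is_true_svg, called))
    fn = sum(t and (not c) for t, c in zip(is_true_svg, called))
    tn = sum((not t) and (not c) for t, c in zip(is_true_svg, called))
    return {
        "FDR": fp / (tp + fp) if (tp + fp) else 0.0,  # false calls among all calls
        "TPR": tp / (tp + fn) if (tp + fn) else 0.0,  # true SVGs recovered
        "TNR": tn / (tn + fp) if (tn + fp) else 0.0,  # non-SVGs correctly left out
    }


# Toy simulation truth and p-values (invented).
toy_truth = [True, True, True, False, False, False, False, False]
toy_pvals = [0.001, 0.01, 0.2, 0.03, 0.5, 0.6, 0.7, 0.9]
rates = confusion_rates(toy_truth, toy_pvals, alpha=0.05)
```

Sweeping `alpha` over a grid and recording the three rates at each value would give curves of the kind shown in the figure.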

**Figure 5. *Spoon* helps to detect SVGs associated with cancer that are lowly expressed.**
We used four datasets to evaluate the detection of cancer-related genes: Subtype Breast cancer [41], Ovarian cancer [43], Lobular Breast cancer [40], and Ductal Breast cancer [39]. Each bar contains the intersection of the set of genes of interest with genes within the set associated with cancer. For the first four rows, we defined low mean genes as those with means less than the 25th percentile in the dataset. Within the set of low mean genes, we found genes that were in the lowest 10% of ranks before weighting and then increased to the highest 10% of ranks after weighting. This is the set of genes of interest. The intersection in orange is the number of low mean and higher ranked genes that were found to be associated with the cancer of the dataset. For the last four rows, we defined small lengthscale genes as those with lengthscales between 40 and 90. Within the set of small lengthscale genes, we found genes that were ranked higher after weighting. This is the set of genes of interest. The intersection in orange shows the number of small lengthscale genes that were ranked higher and found to be associated with the cancer type of the dataset.
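
The low-mean selection used for the first four rows can be sketched as set operations. This is an illustration under assumptions: rank 1 is taken to be the top rank, the 25th percentile is computed with Python's default `statistics.quantiles` method, and the gene names, ranks, and cancer gene list are invented.

```python
from statistics import quantiles


def genes_of_interest(means, rank_before, rank_after, frac=0.10):
    """Low-mean genes that moved from the lowest 10% of ranks (unweighted)
    to the highest 10% of ranks (weighted). Rank 1 is the top rank."""
    n = len(means)
    q25 = quantiles(list(means.values()), n=4)[0]  # 25th percentile of gene means
    low_mean = {g for g, m in means.items() if m < q25}
    was_bottom = {g for g, r in rank_before.items() if r > n * (1 - frac)}
    now_top = {g for g, r in rank_after.items() if r <= n * frac}
    return low_mean & was_bottom & now_top


# Toy data (invented): 20 genes, g0 has the lowest mean and g19 the highest.
toy_means = {f"g{i}": float(i) for i in range(20)}
toy_rank_before = {f"g{i}": 20 - i for i in range(20)}  # g0 is ranked last
toy_rank_after = dict(toy_rank_before)
toy_rank_after["g0"], toy_rank_after["g19"] = 1, 20     # weighting promotes g0

hypothetical_cancer_genes = {"g0", "g7"}
interest = genes_of_interest(toy_means, toy_rank_before, toy_rank_after)
overlap = interest & hypothetical_cancer_genes
```

The size of `overlap` corresponds to the orange portion of a bar; the small-lengthscale rows would use the same intersection logic with a lengthscale filter in place of the mean filter.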

References

1. Marx V. Method of the Year: spatially resolved transcriptomics. Nature Methods, 18(1):9–14, Jan. 2021. ISSN 1548–7091, 1548–7105. doi: 10.1038/s41592-020-01033-y. URL https://www.nature.com/articles/s41592-020-01033-y.
2. Deshpande A., Loth M., Sidiropoulos D. N., Zhang S., Yuan L., Bell A. T., Zhu Q., Ho W. J., Santa-Maria C., Gilkes D. M., Williams S. R., Uytingco C. R., Chew J., Hartnett A., Bent Z. W., Favorov A. V., Popel A. S., Yarchoan M., Kiemen A., Wu P.-H., Fujikura K., Wirtz D., Wood L. D., Zheng L., Jaffee E. M., Anders R. A., Danilova L., Stein-O’Brien G., Kagohara L. T., and Fertig E. J. Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces. Cell Systems, 14(4):285–301.e4, Apr. 2023. ISSN 24054712. doi: 10.1016/j.cels.2023.03.004. URL https://linkinghub.elsevier.com/retrieve/pii/S2405471223000807.
3. Rao A., Barkley D., França G. S., and Yanai I. Exploring tissue architecture using spatial transcriptomics. Nature, 596(7871):211–220, Aug. 2021. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-021-03634-9. URL https://www.nature.com/articles/s41586-021-03634-9.
4. Garcia-Alonso L., Lorenzi V., Mazzeo C. I., Alves-Lopes J. P., Roberts K., Sancho-Serra C., Engelbert J., Marečková M., Gruhn W. H., Botting R. A., Li T., Crespo B., Van Dongen S., Kiselev V. Y., Prigmore E., Herbert M., Moffett A., Chédotal A., Bayraktar O. A., Surani A., Haniffa M., and Vento-Tormo R. Single-cell roadmap of human gonadal development. Nature, 607(7919):540–547, July 2022. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-022-04918-4. URL https://www.nature.com/articles/s41586-022-04918-4.
5. Chen K. S., Noureldein M. H., Rigan D. M., Hayes J. M., Savelieff M. G., and Feldman E. L. Regional interneuron transcriptional changes reveal pathologic markers of disease progression in a mouse model of Alzheimer’s disease, Nov. 2023. URL 10.1101/2023.11.01.565165v1.
