[Preprint]. 2024 Nov 8:2024.11.04.621867.
doi: 10.1101/2024.11.04.621867.

Addressing the mean-variance relationship in spatially resolved transcriptomics data with spoon

Kinnary Shah et al. bioRxiv.

Abstract

An important task in the analysis of spatially resolved transcriptomics data is to identify spatially variable genes (SVGs), or genes that vary in a 2D space. Current approaches rank SVGs based on either p-values or an effect size, such as the proportion of spatial variance. However, previous work in the analysis of RNA-sequencing identified a technical bias, referred to as the "mean-variance relationship", where highly expressed genes are more likely to have a higher variance. Here, we demonstrate the mean-variance relationship in spatial transcriptomics data. Furthermore, we propose spoon, a statistical framework using Empirical Bayes techniques to remove this bias, leading to more accurate prioritization of SVGs. We demonstrate the performance of spoon in both simulated and real spatial transcriptomics data. A software implementation of our method is available at https://bioconductor.org/packages/spoon.

Keywords: Gaussian process regression; empirical Bayes; mean-variance bias; spatial transcriptomics; spatially variable gene.

Conflict of interest statement

Conflict of Interest: None declared.

Figures

Figure 1: Calculating precision weights for individual observations.
These data are from Invasive Ductal Carcinoma breast tissue analyzed with 10x Genomics Visium [39], hereafter referred to as “Ductal Breast”. (A-C) The square root of the residual standard deviations estimated using nearest neighbor Gaussian processes ($s_g$, defined in Equation 3) are plotted against average logcount ($\tilde{r}$). (B) Same as A, except a spline curve (purple) is fitted to the data to estimate the gene-wise mean-variance relationship. (C) Using the fitted spline curve, each predicted count value ($\hat{\lambda}_{gi}$) is mapped to its corresponding square root standard deviation value using $\mathrm{spl}(\hat{\lambda}_{gi})^{-4}$.
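The mapping in panel C can be sketched in a few lines of Python. This is a minimal illustration, not the spoon implementation: a piecewise-linear interpolator (the hypothetical `make_trend`) stands in for the fitted spline, the input values are made up, and the exponent is read as -4, which turns a trend on the square-root standard deviation scale into an inverse variance.

```python
from bisect import bisect_right

def make_trend(xs, ys):
    """Piecewise-linear stand-in for the fitted spline (an assumption).

    xs must be strictly increasing; values outside [xs[0], xs[-1]] are clamped.
    """
    def trend(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, x)  # xs[j-1] <= x < xs[j]
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])
    return trend

def precision_weights(trend, predicted):
    """Map each predicted value to trend(value) ** -4 (inverse variance)."""
    return [trend(v) ** -4 for v in predicted]

# Made-up gene-level points: average logcount vs. sqrt of residual sd.
mean_logcount = [0.5, 1.0, 2.0, 3.0]
sqrt_sd = [1.2, 1.0, 0.8, 0.7]

trend = make_trend(mean_logcount, sqrt_sd)
weights = precision_weights(trend, [1.0, 1.5, 3.0])
print(weights)
```

With a decreasing trend, observations with larger predicted values receive larger weights, because their estimated variance is smaller.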
Figure 2: Mean-variance relationship exists in spatially resolved transcriptomics.
Using data from different human tissues, in order from top to bottom: DLPFC [15], Ductal Breast cancer [39], HPC [16], LC [42], and Ovarian cancer [43], we quantified the mean-variance relationship. Each point is a gene colored by the likelihood ratio statistic for a test that compares the fitted model against a classical linear model for the spatial component of variance using a NNGP [9]. The likelihood ratio statistics (LR Stat) are scaled by the maximum likelihood ratio statistic for each dataset in order to have more uniform visualization. The x-axis is mean logcounts and the y-axes represent different components of variance, in order from left to right: total variance $\sigma^2 + \tau^2$, spatial variance $\sigma^2$, nonspatial variance $\tau^2$, and proportion of spatial variance $\sigma^2 / (\sigma^2 + \tau^2)$.
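As a small worked example of the quantities on the y-axes and the color scale, the sketch below computes the proportion of spatial variance and rescales likelihood ratio statistics by their maximum. The function names and the input numbers are illustrative assumptions, not part of the paper's software.

```python
def proportion_spatial_variance(sigma_sq, tau_sq):
    """Return sigma^2 / (sigma^2 + tau^2) for one gene."""
    total = sigma_sq + tau_sq
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sigma_sq / total

def scale_by_max(lr_stats):
    """Divide each likelihood ratio statistic by the dataset maximum."""
    largest = max(lr_stats)
    return [value / largest for value in lr_stats]

# Made-up variance components for one gene: spatial 3.0, nonspatial 1.0.
prop = proportion_spatial_variance(3.0, 1.0)

# Made-up LR statistics for three genes in one dataset.
scaled = scale_by_max([2.0, 4.0, 1.0])
print(prop, scaled)
```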
Figure 3: Mean-rank relationship exists in spatial transcriptomics data.
Using three datasets, in order from top to bottom (DLPFC [15], Ovarian cancer [39], and Lobular Breast cancer [40]), we quantified the mean-rank relationship. The genes were binned into deciles based on mean logcounts. Decile 1 contains the lowest mean expression values. The x-axis represents the rank. Within each decile, the density of the top 10% ranks is plotted as the signal in blue, while the density of the remaining ranks is plotted as the background in orange. Each subfigure shows the mean-rank relationship that persists after applying each method, from left to right: Moran’s I [47], nnSVG [9], SPARK-X [11], SpaGFT [45], and SpatialDE2 [46].
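The binning behind these ridge plots can be sketched as follows. This is an illustration under stated assumptions: rank 1 is taken to be the most spatially variable gene, the "top 10%" is computed within each decile, and the helper names and toy data are hypothetical.

```python
def assign_deciles(means):
    """Label each gene 1..10 by ascending mean (decile 1 = lowest means)."""
    n = len(means)
    order = sorted(range(n), key=lambda i: means[i])
    deciles = [0] * n
    for position, gene_index in enumerate(order):
        deciles[gene_index] = position * 10 // n + 1
    return deciles

def split_signal_background(ranks, deciles, top_frac=0.10):
    """Per decile, split ranks into (signal, background).

    Signal is the best top_frac of ranks in that decile (at least one gene);
    background is everything else in the decile.
    """
    by_decile = {}
    for rank, decile in zip(ranks, deciles):
        by_decile.setdefault(decile, []).append(rank)
    result = {}
    for decile, decile_ranks in by_decile.items():
        ordered = sorted(decile_ranks)  # smallest rank number = best rank
        k = max(1, round(len(ordered) * top_frac))
        result[decile] = (ordered[:k], ordered[k:])
    return result

# Toy data: 20 genes where higher mean expression always gives a better rank.
means = [float(i) for i in range(20)]
ranks = [20 - i for i in range(20)]

deciles = assign_deciles(means)
split = split_signal_background(ranks, deciles)
print(split[1], split[10])
```

In this toy input the best ranks sit entirely in the highest deciles, which is the kind of mean-rank dependence the figure is meant to expose.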
Figure 4: Spoon removes the mean-variance relationship when detecting spatially variable genes.
This dataset consists of 1,000 simulated genes across 968 spots using a lengthscale of 100. Separately for unweighted and weighted methods, the genes were binned into deciles based on mean logcounts. Decile 1 contains the lowest mean expression values. Ridge plots for the (A) unweighted ranks and (B) weighted ranks are shown. Within each decile (y-axis), the density of the top 10% of ranks is plotted as the signal, while the density of the remaining ranks is plotted as the background. (C) False discovery rate (FDR) as a function of Type I error (α). As a function of FDR, we show the (D) true negative rate (TNR) and (E) true positive rate (TPR). The red represents weighted nnSVG and the blue represents unweighted nnSVG. These plots represent the average performance across five iterations of the same simulation, each with unique random seeds.
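The error rates in panels C to E follow the usual confusion-matrix definitions. The sketch below assumes a gene is called an SVG when its p-value is at most α; the function names, p-values, and truth labels are made up for illustration and do not come from the paper's simulation.

```python
def call_svgs(p_values, alpha):
    """Call a gene spatially variable when its p-value is <= alpha."""
    return [p <= alpha for p in p_values]

def confusion_rates(is_true_svg, is_called):
    """Return FDR, TPR, and TNR from truth labels and calls."""
    tp = fp = tn = fn = 0
    for truth, called in zip(is_true_svg, is_called):
        if called and truth:
            tp += 1
        elif called:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "FDR": fp / (tp + fp) if tp + fp else 0.0,  # false calls among all calls
        "TPR": tp / (tp + fn) if tp + fn else 0.0,  # true SVGs recovered
        "TNR": tn / (tn + fp) if tn + fp else 0.0,  # non-SVGs left uncalled
    }

# Made-up example: 3 true SVGs followed by 5 non-SVGs.
truth = [True, True, True, False, False, False, False, False]
p_values = [0.001, 0.02, 0.30, 0.04, 0.50, 0.60, 0.70, 0.80]

rates = confusion_rates(truth, call_svgs(p_values, alpha=0.05))
print(rates)
```

Averaging these rates over repeated simulations with different seeds gives curves of the kind described in the caption.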
Figure 5: Spoon helps to detect SVGs associated with cancer that are lowly expressed.
We used four datasets to evaluate the detection of cancer-related genes: Subtype Breast cancer [41], Ovarian cancer [43], Lobular Breast cancer [40], and Ductal Breast cancer [39]. Each bar contains the intersection of the set of genes of interest with genes within the set associated with cancer. For the first four rows, we defined low mean genes as those with means less than the 25th percentile in the dataset. Within the set of low mean genes, we found genes that were in the lowest 10% of ranks before weighting and then increased to the highest 10% of ranks after weighting. This is the set of genes of interest. The intersection in orange is the number of low mean and higher ranked genes that were found to be associated with the cancer of the dataset. For the last four rows, we defined small lengthscale genes as those with lengthscales between 40 and 90. Within the set of small lengthscale genes, we found genes that were ranked higher after weighting. This is the set of genes of interest. The intersection in orange shows the number of small lengthscale genes that were ranked higher and found to be associated with the cancer type of the dataset.
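The low-mean selection in the first four rows can be sketched as set operations. Several details here are assumptions for illustration: rank 1 is the most spatially variable gene, the 10% cutoffs are taken over all genes, the 25th percentile uses the default method of Python's `statistics.quantiles`, and the gene names and cancer gene set are invented.

```python
from statistics import quantiles

def low_mean_genes_of_interest(means, rank_before, rank_after, frac=0.10):
    """Low-mean genes that move from the worst frac of ranks to the best frac.

    means, rank_before, rank_after: dicts keyed by gene name.
    """
    cutoff = quantiles(list(means.values()), n=4)[0]  # 25th percentile
    n = len(means)
    k = max(1, int(n * frac))
    return {
        gene
        for gene, mean in means.items()
        if mean < cutoff
        and rank_before[gene] > n - k   # worst ranks before weighting
        and rank_after[gene] <= k       # best ranks after weighting
    }

# Toy data: 20 genes; g0 has the lowest mean and the worst unweighted rank.
genes = [f"g{i}" for i in range(20)]
means = {g: float(i) for i, g in enumerate(genes)}
rank_before = {g: 20 - i for i, g in enumerate(genes)}
rank_after = dict(rank_before)
rank_after["g0"], rank_after["g19"] = 1, 20  # weighting promotes g0

cancer_genes = {"g0", "g7"}  # hypothetical cancer-associated set
of_interest = low_mean_genes_of_interest(means, rank_before, rank_after)
overlap = of_interest & cancer_genes
print(of_interest, overlap)
```

The size of `overlap` corresponds to the orange portion of a bar in this toy setting.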

References

    1. Marx V. Method of the Year: spatially resolved transcriptomics. Nature Methods, 18(1):9–14, Jan. 2021. ISSN 1548–7091, 1548–7105. doi: 10.1038/s41592-020-01033-y. URL https://www.nature.com/articles/s41592-020-01033-y.
    2. Deshpande A., Loth M., Sidiropoulos D. N., Zhang S., Yuan L., Bell A. T., Zhu Q., Ho W. J., Santa-Maria C., Gilkes D. M., Williams S. R., Uytingco C. R., Chew J., Hartnett A., Bent Z. W., Favorov A. V., Popel A. S., Yarchoan M., Kiemen A., Wu P.-H., Fujikura K., Wirtz D., Wood L. D., Zheng L., Jaffee E. M., Anders R. A., Danilova L., Stein-O’Brien G., Kagohara L. T., and Fertig E. J. Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces. Cell Systems, 14(4):285–301.e4, Apr. 2023. ISSN 24054712. doi: 10.1016/j.cels.2023.03.004. URL https://linkinghub.elsevier.com/retrieve/pii/S2405471223000807.
    3. Rao A., Barkley D., França G. S., and Yanai I. Exploring tissue architecture using spatial transcriptomics. Nature, 596(7871):211–220, Aug. 2021. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-021-03634-9. URL https://www.nature.com/articles/s41586-021-03634-9.
    4. Garcia-Alonso L., Lorenzi V., Mazzeo C. I., Alves-Lopes J. P., Roberts K., Sancho-Serra C., Engelbert J., Marečková M., Gruhn W. H., Botting R. A., Li T., Crespo B., Van Dongen S., Kiselev V. Y., Prigmore E., Herbert M., Moffett A., Chédotal A., Bayraktar O. A., Surani A., Haniffa M., and Vento-Tormo R. Single-cell roadmap of human gonadal development. Nature, 607(7919):540–547, July 2022. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-022-04918-4. URL https://www.nature.com/articles/s41586-022-04918-4.
    5. Chen K. S., Noureldein M. H., Rigan D. M., Hayes J. M., Savelieff M. G., and Feldman E. L. Regional interneuron transcriptional changes reveal pathologic markers of disease progression in a mouse model of Alzheimer’s disease, Nov. 2023. URL 10.1101/2023.11.01.565165v1.
