[Preprint]. 2023 Jan 31:2023.01.29.526115.
doi: 10.1101/2023.01.29.526115.

Optimizing the design of spatial genomic studies

Andrew Jones et al. bioRxiv.

Abstract

Spatially-resolved genomic technologies have shown promise for studying the relationship between the structural arrangement of cells and their functional behavior. While numerous sequencing and imaging platforms exist for performing spatial transcriptomics and spatial proteomics profiling, these experiments remain expensive and labor-intensive. Thus, when performing spatial genomics experiments using multiple tissue slices, there is a need to select the tissue cross sections that will be maximally informative for the purposes of the experiment. In this work, we formalize the problem of experimental design for spatial genomics experiments, which we generalize into a problem class that we call structured batch experimental design. We propose approaches for optimizing these designs in two types of spatial genomics studies: one in which the goal is to construct a spatially-resolved genomic atlas of a tissue and another in which the goal is to localize a region of interest in a tissue, such as a tumor. We demonstrate the utility of these optimal designs, where each slice is a two-dimensional plane, on several spatial genomics datasets.
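The slice-selection idea can be illustrated with a minimal sketch. It assumes a Gaussian process prior with a squared-exponential kernel over a single phenotype and Gaussian observation noise, in which case the expected information gain (EIG) of a set of spots has the closed form 0.5 log det(I + K/σ²), with K the posterior covariance of those spots given earlier observations. The abstract does not specify the model, so the kernel, the noise variance, the slice width, and every function name below are hypothetical choices made for illustration. The sketch also ignores the physical splitting of the tissue into fragments after each cut.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between two sets of 2-D points (assumed prior)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def slice_eig(spots, in_slice, observed, noise_var=0.1):
    """EIG (nats) of observing the `in_slice` spots, given the `observed` spots."""
    xs = spots[in_slice]
    k_ss = rbf_kernel(xs, xs)
    if observed.any():
        # Condition the prior covariance on spots that were already measured.
        xo = spots[observed]
        k_oo = rbf_kernel(xo, xo) + noise_var * np.eye(len(xo))
        k_so = rbf_kernel(xs, xo)
        k_ss = k_ss - k_so @ np.linalg.solve(k_oo, k_so.T)
    _, logdet = np.linalg.slogdet(np.eye(len(xs)) + k_ss / noise_var)
    return 0.5 * logdet

def spots_on_line(spots, angle, offset, half_width=0.08):
    """Boolean mask of spots within `half_width` of a line (a 1-D slice)."""
    normal = np.array([np.cos(angle), np.sin(angle)])
    return np.abs(spots @ normal - offset) <= half_width

def greedy_design(spots, candidates, n_iters=3):
    """Pick, one at a time, the candidate slice with the largest EIG."""
    observed = np.zeros(len(spots), dtype=bool)
    chosen = []
    for _ in range(n_iters):
        scores = []
        for angle, offset in candidates:
            new_spots = spots_on_line(spots, angle, offset) & ~observed
            scores.append(slice_eig(spots, new_spots, observed) if new_spots.any() else 0.0)
        best = int(np.argmax(scores))
        chosen.append(candidates[best])
        observed |= spots_on_line(spots, *candidates[best])
    return chosen, observed

# Simulated circular tissue: a regular grid of spots clipped to the unit disk.
axis = np.linspace(-1.0, 1.0, 15)
grid = np.array([(x, y) for x in axis for y in axis])
spots = grid[(grid**2).sum(axis=1) <= 1.0]

# Candidate slices: every combination of a few angles and offsets.
candidates = [(a, o)
              for a in np.linspace(0.0, np.pi, 6, endpoint=False)
              for o in np.linspace(-0.8, 0.8, 9)]

chosen, observed = greedy_design(spots, candidates, n_iters=3)
print("chosen (angle, offset) pairs:", chosen)
print("spots observed:", int(observed.sum()), "of", len(spots))
```

Because the gain is computed from the posterior covariance, a candidate that overlaps or lies close to an earlier slice scores lower than one through an unexplored region, which is what pushes a greedy design to spread its slices across the tissue.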

Conflict of interest statement

Competing interests: BEE is on the scientific advisory boards (SAB) of Creyon Bio, Arrepath, and Freenome, and is a consultant with Neumora and Cellarity.

Figures

Supplementary Figure 1: Experimental design with one-dimensional design space.
(a) Designs after T = 100 iterations for the Random approach. (b) Designs after T = 100 iterations for the EIG approach. (c) Imputation performance after each new observation.
Figure 1: Demonstration of slicing in two-dimensional simulated tissue.
(a) Simulated spherical tissue with a grid of spots. (b) An example one-dimensional slice through the tissue. (c) The resulting observations at each spot after taking the slice in (b). The colors represent a univariate phenotype. (d) After slicing, the simulated tissue is split into two fragments. (e) Each line represents a candidate one-dimensional slice. Each slice is colored by its EIG (normalized to have a maximum of one). (f) The EIG-maximizing slice. (g) Tissue fragments after T = 10 iterations of slicing. Each color represents a distinct fragment.
Figure 2: Imputing unobserved gene expression from observed cross sections.
(a) Simulated tissue colored by synthetic gene expression. (b) An example slice through the synthetic tissue. (c) The resulting observations from the slice in (b). (d) R² for imputations of gene expression after each slicing iteration for each method.
Figure 3: Synthetic slicing experiment for localizing a region of interest.
(a) Two-dimensional simulated spatial gene expression data with a region of interest (ROI) in orange. (b) Pointwise observations. Orange points are labeled as belonging to the ROI, blue points are outside the ROI, and gray points are unobserved. (c) Estimated EIG for each spatial location (where each design is a single point). (d) Estimated EIG for each horizontal slice design. (e) Synthetic ROI data. (f) Slices chosen after T = 5 iterations of running our model. (g) F1 score of predictions after each iteration.
Figure 4: Application to Visium data.
(a) Spatial locations of tissue. (b) Slices chosen by each approach after T = 5 iterations. The outline of the tissue is shown by the solid black line, and the slices chosen by each approach are shown by the dashed lines. The color legend is in panel (c). (c) Predictive R² of the held-out gene expression for both approaches across iterations.
Figure 5: Reconstructing the Allen Brain Atlas.
(a) Allen Brain Atlas coordinates colored by the expression of PCP4. (b) An example slice through the coordinates. (c) The resulting observations after taking this slice. (d) The slices and observations chosen by the EIG approach. (e) Imputation performance across experimental iterations.
Figure 6: Localizing invasive carcinoma in prostate tissue.
(a) Histology image of the tissue section with pathologist annotations overlaid. Image from 10x Genomics website. (b) Slices chosen by the EIG method. Cancerous spots are shown in red. (c) F1 score of tumor/healthy label predictions after each iteration of experimental design. (d) Tumor/healthy predictions following five iterations of design. Stronger yellow color indicates spots with higher predicted probability of containing tumorous tissue.
