Bioinformatics. 2020 Jun 1;36(11):3522-3527.
doi: 10.1093/bioinformatics/btaa189.

Projected t-SNE for batch correction

Emanuele Aliverti et al. Bioinformatics.

Abstract

Motivation: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data.

Results: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batch effects related to mouse identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours.
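
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the kind of linear-algebra step that batch removal can rely on, not the authors' procedure. It assumes a simple linear adjustment: each feature is regressed on one-hot batch indicators and the fitted part is subtracted, leaving values orthogonal to the batch design. The function names `one_hot` and `project_out_batch` and the toy data are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Return an (n, k) indicator matrix with one column per distinct label."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float)


def project_out_batch(X, batch):
    """Remove from each column of X the part explained linearly by batch.

    Computes X - Z (Z^T Z)^{-1} Z^T X via least squares, where Z is the
    one-hot batch design. The result is orthogonal to every column of Z.
    This is an illustrative assumption, not the method of the paper.
    """
    Z = one_hot(batch)
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ coef


# Toy data (hypothetical): 3 batches of 20 cells, 5 features, additive shift.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 5)) + 3.0 * batch[:, None]
X_adj = project_out_batch(X, batch)

# Largest absolute inner product between a batch indicator and an adjusted feature.
max_inner = np.abs(one_hot(batch).T @ X_adj).max()
```

After the adjustment, `max_inner` is zero up to floating-point error, and every per-batch feature mean is zero as well, because the one-hot columns together span the constant vector. A plain residualization like this can also strip biological signal that is confounded with batch, which is one reason to impose the constraint on the embedding itself, as the abstract describes.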

Availability and implementation: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies.

Contact: aliverti@stat.unipd.it.

Figures

Fig. 1. Simulation study. The colour of points varies according to cell types, whereas shapes vary with batch groups. Upper plot shows the unadjusted t-SNE coordinates, whereas results after adjustment are reported in the bottom panels.
Fig. 2. Unadjusted t-SNE coordinates. Points and shapes vary with batches.
Fig. 3. t-SNE coordinates after correction. Points and shapes vary with batches.
Fig. 4. t-SNE coordinates after adjustment. Points and shapes vary with cell types.
