Visualization of SNPs with t-SNE

Alexander Platzer¹

PMID: 23457633
PMCID: PMC3574019
DOI: 10.1371/journal.pone.0056883

Alexander Platzer. PLoS One. 2013;8(2):e56883. doi: 10.1371/journal.pone.0056883. Epub 2013 Feb 15.

Affiliation

¹ Gregor Mendel Institute, Vienna, Austria. alexander.platzer@gmi.oeaw.ac.at

Abstract

Background: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose.

Principal findings: We compare PCA with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE), for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; t-SNE performs better on all of them.

Significance: PCA remains a reasonably good method for transforming data, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of a visualization, we propose key figures based on cross-validation with machine learning methods, as well as indices of cluster validity.
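The abstract does not list the exact key figures, so the following is only a minimal sketch of the two kinds of measure it names. It assumes leave-one-out 1-nearest-neighbour accuracy as the cross-validation figure and mean silhouette width as the cluster validity index; both choices, the function names, and the synthetic 2-D "embeddings" are illustrative assumptions, not the paper's method or data.

```python
import math
import random


def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Each point is classified by the label of its closest other point
    in the 2-D embedding (assumed stand-in for a cross-validation figure).
    """
    correct = 0
    for i, p in enumerate(points):
        nearest = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: math.dist(p, points[k]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)


def mean_silhouette(points, labels):
    """Mean silhouette width (assumed stand-in for a cluster validity index)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by label.
        by_label = {}
        for k, q in enumerate(points):
            if k != i:
                by_label.setdefault(labels[k], []).append(math.dist(p, q))
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            scores.append(0.0)  # convention for singleton or single-group cases
            continue
        a = sum(own) / len(own)                               # mean within-group distance
        b = min(sum(d) / len(d) for d in by_label.values())   # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def toy_embedding(separation, n_per_group=30, seed=0):
    """Hypothetical 2-D embedding: two Gaussian groups `separation` apart."""
    rng = random.Random(seed)
    points, labels = [], []
    for group in range(2):
        for _ in range(n_per_group):
            points.append((rng.gauss(group * separation, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(group)
    return points, labels


# Compare a clearly separated embedding with a heavily overlapping one.
for name, sep in [("well separated", 10.0), ("overlapping", 0.5)]:
    pts, labs = toy_embedding(sep)
    print(name, loo_1nn_accuracy(pts, labs), mean_silhouette(pts, labs))
```

In practice the points would be the 2-D coordinates produced by PCA or t-SNE for the same SNP matrix, and the labels would be the known population or accession groups; higher values on both figures indicate a plot in which the labelled groups are easier to tell apart.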

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

**Figure 1. SNP data transformed with PCA and t-SNE 1/2.**
On the left is a PCA-plot with the first two components, on the right a t-SNE-plot of the very same data from each data source. Data sources: Panel (a) is from the 1001 genomes project, (b) from the RegMap panel and (c) from hapmap3 r2.

**Figure 2. SNP data transformed with PCA and t-SNE 2/2.**
On the left is a PCA-plot with the first two components, on the right a t-SNE-plot of the very same data from each data source. Data sources: Panel (a) from hapmap3 r3 (compare with Fig. 1c) and (b) from the Rice Haplotype Map Project (only wild type where the label information was available).
