. 2006;34(14):e101.
doi: 10.1093/nar/gkl520. Epub 2006 Aug 9.

Information-theoretic identification of predictive SNPs and supervised visualization of genome-wide association studies

Kavitha Bhasi et al. Nucleic Acids Res. 2006.

Abstract

The size, dimensionality and limited range of data values make visualization of single nucleotide polymorphism (SNP) datasets challenging. The purpose of this study is to evaluate the usefulness of 3D VizStruct, a novel multi-dimensional data visualization technique for SNP datasets that is capable of identifying informative SNPs in genome-wide association studies. VizStruct is an interactive visualization technique that reduces multi-dimensional data to three dimensions using a combination of the discrete Fourier transform (DFT) and the Kullback-Leibler divergence (KLD). The performance of 3D VizStruct was challenged with several diverse, biologically relevant published datasets, including the human lipoprotein lipase (LPL) gene locus, the human Y-chromosome in several populations and a multi-locus genotype dataset of coral samples from four populations. In every case, the SNPs and/or polymorphic markers identified by the 3D VizStruct mapping were predictive of the underlying biology.
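
The mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each SNP is represented by its vector of genotype codes across samples, that the x- and y-coordinates are the real and imaginary parts of the first DFT harmonic of that vector, and that the z-coordinate is the KLD between the genotype frequency distributions of two sample groups. The function names, the two-group restriction and the `pseudocount` smoothing are hypothetical choices made for illustration.

```python
import numpy as np

def first_harmonic(x):
    """First harmonic (k = 1) of the discrete Fourier transform of a 1-D vector."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    return np.sum(x * np.exp(-2j * np.pi * n / x.size))

def genotype_distribution(genotypes, codes=(1, 2, 3), pseudocount=0.5):
    """Smoothed relative frequency of each genotype code.

    The pseudocount is an assumption of this sketch; it keeps every
    probability positive so the divergence below stays finite.
    """
    counts = np.array([np.sum(genotypes == c) for c in codes], dtype=float)
    counts += pseudocount
    return counts / counts.sum()

def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return float(np.sum(p * np.log2(p / q)))

def map_snps(genotype_matrix, labels):
    """Map each SNP (column) to (Re F1, Im F1, KLD).

    genotype_matrix: samples x SNPs array of genotype codes (1, 2, 3).
    labels: 0/1 group membership for each sample (two groups assumed).
    """
    G = np.asarray(genotype_matrix)
    labels = np.asarray(labels)
    coords = []
    for j in range(G.shape[1]):
        column = G[:, j]
        h = first_harmonic(column)
        p = genotype_distribution(column[labels == 0])
        q = genotype_distribution(column[labels == 1])
        coords.append((h.real, h.imag, kld(p, q)))
    return np.array(coords)

# Toy data: SNP 0 separates the two groups, SNP 1 does not.
G = np.array([[1, 1], [1, 2], [1, 3], [3, 1], [3, 2], [3, 3]])
labels = np.array([0, 0, 0, 1, 1, 1])
coords = map_snps(G, labels)
```

In this toy example the SNP whose genotypes differ between the groups receives a large z-value, while the SNP with identical genotype distributions in both groups receives a z-value of zero. Note that the first harmonic depends on the order in which samples are listed.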

Figures

Figure 1
(A) (Upper left panel) shows the 3D VizStruct mapping of the LPL genotypes. The x- and y-axes are the real and imaginary components of the first harmonic of the DFT and the z-axis is the KLD; each point corresponds to a SNP and the SNPs with the highest KLD values are highlighted with open triangles. (B–D) show the distribution of the genotypes for the three SNPs with the highest KLD values in the African-American patients from Jackson, MS (closed circles) and Caucasian-American patients from Rochester, MN (open circles). The x-axis in (B–D) is the sample number and the y-axis is the genotype, with the homozygous genotypes coded as 1 and 3 for the major and minor allele, respectively, and the heterozygous genotype coded as 2.
Figure 2
(A) (Upper left panel) shows the 3D VizStruct mapping of the Y-chromosome SNPs. The x- and y-axes are the real and imaginary components of the first harmonic of the DFT and the z-axis is the KLD; each point corresponds to a SNP and the SNPs with the highest KLD values are highlighted with open triangles. (B–D) show the distribution of the genotypes for the three SNPs with the highest KLD values in the African-American (closed circles), Caucasian-American (open circles) and Han Chinese (open triangles) subjects. The x-axis in (B–D) is the sample number and the y-axis is the genotype, with the homozygous genotypes coded as 1 and 2 for the major and minor allele, respectively.
Figure 3
(A) (Upper panel) is a map of the southeastern United States (made using M. Weinelt, Online Map Creation: ) showing the locations of the Bahamas (BAH), Crocker and Conch (CC) and Flower Garden Banks (FGB) coral reefs from which the samples were derived. The grid on the map indicates latitude north and longitude west. (B) shows the 3D VizStruct mapping of the genotyping results from AFLP analysis of the coral samples. The x- and y-axes are the real and imaginary components of the first harmonic of the DFT and the z-axis is the KLD; each point corresponds to a marker and the markers with the highest KLD values are highlighted with open triangles. (C–E) show the distribution of the genotypes for the three amplification fragments with the highest KLD values in the samples from the Bahamas (open circles, n = 22), the Flower Garden Banks (filled circles, n = 28), Crocker (open triangles, n = 17), Conch (filled triangles, n = 14) and the recruits from the Flower Garden Banks (open diamonds, n = 11). The x-axis is the sample number and the y-axis is the genotype; the genotype was coded as 1 if the fragment was absent and 2 if the fragment was present.
Figure 4
Relationship between the KLD and the linkage disequilibrium D for a range of allele frequencies. The allele frequencies at one locus were kept constant at 0.9 for the major allele (A) and 0.1 for the minor allele (B). The major allele frequencies at the other locus were varied as indicated and were 0.99 (filled circles), 0.95 (open circles), 0.9 (filled triangles) or 0.6 (open triangles). The solid lines are a power-law fit to the results. Figure 1A uses linear axes and Figure 1B shows the same data on logarithmic axes.
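
One way to explore a relationship of this kind numerically is sketched below. The record does not state how the KLD was computed for this figure, so the sketch makes an explicit assumption: the KLD is taken between the two-locus haplotype distribution with disequilibrium D and the distribution expected under linkage equilibrium (D = 0). The function names are hypothetical.

```python
import numpy as np

def haplotype_freqs(p1, p2, D):
    """Two-locus haplotype frequencies for major-allele frequencies p1, p2
    and linkage disequilibrium coefficient D (standard parameterization)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return np.array([p1 * p2 + D, p1 * q2 - D, q1 * p2 - D, q1 * q2 + D])

def kld_from_equilibrium(p1, p2, D):
    """KLD in bits between haplotype frequencies at disequilibrium D and
    those at linkage equilibrium (an assumed definition for this sketch)."""
    joint = haplotype_freqs(p1, p2, D)
    indep = haplotype_freqs(p1, p2, 0.0)
    if np.any(joint < 0):
        raise ValueError("D is outside the range allowed by the allele frequencies")
    mask = joint > 0  # terms with zero probability contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

# Fixed locus at 0.9, second locus at 0.6, as in two of the caption's settings.
values = [kld_from_equilibrium(0.9, 0.6, D) for D in (0.0, 0.01, 0.05)]
```

Under this assumed definition, the divergence is zero at D = 0 and grows approximately as D² / (2 ln 2 · p1 q1 p2 q2) for small D, which is one way a power-law trend could arise.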
