BioData Min. 2009 Dec 3;2(1):7.
doi: 10.1186/1756-0381-2-7.

LD-spline: mapping SNPs on genotyping platforms to genomic regions using patterns of linkage disequilibrium


William S Bush et al. BioData Min. 2009.

Abstract

Background: Gene-centric analysis tools for genome-wide association study data are being developed both to annotate single locus statistics and to prioritize or group single nucleotide polymorphisms (SNPs) prior to analysis. These approaches require knowledge about the relationships between SNPs on a genotyping platform and genes in the human genome. SNPs in the genome can represent broader genomic regions via linkage disequilibrium (LD), and population-specific patterns of LD can be exploited to generate a data-driven map of SNPs to genes.

Methods: In this study, we implemented LD-Spline, a database routine that defines the genomic boundaries a particular SNP represents using linkage disequilibrium statistics from the International HapMap Project. We compared the LD-Spline haplotype block partitioning approach to that of the four gamete rule and the Gabriel et al. approach using simulated data; in addition, we processed two commonly used genome-wide association study platforms.

Results: We illustrate that LD-Spline performs comparably to the four-gamete rule and the Gabriel et al. approach; however, as a SNP-centric approach, LD-Spline has the added benefit of systematically identifying a genomic boundary for each SNP, whereas the global block partitioning approaches may falter due to sampling variation in LD statistics.

Conclusion: LD-Spline is an integrated database routine that quickly and effectively defines the genomic region marked by a SNP using linkage disequilibrium, with a SNP-centric block definition algorithm.
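The LD statistics and the four-gamete rule mentioned in the abstract can be illustrated with a minimal Python sketch. It is not code from the paper: the function names `ld_stats` and `passes_four_gamete` are hypothetical, the input is assumed to be phased two-SNP haplotypes with alleles coded 0/1, and the formulas are the standard textbook definitions of D' and r².

```python
def ld_stats(haplotypes):
    """Return (D', r^2) for two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with alleles coded 0/1 (an assumed input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one SNP is monomorphic, so LD is undefined;
        # returning zeros is a convention chosen for this sketch.
        return 0.0, 0.0
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom
    return d_prime, r2


def passes_four_gamete(haplotypes):
    """True when fewer than four distinct two-SNP haplotypes are observed,
    i.e. the pair shows no evidence of historical recombination."""
    return len(set(haplotypes)) < 4


# Toy data (invented for illustration only).
same = [(0, 0)] * 5 + [(1, 1)] * 5
three = [(0, 0)] * 4 + [(1, 1)] * 4 + [(0, 1)] * 2
print(ld_stats(same), passes_four_gamete(same))
print(ld_stats(three), passes_four_gamete(three))
```

In the second toy dataset only three of the four possible haplotypes occur, so D' is 1 while r² is roughly 0.44. This shows why a threshold on D' and a threshold on r² can mark quite different regions for the same SNP.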


Figures

Figure 1
Overview of the LD-Spline Algorithm. A matrix of all HapMap-based pair-wise LD values (D' or r²) is retrieved from the database. Using this matrix, the lower bound is incrementally extended to downstream SNPs while the pair-wise LD value between the downstream SNP and the input SNP is greater than the user-defined threshold (in this case r² > 0.8). The process is repeated for the upper bound to define the marked genomic bounds for the input SNP.
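The bound-extension step described in this caption might be sketched as follows. This is one reading of the caption, not the authors' database routine. The function name `ld_spline_bounds`, the in-memory matrix, and the toy values are all hypothetical. The sketch also assumes that extension stops at the first neighbouring SNP that fails the threshold, and it uses an inclusive comparison (`>=`) so that a threshold of exactly 1 can still extend a bound, whereas the caption itself says "greater than".

```python
def ld_spline_bounds(ld, positions, index, threshold=0.8):
    """Return (lower_bp, upper_bp) marked by the SNP at `index`.

    ld:        square, symmetric matrix (list of lists) of pairwise LD values
    positions: base-pair positions of the SNPs, sorted ascending
    index:     row/column of the input SNP
    """
    # Extend the lower bound one SNP at a time while LD with the input SNP holds.
    lower = index
    while lower - 1 >= 0 and ld[index][lower - 1] >= threshold:
        lower -= 1
    # Repeat in the other direction for the upper bound.
    upper = index
    while upper + 1 < len(positions) and ld[index][upper + 1] >= threshold:
        upper += 1
    return positions[lower], positions[upper]


# Toy example (values invented for illustration).
positions = [100, 200, 300, 400, 500]
ld = [
    [1.0, 0.3, 0.2, 0.1, 0.1],
    [0.3, 1.0, 0.9, 0.7, 0.2],
    [0.2, 0.9, 1.0, 0.85, 0.5],
    [0.1, 0.7, 0.85, 1.0, 0.6],
    [0.1, 0.2, 0.5, 0.6, 1.0],
]
print(ld_spline_bounds(ld, positions, index=2))  # (200, 400)
```

Because only the row of the input SNP is consulted, each SNP receives its own region even when a neighbouring SNP would be assigned a different one, which is what makes the approach SNP-centric rather than a global block partition.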
Figure 2
Linkage disequilibrium (D') of chromosome 1 (top) and chromosome 18 (bottom) simulated using genomeSIMLA. Haploview-style correlation plots illustrate the LD structure (in D'). Each black line above the correlation plot indicates a haplotype block generated by the simulation, and the height of the bar above the horizontal line indicates SNP density.
Figure 3
Regional haplotype structure for simulated block 7 (top) and 5 (bottom) on chromosome 1. The physical location and minor allele frequency of each simulated SNP is shown on the tracks along the top of the figure, and LD structure in D' is shown in a Haploview-style correlation plot at the bottom. True haplotype blocks in the population are marked with dark lines in the correlation plot.
Figure 4
a-e: Haplotype block partitioning for simulated chromosome 1. Ten haplotype blocks were selected from the simulation of chromosome 1 for algorithm assessment. Blocks are identified by an integer ID shown across the top of the figure, indicating relative position within the 1000 SNPs simulated. The true bounds for each block are shown as gray vertical lines, with the thickness of the line indicating the block size. Each horizontal line represents a haplotype block called by the four-gamete rule (a), the Gabriel et al. method (b), or LD-Spline using a D' threshold of 1 (c), 0.8 (d), or 0.6 (e), with the length of the line representing the number of SNPs included in the haplotype block call. The x-axis illustrates the upper and lower SNP index in the dataset for each block, and the y-axis indicates the dataset for which each block is called.
Figure 5
a-e: Haplotype block partitioning for simulated chromosome 18. Ten haplotype blocks were selected from the simulation of chromosome 18 for algorithm assessment. Blocks are identified by an integer ID shown across the top of the figure, indicating relative position within the 1000 SNPs simulated. The true bounds for each block are shown as gray vertical lines, with the thickness of the line indicating the block size. Each horizontal line represents a haplotype block called by the four-gamete rule (a), the Gabriel et al. method (b), or LD-Spline using a D' threshold of 1 (c), 0.8 (d), or 0.6 (e), with the length of the line representing the number of SNPs included in the haplotype block call. The x-axis illustrates the upper and lower SNP index in the dataset for each block, and the y-axis indicates the dataset for which each block is called.
Figure 6
Frequency histogram of LD-Spline called haplotype block sizes. The Affymetrix Genome-wide SNP Array 6.0 (top) and the Illumina Human 1M-Duo (bottom) genotyping platforms were processed using the LD-Spline algorithm. The density distribution of haplotype block sizes is shown by frequency histograms.
Figure 7
Overview of the genomeSIMLA process. Chromosomes are randomly initialized in the first generation, and then randomly sampled with replacement and crossed to produce the next generation. This process continues until the population has the desired LD patterns. Individuals are then sampled from this population for datasets.
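The forward-time process in this caption can be approximated with a short Python sketch. It does not use genomeSIMLA's actual code or interface. All function names and parameters are hypothetical, each cross is assumed to have a single random crossover point, and a fixed number of generations stands in for the caption's "until the population has the desired LD patterns" stopping rule.

```python
import random


def init_pool(n_chrom, n_snps, rng):
    """Randomly initialise chromosomes as lists of 0/1 alleles."""
    return [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_chrom)]


def cross(parent_a, parent_b, rng):
    """Recombine two chromosomes at one random breakpoint (simplifying assumption)."""
    point = rng.randrange(1, len(parent_a))  # requires at least 2 SNPs
    return parent_a[:point] + parent_b[point:]


def next_generation(pool, rng):
    """Sample parents with replacement and cross them to refill the pool."""
    return [cross(rng.choice(pool), rng.choice(pool), rng) for _ in range(len(pool))]


def simulate(n_chrom=200, n_snps=20, generations=50, seed=0):
    """Advance the pool a fixed number of generations (assumed stopping rule)."""
    rng = random.Random(seed)
    pool = init_pool(n_chrom, n_snps, rng)
    for _ in range(generations):
        pool = next_generation(pool, rng)
    return pool


def sample_individuals(pool, n, rng):
    """Draw n diploid individuals, each a pair of chromosomes from the pool."""
    return [(rng.choice(pool), rng.choice(pool)) for _ in range(n)]


pool = simulate(n_chrom=50, n_snps=10, generations=5, seed=1)
dataset = sample_individuals(pool, n=8, rng=random.Random(2))
print(len(pool), len(dataset))
```

A fuller simulation would monitor pairwise LD in the pool each generation and stop once the target pattern is reached. The fixed generation count here only keeps the sketch short.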
