Am J Hum Genet. 2013 Jun 6;92(6):882-94.
doi: 10.1016/j.ajhg.2013.04.023. Epub 2013 May 30.

Enhanced localization of genetic samples through linkage-disequilibrium correction

Yael Baran et al. Am J Hum Genet. 2013.

Abstract

Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.

Figures

Figure 1. The Effect of Increasing the SNP Density on the Different Methods
PCA, SPA, and LOCO-LD were run on the POPRES data set after different levels of LD pruning were applied to it. As the threshold increased, fewer SNPs were pruned, the number of SNPs increased, and the LD increased. The increasing threshold levels correspond to using 12%, 17%, 25%, 43%, 57%, and 97% of the available SNPs. The reported error is the median distance in km between the true and estimated locations over all samples in the data set.
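The error metric described in this caption, the median distance in km between true and estimated locations, can be sketched as follows. This is a minimal illustration and not the authors' code: the caption does not say how distances were computed, so the haversine (great-circle) formula, the Earth radius constant, and the function names `great_circle_km` and `median_error_km` are all assumptions.

```python
import math
from statistics import median

# Mean Earth radius in km; an assumption of this sketch, not a value from the paper.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(true_locs, est_locs):
    """Median distance over samples; each location is a (lat, lon) pair in degrees."""
    if len(true_locs) != len(est_locs):
        raise ValueError("true and estimated location lists must have equal length")
    return median(great_circle_km(*t, *e) for t, e in zip(true_locs, est_locs))
```

The median is less sensitive than the mean to a few badly misplaced samples, which is one plausible reason to report it for localization error.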
Figure 2. The Effect of Decreasing the Available Amount of Genomic Sequence on the Different Methods
PCA, SPA, and LOCO-LD were tested on genomic segments of different lengths, corresponding to different fractions of the genome. For PCA and SPA, the results with and without pruning the segments for both local and long-range LD are given. LOCO-LD’s version is haplotypic with window length 50. For each method and fraction of genome used, the plot gives the median error (in km) averaged over ten segments of the corresponding length for the samples in the data set. The error bars represent the uncertainty induced by the sampling of segments and give the SEM over the ten trials. The genomic fractions, given in the x axis, correspond to 100, 500, 1,000, 5,000, 10,000, 50,000, 100,000, 200,000, and 300,000 SNPs.
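The summary statistic in this caption, a per-segment median error averaged over trials with the SEM as the error bar, can be sketched as below. This is a hedged illustration only: the function name `mean_and_sem` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not specify it.

```python
import math
from statistics import mean, stdev


def mean_and_sem(trial_errors):
    """Return (mean, standard error of the mean) over per-trial median errors in km.

    Each entry of trial_errors is the median localization error obtained from one
    randomly sampled genomic segment. The SEM uses the sample standard deviation
    (n - 1 denominator), which is an assumption of this sketch.
    """
    n = len(trial_errors)
    if n < 2:
        raise ValueError("at least two trials are needed to estimate the SEM")
    return mean(trial_errors), stdev(trial_errors) / math.sqrt(n)
```

With ten segments per genomic fraction, as in the figure, the function would be called once per method and fraction, and the second return value would give the half-length of the plotted error bar.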
Figure 3. The Effect of a Long-Range LD Region Spanning an Inversion on Chromosome 8 on the Localization of a Spanish Data Set
The samples of the Spanish data set were localized with PCA, SPA, and LOCO-LD. The colors and marker types, defined in Figure 4, give the samples’ communities of origin. (A, D, and G) The localization estimates (x versus y coordinates) of PCA (A), SPA (D), and LOCO-LD (G) on the entire Spanish data set. (B, E, and H) The results of PCA (B), SPA (E), and LOCO-LD (H) when only chromosome 8 was used. (C, F, and I) The results of PCA (C), SPA (F), and LOCO-LD (I) when only the inversion region was used.
Figure 4. LOCO-LD’s Localization Results for Northern Spain
The figure depicts the inferred locations for individuals from different autonomous communities in the northern part of Spain. A description of the data set is given in Results section “Robustness to Long-Range LD: Results for a Spanish Data Set.” The number of training samples from each community is limited to 50. LOCO-LD’s version is genotypic with window length 10. The marker colors and types give the samples’ reported community of origin. The map at the top left depicts the true geographic locations of the communities. See Web Resources for background-image attribution.
