G3 (Bethesda). 2025 Aug 6;15(8):jkaf122.
doi: 10.1093/g3journal/jkaf122.

Phase-free local ancestry inference mitigates the impact of switch errors on phase-based methods

Siddharth Avadhanam et al. G3 (Bethesda).

Abstract

Local ancestry inference is an indispensable component of a variety of analyses in medical and population genetics, from admixture mapping to characterizing demographic history. However, the accuracy of local ancestry inference depends on a number of factors such as phase quality (for phase-based local ancestry inference methods) and time since admixture. Here, we present an empirical analysis of four local ancestry inference methods using simulated individuals of mixed African and European ancestry, examining the impact of variable phase quality and a range of demographic scenarios. We find that regardless of phasing options, calls from local ancestry inference methods that operate on unphased genotypes (phase-free local ancestry inference) have 2.6-4.6% higher Pearson correlation with the ground truth than methods that operate on phased genotypes (phase-based local ancestry inference). Applying the TRACTOR phase correction algorithm led to modest improvements in phase-based local ancestry inference, but despite this, the Pearson correlation of phase-free local ancestry inference remains 2.4-3.8% higher than phase-corrected phase-based approaches (considering the best-performing methods in each category). Further, analyzing perfectly phased data yields accuracies for the phase-based local ancestry inference methods that are only slightly inferior to those of HAPMIX. Phase-free and phase-based local ancestry inference accuracy differences can dramatically impact downstream analyses: estimates of the time since admixture using phase-based local ancestry inference tracts are upwardly biased by ≈10 generations using our highest quality statistically phased data but have virtually no bias using phase-free local ancestry inference calls. This study underscores the strong dependence of phase-based local ancestry inference accuracy on phase quality and highlights the merits of local ancestry inference approaches that analyze unphased genetic data.
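
The accuracy metric used throughout, the Pearson correlation between inferred and true local ancestry, can be illustrated with a minimal sketch. It assumes that local ancestry at each marker is encoded as a dosage (0, 1, or 2 copies of African ancestry) and that calls are pooled across markers before correlating; the function name and the encoding are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def ancestry_correlation(true_dosage, inferred_dosage):
    """Pearson correlation between true and inferred ancestry dosages.

    Both inputs are array-likes of per-marker ancestry dosages
    (hypothetical encoding: 0, 1, or 2 copies of one ancestry).
    """
    t = np.asarray(true_dosage, dtype=float).ravel()
    x = np.asarray(inferred_dosage, dtype=float).ravel()
    if t.shape != x.shape:
        raise ValueError("true and inferred dosages must have the same length")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry.
    return float(np.corrcoef(t, x)[0, 1])


# Toy example: one marker out of six is miscalled (2 copies called as 1).
true_calls = [0, 1, 2, 2, 1, 0]
inferred_calls = [0, 1, 2, 1, 1, 0]
r = ancestry_correlation(true_calls, inferred_calls)
```

Because the metric works on unphased dosages, the same comparison applies to both phase-free and phase-based calls once the latter are summed over the two haplotypes.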

Keywords: admixture; ancestry; haplotypes; local ancestry inference; phase.

Conflict of interest statement

Conflicts of interest: A.L.W. holds stock in 23andMe, Inc. and is the owner of HAPI-DNA LLC. S.A. declares no competing interests.

Figures

Fig. 1.
Flowchart depicting our simulation and phasing pipeline. In step a), we simulated two-way admixed individuals under different settings of T and p; in step b), we pooled the simulated data from a) with small, medium, and large sample sizes from the PAGE data; and in step c), we applied three different panel phasing strategies (see Varying simulation phase quality and panel phasing strategies in Materials and methods). Finally in step d), we inferred local ancestry on the output from c) and compared the resulting inferred calls with the ground truth using Pearson correlation.
Fig. 2.
Switch error rates depend on phasing sample size and several simulation parameters. Switch error rates plotted against a) proportion of African ancestry p, b) panel phasing strategy, c) phasing sample size, and d) admixture time T (i.e. generations since admixture). Each panel includes data points for all values of the other variables. Boxplot lengths represent the inter-quartile range (IQR) and whiskers extend up to 1.5×IQR.
Fig. 3.
Performance of LAI methods across all simulation parameters. Correlations R between inferred and true local ancestry assignments for all LAI methods plotted against a) proportion of African ancestry p, b) panel phasing strategy, c) phasing sample size, and d) admixture time T (i.e. generations since admixture). Each panel includes data points from all other simulation variables. Boxplot lengths represent the IQR and whiskers extend up to 1.5×IQR.
Fig. 4.
Performance of LAI methods for larger values of T using the reference phasing strategy. Each panel shows boxplots of correlations (R) between inferred and true local ancestry assignments across different simulation settings of a) proportion of African ancestry p, b) phasing sample size, and c) admixture time T (i.e. generations since admixture). Boxplot lengths represent the IQR and whiskers extend up to 1.5×IQR.
Fig. 5.
HAPMIX outperforms phase-based LAI even when perfectly phased data are available. Plot shows correlations R between the inferred and true local ancestry assignments by phasing sample size, including results for FLARE and RFMix when analyzing perfectly phased data (as produced by the simulation pipeline). Each panel includes data points from all other simulation variables (with T ∈ {5, 6, 7}). Boxplot lengths represent the IQR and whiskers extend up to 1.5×IQR.
Fig. 6.
Deviations of admixture time estimates using local ancestry calls from HAPMIX and FLARE. Plots depict histograms of the deviations of admixture time estimates for individual samples using a phase-free method (applying PAPI to the output of HAPMIX) and a phase-based method (applying Equation (3) to the output of FLARE) for small (top), medium (middle), and large (bottom) phasing sample sizes. Data points are pooled across all other simulation variables.
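
The switch error rates summarized in Fig. 2 can be illustrated with a short sketch. It uses the common definition of switch error rate (the fraction of adjacent heterozygous site pairs at which the inferred phase flips relative to the truth); the page does not state how the authors computed it, so the function below and its argument names are assumptions for illustration only.

```python
import numpy as np


def switch_error_rate(true_h1, true_h2, inf_h1, inf_h2):
    """Fraction of adjacent heterozygous site pairs with a phase switch.

    Each argument is an array-like of 0/1 alleles along one chromosome.
    Assumes the inferred genotypes equal the true genotypes at every site,
    so only the phase can differ.
    """
    t1, t2, i1, i2 = (np.asarray(h) for h in (true_h1, true_h2, inf_h1, inf_h2))
    if not (t1.shape == t2.shape == i1.shape == i2.shape):
        raise ValueError("all haplotypes must have the same length")
    het = t1 != t2  # only heterozygous sites carry phase information
    # True where inferred haplotype 1 carries the allele of true haplotype 1.
    orientation = i1[het] == t1[het]
    if orientation.size < 2:
        return 0.0  # fewer than two het sites: no switch can be observed
    switches = np.count_nonzero(orientation[1:] != orientation[:-1])
    return switches / (orientation.size - 1)


# Toy example: six heterozygous sites with a single switch after the third.
true_h1 = [0, 1, 0, 1, 0, 1]
true_h2 = [1, 0, 1, 0, 1, 0]
inf_h1 = [0, 1, 0, 0, 1, 0]
inf_h2 = [1, 0, 1, 1, 0, 1]
ser = switch_error_rate(true_h1, true_h2, inf_h1, inf_h2)  # 1 switch over 5 adjacent pairs
```

Under this definition a single switch flips the parental assignment of every downstream allele until the next switch, which is consistent with the abstract's observation that phase quality strongly affects phase-based local ancestry tracts.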
