BMC Bioinformatics. 2021 Sep 25;22(1):459.
doi: 10.1186/s12859-021-04350-x.

Ancestry inference using reference labeled clusters of haplotypes

Yong Wang et al. BMC Bioinformatics. 2021.

Abstract

Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands of individuals, then annotating those models with population information from reference panels of known ancestry.

Results: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture.

Conclusions: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.

Keywords: ARCHes; Ancestry inference; HMM; Haplotype modeling; Local ancestry; RFMix.

Conflict of interest statement

The authors declare competing financial interests: authors affiliated with AncestryDNA may have equity in Ancestry. The work described in this manuscript is covered by one or more patents including US patent entitled Local Genetic Ethnicity Determination System US10558930B2.

Figures

Fig. 1
Boxplot of the estimated ancestry proportions for single-origin individuals from each testing population comparing ARCHes and RFMix
Fig. 2
Precision/Recall for each population calculated from estimated ancestry proportions of simulated admixed individuals with ancestry from a pair of neighboring populations
Fig. 3
Illustration of annotating a haplotype-cluster model representing one genomic window with D SNPs (in our experiments D is about 75–80, about 3–4 cM). Each box illustrates the expected proportion of haplotypes in all the genotypes of different populations that include a certain model state at a certain level
Fig. 4
Illustration of the genome-wide HMM, where each window has a series of emitting states, each of which corresponds to a population assignment (p,q) with 1 ≤ p ≤ q ≤ K
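
The constraint 1 ≤ p ≤ q ≤ K in the Fig. 4 caption describes unordered pairs of populations, so a window has K(K+1)/2 such assignments. The following is a minimal sketch of that enumeration only, not the authors' implementation; the function name `population_pair_states` is hypothetical, and the abstract does not say whether the model uses any states beyond these pairs.

```python
from itertools import combinations_with_replacement


def population_pair_states(k):
    """Return every unordered pair (p, q) with 1 <= p <= q <= k.

    Hypothetical helper for illustration: it only enumerates the
    population assignments named in the Fig. 4 caption.
    """
    return list(combinations_with_replacement(range(1, k + 1), 2))


# Small case: three populations give six possible assignments.
print(population_pair_states(3))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

# With the 32 populations mentioned in the abstract: 32 * 33 / 2 pairs.
print(len(population_pair_states(32)))  # 528
```

Pairs with p = q assign both haplotypes of a window to the same population, while pairs with p < q assign them to two different populations.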

