Comparative Study
Mol Biol Evol. 2018 Sep 1;35(9):2318-2326.
doi: 10.1093/molbev/msy126.

Loter: A Software Package to Infer Local Ancestry for a Wide Range of Species

Thomas Dias-Alves et al. Mol Biol Evol. 2018.

Abstract

Admixture between populations provides an opportunity to study biological adaptation and phenotypic variation. Admixture studies rely on local ancestry inference for admixed individuals, which consists of computing, at each locus, the number of copies that originate from the ancestral source populations. Existing software packages for local ancestry inference are tuned to provide accurate results on human data and recent admixture events. Here, we introduce Loter, an open-source software package that requires no biological parameter other than haplotype data, in order to make local ancestry inference available for a wide range of species. Using simulations, we compare the performance of Loter to HAPMIX, LAMP-LD, and RFMix. HAPMIX is the only software severely impacted by imperfect haplotype reconstruction. When considering simulated admixed human genotypes, Loter is the software least impacted by increasing admixture time. For simulations of admixed Populus genotypes, Loter and LAMP-LD are robust to increasing admixture times, in contrast to RFMix. When comparing the lengths of reconstructed and true ancestry tracts, Loter and LAMP-LD again provide results whose accuracy is more robust than that of RFMix to increasing admixture times. We apply Loter to individuals resulting from admixture between Populus trichocarpa and Populus balsamifera, and the lengths of ancestry tracts indicate that admixture took place ∼100 generations ago. We expect that providing rapid and parameter-free software for local ancestry inference will make genomic studies of admixture processes more accessible.

Keywords: admixture; local ancestry; optimization; population genetics.

Figures

Fig. 1.
Example of local ancestry inference for four simulated Populus individuals resulting from admixture between two Populus species, Populus trichocarpa and Populus balsamifera (Suarez-Gonzalez et al. 2016). For an admixed individual, local ancestry at a given locus corresponds to the number of copies that have been inherited from the species P. trichocarpa. Local ancestry inference (LAI) software requires haplotypes from the putative source populations and processes haplotypes or genotypes from the admixed population to return the local ancestry of admixed individuals. Details of the simulations are described in the Materials and Methods section.
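The copy count described in this caption follows directly once each of an individual's two haplotypes has been assigned a source population at every locus. A minimal Python sketch, assuming a hypothetical 0/1 encoding in which 1 marks a segment inherited from P. trichocarpa (the function name `diploid_local_ancestry` is also ours):

```python
from typing import List


def diploid_local_ancestry(hap_a: List[int], hap_b: List[int]) -> List[int]:
    """Number of copies (0, 1 or 2) inherited from P. trichocarpa at each locus.

    Hypothetical encoding: 1 = segment from P. trichocarpa, 0 = from P. balsamifera.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("both haplotypes must cover the same loci")
    return [a + b for a, b in zip(hap_a, hap_b)]


if __name__ == "__main__":
    maternal = [1, 1, 1, 0, 0, 0, 1, 1]
    paternal = [0, 0, 1, 1, 1, 0, 0, 0]
    print(diploid_local_ancestry(maternal, paternal))  # [1, 1, 2, 1, 1, 0, 1, 1]
```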
Fig. 2.
Graphical description of local ancestry inference as implemented in the software Loter. Given a collection of parental haplotypes from the source populations (depicted in blue and red), Loter models a haplotype of an admixed individual as a mosaic of existing parental haplotypes. In this example, the first term of equation (1) (the loss function) equals 1 because of a single mismatch between the parental and admixed haplotypes at the next-to-last position, and the second term of equation (1) (the regularization term) equals $2\lambda$ because there are two switches between parental haplotypes. The displayed solution corresponds to $(s_1, \ldots, s_{11}) = (5,5,5,5,1,1,1,1,2,2,2)$, where haplotypes are numbered from top to bottom and $s_j = k$ if the admixed haplotype results from a copy of the $k$th parental haplotype at the $j$th SNP.
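The two terms described in this caption can be written out explicitly. The following is a reconstruction from the captions of Figs. 2 and 3, not a quotation of the paper's equation (1). It uses the symbols $H_{ij}$, $s_j$, $p$, and $2n$ defined in the Fig. 3 caption and additionally assumes (our notation) that $h_j$ is the allele of the admixed haplotype $h$ at SNP $j$ and that $\mathbf{1}[\cdot]$ is the indicator function:

```latex
\min_{(s_1,\ldots,s_p)\in\{1,\ldots,2n\}^p}\;
\underbrace{\sum_{j=1}^{p}\mathbf{1}\!\left[h_j \neq H_{s_j j}\right]}_{\text{loss: mismatches}}
\;+\;
\underbrace{\lambda\sum_{j=1}^{p-1}\mathbf{1}\!\left[s_j \neq s_{j+1}\right]}_{\text{regularization: switches}}
```

For the solution shown in the figure, the loss counts one mismatch and the switch indicator is 1 twice (label 5 to 1 between SNPs 4 and 5, and label 1 to 2 between SNPs 8 and 9), so the objective equals $1 + 2\lambda$.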
Fig. 3.
Graph that represents the optimization problem of equation (1). An optimal solution for $(s_1, \ldots, s_p)$ is found by finding the shortest path from node a to node b. We assume that there are $n$ individuals in the source populations, resulting in $2n$ haplotypes denoted by $(H_1, \ldots, H_{2n})$. The value (0 or 1) of the $i$th haplotype at the $j$th SNP is denoted by $H_{ij}$. A vector $(s_1, \ldots, s_p)$ describes the sequence of haplotype labels from which the haplotype $h$ of an admixed individual can be approximated. For the $j$th SNP in the data set, $s_j = k$ if haplotype $h$ results from a copy of haplotype $H_k$.
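If every switch between parental haplotypes costs the same $\lambda$ (as the regularization term in Fig. 2 suggests), the shortest path through this layered graph can be computed one SNP at a time: the cheapest way to reach label $k$ at SNP $j$ is either to stay on $k$ or to jump from the overall cheapest label at SNP $j-1$ at cost $\lambda$. The Python sketch below implements this Viterbi-style dynamic program in $O(p \cdot 2n)$ time. It is an illustration under that reading of equation (1), not Loter's actual code; the function name `shortest_mosaic_path`, the 0-based labels, and the tie-breaking rule (prefer staying, then the lowest label) are our assumptions.

```python
from typing import List, Sequence, Tuple


def shortest_mosaic_path(
    H: Sequence[Sequence[int]], h: Sequence[int], lam: float
) -> Tuple[List[int], float]:
    """Minimize (#mismatches) + lam * (#switches) over label sequences s.

    H[k][j] is the allele (0/1) of parental haplotype k at SNP j and h[j] is the
    allele of the admixed haplotype; labels are 0-based (hypothetical convention).
    """
    n_hap, p = len(H), len(h)
    if n_hap == 0 or p == 0:
        raise ValueError("need at least one parental haplotype and one SNP")
    # cost[k]: length of the shortest path from node a to node (SNP j, label k).
    cost = [float(H[k][0] != h[0]) for k in range(n_hap)]
    back: List[List[int]] = [[]]  # back[j][k]: best label at SNP j-1; none at SNP 0
    for j in range(1, p):
        best_prev = min(range(n_hap), key=cost.__getitem__)
        switch_cost = cost[best_prev] + lam  # jump from the cheapest label, pay lam
        new_cost: List[float] = []
        new_back: List[int] = []
        for k in range(n_hap):
            if cost[k] <= switch_cost:  # staying on k is at least as cheap
                prev, base = k, cost[k]
            else:
                prev, base = best_prev, switch_cost
            new_cost.append(base + float(H[k][j] != h[j]))
            new_back.append(prev)
        cost = new_cost
        back.append(new_back)
    # Node b: take the cheapest label at the last SNP, then follow back-pointers.
    k = min(range(n_hap), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for j in range(p - 1, 0, -1):
        k = back[j][k]
        path.append(k)
    path.reverse()
    return path, total


if __name__ == "__main__":
    H = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
    h = [0, 0, 0, 1, 1, 1]
    print(shortest_mosaic_path(H, h, lam=0.5))  # ([0, 0, 0, 1, 1, 1], 0.5)
```

Larger values of `lam` penalize switches more heavily: with `lam=10` on the same toy data, the sketch stays on a single parental haplotype and accepts three mismatches instead.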
Fig. 4.
Diploid accuracy obtained with LAMP-LD, Loter, and RFMix for simulated admixed human individuals as a function of the time since admixture. Admixed individuals are simulated by constructing their genomes as a mosaic of true African (YRI) and European (CEU) haplotypes (International HapMap 3 Consortium 2010). For the simulations, true haplotypes are obtained using trio information. For local ancestry inference, haplotypes are obtained with Beagle from individuals that are not used to simulate admixed individuals. For each number of generations since admixture, 20 sets of 48 admixed individuals are generated. Boxplots show the distribution of the 20 mean diploid accuracies.
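The caption does not specify the simulation model, so the following Python sketch assumes a common simplification: breakpoints along a haplotype form a Poisson process whose rate per Morgan equals the number of generations since admixture, and after each breakpoint a new donor is copied from population 1 with probability `alpha`, otherwise from population 0. The function `simulate_mosaic` and all its parameters are hypothetical; the authors' simulation pipeline may differ. A diploid admixed individual would consist of two such haplotypes.

```python
import random
from typing import List, Sequence, Tuple


def simulate_mosaic(
    positions_morgan: Sequence[float],
    pop_haplotypes: Sequence[Sequence[Sequence[int]]],
    generations: float,
    alpha: float,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Simulate one admixed haplotype as a mosaic of donor haplotypes.

    Simplified model (an assumption, not the authors' pipeline): breakpoints form a
    Poisson process with rate `generations` per Morgan; after each breakpoint a new
    donor is drawn from population 1 with probability `alpha`, else population 0.
    Returns the alleles of the admixed haplotype and its true ancestry per SNP.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")

    def draw_donor() -> Tuple[int, Sequence[int]]:
        pop = 1 if rng.random() < alpha else 0
        return pop, rng.choice(pop_haplotypes[pop])

    pop, donor = draw_donor()
    # Distance to the next breakpoint is exponential with mean 1/generations Morgans.
    next_break = positions_morgan[0] + rng.expovariate(generations)
    alleles: List[int] = []
    ancestry: List[int] = []
    for j, pos in enumerate(positions_morgan):
        while pos >= next_break:  # one or more breakpoints since the previous SNP
            pop, donor = draw_donor()
            next_break += rng.expovariate(generations)
        alleles.append(donor[j])
        ancestry.append(pop)
    return alleles, ancestry


if __name__ == "__main__":
    rng = random.Random(1)
    positions = [j * 0.001 for j in range(1000)]  # about 1 Morgan, evenly spaced
    pop0 = [[0] * 1000, [0] * 1000]  # toy donors: population 0 carries allele 0
    pop1 = [[1] * 1000]              # toy donors: population 1 carries allele 1
    alleles, ancestry = simulate_mosaic(positions, [pop0, pop1], 20, 0.5, rng)
    print(sum(a != b for a, b in zip(ancestry, ancestry[1:])), "ancestry switches")
```

Because the donors are drawn from real (here, toy) haplotypes, the simulated individual's true ancestry is known at every SNP, which is what allows inferred local ancestry to be scored against it.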
Fig. 5.
Diploid accuracy obtained with LAMP-LD, Loter, and RFMix for simulated admixed Populus individuals as a function of the time since admixture. Admixed individuals are simulated by constructing their genomes as a mosaic of Populus trichocarpa and Populus balsamifera individuals. Individuals are phased using Beagle, and two different sets of individuals are used for simulation and for inference. For each number of generations since admixture, 20 sets of 20 admixed individuals are generated. Boxplots show the distribution of the 20 mean diploid accuracies.
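The captions do not define diploid accuracy explicitly. Assuming it is the proportion of SNPs at which the inferred number of copies (0, 1, or 2) equals the true number, each boxplot point could be computed as in this Python sketch; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def diploid_accuracy(true_counts: Sequence[int], inferred_counts: Sequence[int]) -> float:
    """Fraction of SNPs whose inferred copy number (0, 1 or 2) equals the true one."""
    if not true_counts or len(true_counts) != len(inferred_counts):
        raise ValueError("inputs must be non-empty and have equal length")
    matches = sum(t == i for t, i in zip(true_counts, inferred_counts))
    return matches / len(true_counts)


def mean_diploid_accuracy(
    truth: Sequence[Sequence[int]], inferred: Sequence[Sequence[int]]
) -> float:
    """Average the per-individual accuracies of one simulated set (one boxplot point)."""
    if len(truth) != len(inferred):
        raise ValueError("one inferred sequence is needed per individual")
    return mean(diploid_accuracy(t, i) for t, i in zip(truth, inferred))


if __name__ == "__main__":
    truth = [[0, 1, 2, 2], [2, 2]]
    inferred = [[0, 1, 1, 2], [2, 0]]
    print(mean_diploid_accuracy(truth, inferred))  # 0.625
```

Under this assumption, each of the 20 values in a box would be `mean_diploid_accuracy` applied to one set of 20 admixed individuals.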
Fig. 6.
Distribution of the length of ancestry chunks for simulated data. For Populus data, we consider the first 500,000 SNPs of chromosome 6, and for human data, we consider the first 50,000 SNPs of chromosome 1. For Populus data, we run LAMP-LD 10 times on nonoverlapping sets of SNPs to work around LAMP-LD's limit of 50,000 SNPs.
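Ancestry chunks are runs of consecutive SNPs that share the same ancestry label, and the LAMP-LD workaround amounts to cutting the SNPs into consecutive windows. The Python sketch below shows both steps; it measures chunk length in number of SNPs because the caption does not state the unit, and the names `tract_lengths` and `snp_windows` are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def tract_lengths(ancestry: Sequence[int]) -> Dict[int, List[int]]:
    """Lengths (in SNPs) of runs of consecutive SNPs sharing an ancestry label."""
    lengths: Dict[int, List[int]] = {}
    for label, run in groupby(ancestry):
        lengths.setdefault(label, []).append(sum(1 for _ in run))
    return lengths


def snp_windows(n_snps: int, max_snps: int = 50_000) -> List[range]:
    """Split SNP indices into consecutive, nonoverlapping windows of at most max_snps."""
    return [range(s, min(s + max_snps, n_snps)) for s in range(0, n_snps, max_snps)]


if __name__ == "__main__":
    print(tract_lengths([0, 0, 1, 1, 1, 0, 2, 2]))  # {0: [2, 1], 1: [3], 2: [2]}
    print(len(snp_windows(500_000)))  # 10
```

Note that a chunk crossing a window boundary would be split in two if each window's output were analyzed separately, so per-window ancestry calls would need to be concatenated before computing chunk lengths.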
Fig. 7.
Distribution of the length of Populus balsamifera ancestry tracts. The data consist of genotypes of individuals admixed between P. balsamifera and P. trichocarpa. For the simulations, we replicate the pipeline used for local ancestry inference on real data, which consists of phasing genotypes with Beagle and reconstructing ancestry tracts with Loter.
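The caption does not state how observed and simulated tract-length distributions are compared. As a hypothetical illustration only, the Python sketch below scores each candidate admixture time by the two-sample Kolmogorov-Smirnov distance between observed tract lengths and tract lengths simulated at that time, and returns the closest time. The KS criterion and the names `ks_statistic` and `best_matching_time` are our assumptions, not the authors' procedure.

```python
from bisect import bisect_right
from typing import Mapping, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    # Empirical CDFs only jump at observed values, so checking those points suffices.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def best_matching_time(
    observed: Sequence[float], simulated_by_time: Mapping[int, Sequence[float]]
) -> int:
    """Admixture time whose simulated tract lengths are closest to the observed ones."""
    return min(simulated_by_time, key=lambda t: ks_statistic(observed, simulated_by_time[t]))


if __name__ == "__main__":
    observed = [5, 6, 7]
    simulated = {10: [50, 60, 70], 100: [5, 6, 7.5]}  # toy tract lengths per time
    print(best_matching_time(observed, simulated))  # 100
```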

References

Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, Rodriguez-Cintron W, Chapela R, Ford JG, Avila PC. 2012. Fast and accurate inference of local ancestry in Latino populations. Bioinformatics 28(10):1359–1367.
Bhatia G, Patterson N, Sankararaman S, Price AL. 2013. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23(9):1514–1521.
Brandvain Y, Kenney AM, Flagel L, Coop G, Sweigart AL. 2014. Speciation and introgression between Mimulus nasutus and Mimulus guttatus. PLoS Genet. 10(6):e1004410.
Breiman L. 1996. Bagging predictors. Mach Learn. 24(2):123–140.
Browning SR, Browning BL. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 81(5):1084–1097.