PeerJ. 2020 Oct 2;8:e10090.
doi: 10.7717/peerj.10090. eCollection 2020.

Comparing local ancestry inference models in populations of two- and three-way admixture

Ryan Schubert et al. PeerJ. 2020.

Abstract

Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation that have been used in human complex trait genetics are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019). As users rather than developers, we sought to directly compare the accuracy, runtime, memory usage, and usability of these software tools to determine which is best suited for incorporation into association study pipelines. We find that in the majority of cases RFMix has the highest median accuracy, with the ranking of the remaining software dependent on the ancestral architecture of the population tested. Additionally, we estimate how the memory usage and runtime of each software scale with sample size (their big-O complexity) and find that, for most software, both increase linearly with sample size. The only exception is RFMix, whose runtime increases quadratically while its memory usage increases linearly. Effective local ancestry estimation tools are necessary to increase diversity and prevent population disparities in human genetics studies. RFMix performs best overall; however, depending on the application, other methods perform just as well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.

Keywords: Admixture; Benchmarking; Human genetics; Local ancestry; Population genetics.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Process for simulating admixed individuals and estimating ancestries.
(1) From non-admixed populations from 1000G we randomly select 10% of all individuals to use as founders for admixture simulation. The rest are used as reference panels for ancestry estimation. (2) We generate admixed individuals that are chromosomal mosaics of the founder group using the admixture simulation tool created by the authors of RFMix (Maples et al., 2013). (3) Using the remaining 1000G populations as reference panels, we estimate ancestries on the simulated population for all five software tools and compare estimation accuracy.
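Step (1) of this process can be sketched in a few lines. The following is a minimal illustration only, not the authors' script: the function name `split_founders`, the sample IDs, and the fixed random seed are all hypothetical, and the real pipeline may well perform the split separately within each 1000G population.

```python
import random


def split_founders(sample_ids, founder_fraction=0.10, seed=0):
    """Randomly hold out a fraction of individuals as simulation founders.

    Everyone not chosen as a founder stays in the reference panel, so the
    two groups never overlap. (Hypothetical helper, for illustration.)
    """
    rng = random.Random(seed)
    ids = list(sample_ids)
    n_founders = max(1, round(len(ids) * founder_fraction))
    founders = set(rng.sample(ids, n_founders))
    reference = [s for s in ids if s not in founders]
    return sorted(founders), reference


# Made-up sample IDs standing in for one non-admixed population.
ids = [f"IND{i:03d}" for i in range(100)]
founders, reference = split_founders(ids)
print(len(founders), len(reference))
```

Keeping founders out of the reference panel matters because the simulated admixed genomes are mosaics of founder haplotypes; leaving them in the panel would make ancestry estimation artificially easy.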
Figure 2. Software runtime versus sample size.
We tested the runtime of each software on one core at sample sizes of 20, 50, 100, 500, 1,000, 1,500, and 2,000. Points represent the sample sizes tested versus runtime in hours and are connected by line segments colored by software. We simulated n African American individuals from CEU and YRI “founder” populations with average admixture proportions of 20% and 80%, respectively. We find that the runtimes of ELAI, LAMP-LD, MOSAIC, and Loter all increase linearly with the number of samples. The runtime of RFMix increases quadratically.
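One common way to distinguish linear from quadratic growth is to fit a line to log(runtime) against log(sample size): a slope near 1 suggests linear scaling and a slope near 2 suggests quadratic scaling. The sketch below assumes that approach; the caption does not state how the scaling was estimated, and the runtime values are synthetic, not the measurements shown in the figure.

```python
import math


def loglog_slope(sizes, values):
    """Least-squares slope of log(values) against log(sizes)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Sample sizes from the caption; runtimes are synthetic placeholders.
sizes = [20, 50, 100, 500, 1000, 1500, 2000]
linear_runtime = [0.002 * n for n in sizes]          # grows like n
quadratic_runtime = [1e-6 * n ** 2 for n in sizes]   # grows like n^2

linear_slope = loglog_slope(sizes, linear_runtime)
quadratic_slope = loglog_slope(sizes, quadratic_runtime)
print(round(linear_slope, 3), round(quadratic_slope, 3))
```

With real measurements the slopes will not be exact integers, so in practice the fitted exponent is read together with the plot.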
Figure 3. Software memory usage versus sample size.
We tested the maximum memory usage of each software on one core at sample sizes of 20, 50, 100, 500, 1,000, 1,500, and 2,000. Points represent the sample sizes tested versus memory in gigabytes (GB) and are connected by line segments colored by software. We simulated n African American individuals from CEU and YRI “founder” populations with average admixture proportions of 20% and 80%, respectively. We find that maximum memory usage for all software increases linearly with the number of samples.
Figure 4. Software memory usage versus number of ancestral populations.
Comparison between memory usage in megabytes (MB) of each software and the number of ancestries estimated in 100 individuals simulated per chromosome in each population. Memory usage differed by number of ancestral populations for ELAI and LAMP-LD. One-way ANOVA results for each software: (A) ELAI p = 3.78 × 10⁻⁴, (B) LAMP-LD p = 3.67 × 10⁻², (C) Loter p = 0.637, (D) MOSAIC p = 0.65, (E) RFMix p = 0.204.
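For readers unfamiliar with the test, a one-way ANOVA compares the variance between group means to the variance within groups. The sketch below computes only the F statistic using the standard library, on made-up memory values grouped by number of ancestral populations. The function name and data are illustrative assumptions; converting F into a p-value requires the F distribution, for which a statistics package such as SciPy (`scipy.stats.f_oneway`) would normally be used.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up memory measurements (arbitrary units) for three hypothetical groups.
two_way = [1.0, 2.0, 3.0]
three_way = [2.0, 3.0, 4.0]
other = [3.0, 4.0, 5.0]
f_stat = one_way_anova_f([two_way, three_way, other])
print(f_stat)
```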
Figure 5. Software runtime versus number of ancestral populations.
Comparison between runtime in seconds (s) of each software and the number of ancestries estimated in 100 individuals simulated per chromosome in each population. Runtimes differed by number of ancestral populations for ELAI and RFMix. One-way ANOVA results for each software: (A) ELAI p = 5.98 × 10⁻³, (B) LAMP-LD p = 0.906, (C) Loter p = 0.985, (D) MOSAIC p = 0.555, (E) RFMix p = 2.01 × 10⁻³.
Figure 6. Distribution of accuracy for ancestry estimation.
For each category of admixture, (A) 3WAY, (B) AFA, (C) HIS, we estimate ancestries on 100 simulated individuals. Accuracy is then calculated as the true positive call rate of the ancestries estimated by each software. True positive call rate (TPCR) is defined as the proportion of loci in each individual with an ancestry correctly estimated by a given software. RFMix had the highest TPCR across most pairwise comparisons. Tukey’s test results are presented in Figs. S4–S6.
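The TPCR definition translates directly into code. The sketch below is an illustration under simple assumptions: each individual's ancestry is represented as one label per locus (diploid and phased data would need one list per haplotype), and the function name and example labels are hypothetical rather than taken from the authors' scripts.

```python
from statistics import median


def true_positive_call_rate(true_ancestry, estimated_ancestry):
    """Proportion of loci whose estimated ancestry matches the truth."""
    if not true_ancestry or len(true_ancestry) != len(estimated_ancestry):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == e for t, e in zip(true_ancestry, estimated_ancestry))
    return matches / len(true_ancestry)


# Made-up per-locus labels for two simulated individuals.
truth_1 = ["AFR", "AFR", "EUR", "EUR", "AFR"]
calls_1 = ["AFR", "EUR", "EUR", "EUR", "AFR"]   # one locus miscalled
truth_2 = ["EUR", "EUR", "AFR", "AFR", "AFR"]
calls_2 = ["EUR", "EUR", "AFR", "AFR", "AFR"]   # all loci correct

rates = [
    true_positive_call_rate(truth_1, calls_1),
    true_positive_call_rate(truth_2, calls_2),
]
median_rate = median(rates)
print(rates, median_rate)
```

Computing one rate per individual, as here, is what allows the distribution and median accuracy to be compared across software.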
Figure 7. Estimated African ancestry proportion in the ASW population (African Ancestry in Southwest US) is correlated with the first principal component.
We plot the mean local ancestry proportion of African ancestry estimated by each software against the first principal component of genotypes, a known estimate of global African ancestry, to validate the robustness of local ancestry estimates. The local ancestry estimate was highly correlated with PC1 for all software tools (R² > 0.96), with no significant difference between tools (p > 0.62).
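As a rough sketch of this validation, the code below averages per-locus calls into a per-individual African ancestry proportion and computes the squared Pearson correlation with PC1. The ancestry labels and PC1 values are invented for illustration (chosen to be exactly linear), and the helper names are hypothetical; real PC1 values would come from a PCA of the genotypes.

```python
def ancestry_proportion(local_calls, ancestry="AFR"):
    """Fraction of loci assigned to the given ancestry."""
    return sum(c == ancestry for c in local_calls) / len(local_calls)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Made-up local ancestry calls for three hypothetical individuals.
calls = [
    ["AFR", "AFR", "AFR", "AFR", "EUR"],
    ["AFR", "AFR", "AFR", "EUR", "EUR"],
    ["AFR", "AFR", "AFR", "AFR", "AFR"],
]
proportions = [ancestry_proportion(c) for c in calls]

# Made-up PC1 coordinates, constructed as 0.1 - 0.15 * proportion.
pc1 = [-0.02, 0.01, -0.05]
r2 = r_squared(proportions, pc1)
print(proportions, r2)
```

The sign of PC1 is arbitrary, which is why the squared correlation is the natural summary here.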

References

    1. Auton A, Abecasis GR, Altshuler DM, Durbin RM, et al. (1000 Genomes Project Consortium). A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393.
    2. Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, Rodriguez-Cintron W, Chapela R, Ford JG, Avila PC, Rodriguez-Santana J, Burchard EG, Halperin E. Fast and accurate inference of local ancestry in Latino populations. Bioinformatics. 2012;28(10):1359–1367. doi: 10.1093/bioinformatics/bts144.
    3. Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. American Journal of Human Genetics. 2015;96(1):37–53. doi: 10.1016/j.ajhg.2014.11.010.
    4. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
    5. Dias-Alves T, Mairal J, Blum MGB. Loter: a software package to infer local ancestry for a wide range of species. Molecular Biology and Evolution. 2018;35(9):2318–2326. doi: 10.1093/molbev/msy126.