Genome Biol Evol. 2025 Apr 30;17(5):evaf080.
doi: 10.1093/gbe/evaf080.

The Precision and Power of Population Branch Statistics in Identifying the Genomic Signatures of Local Adaptation

Max Shpak et al. Genome Biol Evol.

Abstract

Population branch statistics, which estimate the degree of genetic differentiation along a focal population's lineage, have been used as an alternative to FST-based genome-wide scans for identifying loci associated with local selective sweeps. Beyond the population branch statistic (PBS), the normalized PBSn1 adjusts focal branch length with respect to outgroup branch lengths at the same locus, whereas population branch excess (PBE) incorporates median branch lengths at other loci. PBSn1 and PBE were proposed to be more specific to local selective sweeps as opposed to geographically ubiquitous selection. However, the accuracy and statistical power of branch statistics have not been systematically assessed. To do so, we simulate genomes in representative large and small populations with varying proportions of sites evolving under genetic drift or (approximated) background selection, with local selective sweeps or geographically parallel selective sweeps. We then assess the probability that local selective sweep loci are correctly identified as outliers by FST and by each of the branch statistics. We find that branch statistics consistently outperform FST at identifying local sweeps. Particularly when parallel sweeps are introduced, PBSn1 and PBE correctly identify local sweeps among their top outliers more frequently than PBS. Additionally, we evaluate versions of these statistics based on maximal site differentiation within a window, finding that site-based PBE and PBSn1 are particularly effective at identifying local soft sweeps. These results validate the greater specificity of the rescaled branch statistics PBE and PBSn1 to detect population-specific positive selection, supporting their use in genomic studies focused on local adaptation.
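The abstract describes the three branch statistics only in words. As a rough sketch, they can be computed per locus from the three pairwise FST values, assuming the commonly cited definitions: T = -log(1 - FST), PBS as half of (T_AB + T_AC - T_BC), PBSn1 as PBS divided by one plus the sum of all three branch lengths, and PBE as PBS minus T_BC scaled by the genome-wide ratio median(PBS)/median(T_BC). These formulas, the function name `branch_statistics`, and the clipping of FST estimates into [0, 1) are assumptions made for illustration and are not taken from the paper's own code.

```python
import numpy as np

def branch_statistics(fst_ab, fst_ac, fst_bc):
    """Per-locus PBS, PBSn1, and PBE for focal population A.

    Inputs are arrays of pairwise FST values, one entry per locus (window).
    The formulas are the commonly cited definitions (an assumption here).
    """
    # Assumption: clip FST estimates into [0, 1) so the log transform is finite.
    fst = [np.clip(np.asarray(f, dtype=float), 0.0, 0.999999)
           for f in (fst_ab, fst_ac, fst_bc)]
    # T = -log(1 - FST) for each population pair.
    t_ab, t_ac, t_bc = (-np.log1p(-f) for f in fst)

    # Branch lengths of the unrooted three-population tree.
    pbs_a = (t_ab + t_ac - t_bc) / 2.0
    pbs_b = (t_ab + t_bc - t_ac) / 2.0
    pbs_c = (t_ac + t_bc - t_ab) / 2.0

    # PBSn1: focal branch rescaled by total tree length at the same locus.
    pbsn1 = pbs_a / (1.0 + pbs_a + pbs_b + pbs_c)

    # PBE: observed PBS minus the PBS expected from this locus's T_BC and
    # the genome-wide medians (undefined if the median T_BC is zero).
    pbe = pbs_a - t_bc * np.median(pbs_a) / np.median(t_bc)
    return pbs_a, pbsn1, pbe

# Toy input: three loci, the last with elevated differentiation in A only.
pbs, pbsn1, pbe = branch_statistics([0.1, 0.1, 0.5],
                                    [0.1, 0.1, 0.5],
                                    [0.1, 0.1, 0.1])
```

In this toy input the third locus has the largest value of all three statistics, because only the pairs involving the focal population A are strongly differentiated.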

Keywords: fixation index; local adaptation; population branch statistics; population genetic simulation; selective sweeps.

Figures

Fig. 1.
Illustration of the demographic models and genomic parameters implemented in population genetic simulations. a) Unrooted three-population tree, with A representing the focal population. T values represent the genetic distances between each pair of populations, based on a log transformation of FST (Eq. (1)). PBS, as the estimated length of the focal population branch, is then an intuitive function of these T values. b) The simulated three-population genealogies. Ni represents the size of the ith small population; the value in parentheses is the size of the corresponding simulated large population (the lineage D is the ancestral population for A and B before their split; Nanc is the size of the population ancestral to all three sampled populations). t1 and t2 are the split times for the inner and outer divergences, respectively. The left genealogy was used for the migration-free simulations as well as the first set of migration simulations (a net population migration rate of Nem = 1 is represented by the dashed lines). The right genealogy has branch lengths adjusted to generate the same pairwise FST as in the first genealogy without migration under genetic drift alone. c) The genomic parameters used in simulations for the small and large populations (values in parentheses are the 50× rescalings used in the simulations). Mutation, crossing over, and gene-conversion rates are per-site, per-generation.
Fig. 2.
Simulated genome models used in this study. a) Schematics representing the model genomes, each containing 25,000 simulated loci (representing genomic windows in an empirical scan). The upper left genome has 99% of loci randomly sampled from the 10^6 genetic drift simulations and 1% from the 10^4 local selective sweep simulations. The upper right genome has 98% neutral loci, 1% local selective sweep loci, and 1% parallel sweep loci. The lower left genome has 99% BGS loci and 1% local sweeps. The lower right genome has 98% BGS loci, 1% local sweeps, and 1% parallel sweep loci. b) The autosomal B-value distributions for human (left) and D. melanogaster (right) genomes are shown, as used here for the small and large population simulations, respectively. The vertical red line represents the truncation at B = 0.41 for each population size scenario, in order to approximate genome scans in which low recombination regions are excluded due to the difficulty in localizing targets.
Fig. 3.
Population branch statistics show greater precision to detect local selective sweeps than FST when other loci experience global positive or negative selection. Bar plots show the precision of FST and the population branch statistics PBS, PBSn1, and PBE with respect to local sweeps, i.e., the fraction of local sweep loci among those contributing to the upper 1% quantile of each statistic. The top four panels show the results for a single selective sweep in the focal population for large and small populations (upper and lower panels), while the lower four panels do the same for parallel sweeps in all three populations. Each set of graphs includes genomic backgrounds with genetic drift (left panels) and with emulated BGS (right panels). Each panel shows results for all modeled selection regimes (hard complete, partial, soft complete sweeps), including migration scenarios involving hard sweeps that might have fixed if not for gene flow. Supplementary table S1, Supplementary Material online provides exact precision values for each scenario.
Fig. 4.
PBE and PBSn1 are less likely than other statistics to register parallel sweep loci among their top outliers. For model genome scenarios in which 1% of loci are subject to parallel sweeps in all populations, the graphs show the fraction of parallel sweep loci that contribute to the upper 1% quantile of FST and branch statistic distributions (i.e. the fraction of parallel sweep loci that are false positives) under different demographic and selection parameters.
Fig. 5.
Unconstrained maximum site PBSn1 and PBE have elevated precision for identifying local soft sweeps. When other loci evolve neutrally or under BGS, the site-level metrics FST_MaxSNP, PBSMaxSNP, PBSn1MaxSNP, and PBEMaxSNP outperform full window metrics in detecting local soft sweeps, and have at least comparable precision for local hard sweeps. When some fraction of loci evolve under local sweeps and others under parallel sweeps, the precision of FST_MaxSNP and PBSMaxSNP does not exceed 0.5 (indicating inability to distinguish these scenarios), while the site-unconstrained scaling of PBSn1MaxSNP and PBEMaxSNP allows them to effectively distinguish these models, resulting in precision values near or above 0.9. Note that precision values for the site maximum statistics are the same for hard and soft sweeps (since they achieve the same maximal values in either case), unlike the window-based statistics.
Fig. 6.
Precision of window and SNP-focused statistics under a complex human demographic model generally recapitulates patterns observed from simpler histories. Here, the demographic estimates of Gutenkunst et al. (2009) were simulated. Two of the more challenging-to-detect sweep models were chosen for this analysis: a soft sweep starting from 1% frequency (left panels) and a partial (hard) sweep ending at 50% frequency (right panels). Local sweeps in the European population were simulated, with other parallel sweep loci either absent (top panels) or present (bottom panels). Approximated background selection via Ne reduction was also simulated in all of these cases. Results are shown both for whole window statistics (blue bars) and for SNP-level statistics (gray bars). Some previously observed patterns hold under this history as well, including the consistent advantage of population branch statistics over FST. However, in the presence of parallel sweeps, window and SNP-level PBSn1 metrics show greater precision than PBE, whereas in the above analyses the reverse was more often true.
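The precision measure used in the captions above (the fraction of local sweep loci among those in the upper 1% quantile of a statistic) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `top_quantile_fractions`, the label strings, and the synthetic score distributions in the demo are all hypothetical, and the demo values are arbitrary toy numbers rather than simulation results.

```python
import numpy as np

def top_quantile_fractions(stat, labels, quantile=0.99):
    """Fraction of each locus class among loci above the given quantile.

    stat   : per-locus statistic (e.g. FST, PBS, PBSn1, or PBE)
    labels : per-locus class, e.g. "neutral", "local", "parallel"
    The value returned for "local" corresponds to precision as described
    in the Fig. 3 caption.
    """
    stat = np.asarray(stat, dtype=float)
    labels = np.asarray(labels)
    threshold = np.quantile(stat, quantile)
    outliers = stat > threshold  # loci in the upper tail
    if not outliers.any():
        raise ValueError("no loci exceed the quantile threshold")
    return {str(lab): float(np.mean(labels[outliers] == lab))
            for lab in np.unique(labels)}

# Toy model genome: 25,000 loci with 98% neutral, 1% local sweep, and
# 1% parallel sweep loci. The score distributions are arbitrary placeholders.
rng = np.random.default_rng(0)
labels = np.array(["neutral"] * 24500 + ["local"] * 250 + ["parallel"] * 250)
scores = np.concatenate([
    rng.exponential(0.05, 24500),  # hypothetical neutral scores
    rng.exponential(0.50, 250),    # hypothetical local sweep scores
    rng.exponential(0.10, 250),    # hypothetical parallel sweep scores
])
fractions = top_quantile_fractions(scores, labels)
```

The entry for the parallel class gives the share of top outliers that are parallel sweep loci, which is one way to quantify the false positives discussed in the Fig. 4 caption.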
