[Preprint]. 2024 May 17:2024.05.14.594139.
doi: 10.1101/2024.05.14.594139.

The Precision and Power of Population Branch Statistics in Identifying the Genomic Signatures of Local Adaptation


Max Shpak et al. bioRxiv.


Abstract

Population branch statistics, which estimate the branch lengths of focal populations with respect to two outgroups, have been used as an alternative to FST-based genome-wide scans for identifying loci associated with local selective sweeps. In addition to the original population branch statistic (PBS), two rescalings have subsequently been proposed: the normalized population branch statistic (PBSn1), which adjusts the focal branch length with respect to the outgroup branch lengths at the same locus, and population branch excess (PBE), which also incorporates median branch lengths across other loci. PBSn1 and PBE have been proposed to be less sensitive than PBS to allele frequency divergence generated by background selection or by geographically ubiquitous positive selection, as opposed to local selective sweeps. However, the accuracy and statistical power of branch statistics have not been systematically assessed. To do so, we simulate genomes in representative large and small populations with varying proportions of sites evolving under genetic drift or background selection (approximated using variable Ne), local selective sweeps, and geographically parallel selective sweeps. We then assess the probability that local selective sweep loci are correctly identified as outliers by FST and by each of the branch statistics. We find that branch statistics consistently outperform FST at identifying local sweeps. When background selection and/or parallel sweeps are introduced, PBSn1 and especially PBE correctly identify local sweeps among their top outliers more often than PBS does. These results validate the greater specificity of rescaled branch statistics such as PBE for detecting population-specific positive selection, supporting their use in genomic studies focused on local adaptation.
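A minimal sketch of how the three branch statistics could be computed per locus from pairwise FST. It assumes the standard definitions from the earlier literature, not this preprint's own code: T = -log(1 - FST); PBS solved from the three-population tree; PBSn1 = PBS_A / (1 + PBS_A + PBS_B + PBS_C); and PBE = PBS_A - T_BC × median(PBS_A) / median(T_BC). The PBSn1 and PBE formulas are our reading of the original proposals and are not stated in this abstract. The function names are hypothetical, and the FST values in the usage lines are made up for illustration.

```python
import numpy as np

def branch_lengths(fst_ab, fst_ac, fst_bc):
    """Per-locus branch lengths for focal population A and outgroups B, C.

    Each argument is an array of pairwise FST values, one entry per locus.
    Divergences use the log transform T = -log(1 - FST).
    """
    # log1p(-x) == log(1 - x), numerically stable for small FST
    t_ab = -np.log1p(-np.asarray(fst_ab, dtype=float))
    t_ac = -np.log1p(-np.asarray(fst_ac, dtype=float))
    t_bc = -np.log1p(-np.asarray(fst_bc, dtype=float))
    # Solve T_AB = a + b, T_AC = a + c, T_BC = b + c for each branch length
    pbs_a = (t_ab + t_ac - t_bc) / 2
    pbs_b = (t_ab + t_bc - t_ac) / 2
    pbs_c = (t_ac + t_bc - t_ab) / 2
    return pbs_a, pbs_b, pbs_c, t_bc

def pbsn1(pbs_a, pbs_b, pbs_c):
    # Assumed PBSn1 form: focal branch divided by (1 + total tree length) at the same locus
    return pbs_a / (1 + pbs_a + pbs_b + pbs_c)

def pbe(pbs_a, t_bc):
    # Assumed PBE form: observed focal branch minus an expected value that scales
    # with the outgroup divergence T_BC, calibrated by genome-wide medians
    # (assumes median(T_BC) > 0)
    pbs_expected = t_bc * np.median(pbs_a) / np.median(t_bc)
    return pbs_a - pbs_expected

# Usage with four illustrative loci; the third locus has elevated A-specific divergence
pbs_a, pbs_b, pbs_c, t_bc = branch_lengths(
    [0.10, 0.12, 0.50, 0.11], [0.15, 0.14, 0.55, 0.16], [0.12, 0.13, 0.12, 0.10])
print(np.argmax(pbe(pbs_a, t_bc)))  # prints 2
```

Under these assumed forms, a locus whose outgroup divergence T_BC is also elevated, as would be expected where local Ne is reduced, receives a larger expected PBS and therefore a smaller PBE. PBSn1 achieves a similar damping using only the outgroup branches at that same locus.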

Keywords: Population branch statistics; fixation index; local adaptation; population genetic simulation; selective sweeps.


Figures

Figure 1. Illustration of the demographic models and genomic parameters implemented in population genetic simulations.
(A) Unrooted three-population tree, with A representing the focal population. T values represent the genetic distances between each pair of populations, based on a log transformation of FST (Eq. 1). PBS, the estimated length of the focal population branch, is then an intuitive function of these T values. (B) The simulated three-population genealogies. Ni represents the size of the ith small population, and the value in parentheses is the size of the corresponding simulated large population (lineage D is the population ancestral to A and B before their split, and Nanc is the size of the population ancestral to all three sampled populations). t1 and t2 are the split times for the inner and outer divergences, respectively. The left genealogy was used for the migration-free simulations as well as for the first set of migration simulations (the dashed lines represent a net population migration rate of Nem = 1). The right genealogy has branch lengths adjusted to generate the same pairwise FST as the left genealogy without migration, under genetic drift alone. (C) The genomic parameters used in simulations for the small and large populations (values in parentheses are the 50x rescalings used in the simulations).
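Assuming Eq. 1 takes the standard PBS form T = -ln(1 - FST) (the caption does not reproduce it), the tree algebra behind PBS can be written out as follows. If a, b, and c are the branch lengths leading to A, B, and C, each pairwise distance is the sum of two branches, and adding the two distances that involve A and subtracting the third isolates a. The FST values in the worked example are illustrative only, not results from the study.

```latex
T_{XY} = -\ln\left(1 - F_{ST}^{XY}\right), \qquad
T_{AB} = a + b, \quad T_{AC} = a + c, \quad T_{BC} = b + c

\mathrm{PBS}_A = a = \frac{T_{AB} + T_{AC} - T_{BC}}{2}

% Worked example with F_{ST}^{AB} = 0.2, F_{ST}^{AC} = 0.3, F_{ST}^{BC} = 0.1
T_{AB} = -\ln 0.8 \approx 0.2231, \quad
T_{AC} = -\ln 0.7 \approx 0.3567, \quad
T_{BC} = -\ln 0.9 \approx 0.1054

\mathrm{PBS}_A \approx \frac{0.2231 + 0.3567 - 0.1054}{2} \approx 0.2372
```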
Figure 2. Simulated genome models used in this study.
(A) Schematics representing the model genomes, each containing 25,000 simulated loci (representing genomic windows in an empirical scan). The upper left genome has 99% of loci randomly sampled from the 10^6 genetic drift simulations and 1% from the 10^4 local selective sweep simulations. The upper right genome has 98% neutral loci, 1% local selective sweep simulations, and 1% parallel sweep simulations. The lower left genome has 99% background selection (BGS) loci and 1% local sweeps. The lower right genome has 98% BGS loci, 1% local sweeps, and 1% parallel sweep loci. (B) The autosomal B-value distributions for the human (left) and D. melanogaster (right) genomes, used here for the small and large population simulations, respectively. The vertical red line represents the truncation at B = 0.41 for each population size scenario, which approximates genome scans that exclude low-recombination regions because targets of selection are difficult to localize there.
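The model-genome construction can be sketched as drawing per-locus summaries from pools of simulation replicates and labeling each locus by its source. This is a hypothetical illustration: the pool arrays, label constants, and function names are ours, and sampling without replacement is an assumption. The BGS helper assumes the usual interpretation of a B-value as the ratio of local to neutral effective size, so that a locus's Ne is B × N, and it drops loci below the B = 0.41 cutoff.

```python
import numpy as np

# Hypothetical labels for the source of each locus
BACKGROUND, LOCAL, PARALLEL = 0, 1, 2

def bgs_effective_sizes(b_values, n_ref, b_min=0.41):
    """Assumed BGS approximation: per-locus Ne = B * N, dropping low-B loci."""
    b = np.asarray(b_values, dtype=float)
    kept = b[b >= b_min]  # truncation mimics excluding low-recombination regions
    return kept * n_ref

def assemble_model_genome(background, local, parallel=None, n_loci=25_000,
                          frac_local=0.01, frac_parallel=0.0, seed=0):
    """Draw loci (rows) from simulation pools; return shuffled (loci, labels)."""
    rng = np.random.default_rng(seed)
    n_local = round(n_loci * frac_local)
    n_parallel = round(n_loci * frac_parallel) if parallel is not None else 0
    n_background = n_loci - n_local - n_parallel
    parts, labels = [], []
    for pool, n, label in ((background, n_background, BACKGROUND),
                           (local, n_local, LOCAL),
                           (parallel, n_parallel, PARALLEL)):
        if n == 0:
            continue
        pool = np.asarray(pool)
        # Assumed: each replicate is used at most once (sampling without replacement)
        idx = rng.choice(len(pool), size=n, replace=False)
        parts.append(pool[idx])
        labels.append(np.full(n, label))
    loci = np.concatenate(parts)
    labs = np.concatenate(labels)
    order = rng.permutation(n_loci)  # interleave sweep loci among background loci
    return loci[order], labs[order]
```

For the lower right genome, one would pass a pool of BGS replicates as `background` with `frac_local=0.01` and `frac_parallel=0.01`. Each pool row could hold, for example, the three pairwise FST values of one replicate.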
Figure 3. Population branch statistics show greater precision than FST in detecting local selective sweeps when other loci experience global positive or negative selection.
Heat maps show the precision of FST and of the population branch statistics PBS, PBSn1, and PBE with respect to local sweeps, i.e., the fraction of local sweep loci among those contributing to the upper 1% quantile of each statistic. The table includes all demographic scenarios (large and small populations, with and without migration), genomic backgrounds (genetic drift vs. BGS), and selection regimes (hard complete, partial, and soft complete sweeps) considered in the study. Migration scenarios involved hard sweeps that might have gone to fixation in the absence of gene flow.
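The precision measure in Figure 3 can be sketched as follows, assuming each locus carries a boolean flag marking whether it came from a local sweep simulation. The function name is hypothetical, and ties at the cutoff are broken arbitrarily by sort order, a detail the caption does not specify.

```python
import numpy as np

def top_quantile_precision(stat, is_target, top_frac=0.01):
    """Fraction of the top `top_frac` loci (ranked by `stat`) that are target loci."""
    stat = np.asarray(stat, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    k = max(1, int(round(len(stat) * top_frac)))  # number of outlier loci
    top = np.argsort(stat)[-k:]                   # indices of the k largest values
    return float(is_target[top].mean())
```

With 25,000 loci and `top_frac=0.01`, the 250 highest-scoring loci are scored. Because local sweeps also make up 1% of each model genome, a precision of 1.0 would mean that every local sweep locus is among the outliers.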
Figure 4. PBE and PBSn1 are less likely than other statistics to register parallel sweep loci among their top outliers.
For model genome scenarios in which 1% of loci are subject to parallel sweeps in all populations, the heat maps show the fraction of parallel sweep loci that contribute to the upper 1% quantile of FST and branch statistic distributions (i.e. the fraction of parallel sweep loci that are false positives) under different demographic and selection parameters.
