2024 Jan 2;40(1):btae006.
doi: 10.1093/bioinformatics/btae006.

selscan 2.0: scanning for sweeps in unphased data


Zachary A Szpiech. Bioinformatics.

Abstract

Summary: Several popular haplotype-based statistics for identifying recent or ongoing positive selection in genomes require knowledge of haplotype phase. Here, we provide an update to selscan which implements a re-definition of these statistics for use in unphased data.
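To make the phased/unphased distinction concrete, here is a minimal sketch. It is not selscan's code. It assumes biallelic sites coded 0/1, and the function name `collapse_to_genotypes` is hypothetical. Without phase, only the per-site allele count (0, 1, or 2) of a diploid individual is observed, so different phasings of the same individual are indistinguishable.

```python
from typing import List, Sequence


def collapse_to_genotypes(hap_a: Sequence[int], hap_b: Sequence[int]) -> List[int]:
    """Sum two 0/1 haplotypes site by site into 0/1/2 genotype counts.

    Illustrative helper only (hypothetical name, assumed 0/1 allele coding).
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same sites")
    return [a + b for a, b in zip(hap_a, hap_b)]


# Two different phasings of the same diploid individual (made-up toy data).
phasing_1 = ([0, 1, 1, 0], [1, 0, 1, 0])
phasing_2 = ([1, 1, 1, 0], [0, 0, 1, 0])

g1 = collapse_to_genotypes(*phasing_1)
g2 = collapse_to_genotypes(*phasing_2)

# Both phasings give the same unphased genotype vector.
print(g1, g2, g1 == g2)
```

A statistic defined on genotype vectors like `g1` therefore cannot depend on which phasing is correct, which is the property a phase-free redefinition needs.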

Availability and implementation: Source code and binaries are freely available at https://github.com/szpiech/selscan, implemented in C/C++, and supported on Linux, Windows, and MacOS.

Conflict of interest statement

None declared.

Figures

Figure 1.
Unphased power. Power curves for unphased implementations of iHS (A), nSL (B), XP-EHH (C), and XP-nSL (D), and power difference between unphased and phased implementations of iHS (E), nSL (F), XP-EHH (G), and XP-nSL (H). Blue curves represent the power difference between the unphased and phased statistics when applied to unphased data (UN). Red curves represent the power difference between the unphased and phased statistics when applied to perfectly phased data (PH). Values greater than 0 indicate the unphased statistic had higher power. All panels represent analyses with demographic history Demo 1 and n = 100, 50, 20, or 10 diploid samples. For these plots the selection coefficient is set at s = 0.01, the frequency at which selection began is set at e = 0 (i.e. a hard sweep), and the divergence time in generations is set at td = 2000. f is the frequency of the adaptive allele at the time of sampling, and g is the number of generations since fixation at the time of sampling.

