Genome Res. 2006 Aug;16(8):980-9.
doi: 10.1101/gr.5157306. Epub 2006 Jul 6.

Genomic signatures of positive selection in humans and the limits of outlier approaches

Joanna L Kelley et al. Genome Res. 2006 Aug.

Abstract

Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.

Figures

Figure 1.
Performance of a simple outlier approach in ascertained data. Patterns of polymorphism were simulated for 1000 unlinked loci consisting of varying fractions of neutral and positively selected loci (indicated at the top of each column). For each locus, 72 chromosomes were simulated and divided into discovery (24) and sample panels (48). SNP discovery was then performed by randomly selecting ND chromosomes from the discovery panel, which were then “genotyped” in the sample set, and the resulting genotypes used to calculate TDGen. The x-axis denotes the values of ND considered. Note that ND = 48 corresponds to complete ascertainment (i.e., complete sequence data). The y-axis denotes the positive predictive value (PPV) when using a threshold of either the first (A) or fifth (B) percentiles of the empirical distribution of TDGen. Horizontal dashed lines equal the expected PPV based on randomly sampling either 10 (1% threshold) or 50 (5% threshold) loci. Vertical bars indicate 95% confidence intervals. Black and red lines denote simulation results in which the scaled population selection coefficient, σ = 2Nes, for positively selected loci was 200 or 20, respectively.
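The PPV calculation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `outlier_ppv` and the toy statistic drawn from normal distributions are assumptions made here, standing in for TDGen values from the simulations. Loci in the lower tail of the empirical distribution are called candidates, and PPV is the fraction of those candidates that are truly selected. Under random sampling the expected PPV is simply the fraction of selected loci, which is what the dashed baselines in the figure represent.

```python
import numpy as np


def outlier_ppv(stat, is_selected, tail_fraction):
    """Positive predictive value of a lower-tail outlier scan.

    stat          : one summary statistic per locus (e.g. Tajima's D)
    is_selected   : True where the locus was simulated under selection
    tail_fraction : fraction of loci called as candidates (0.01 or 0.05)
    """
    stat = np.asarray(stat, dtype=float)
    is_selected = np.asarray(is_selected, dtype=bool)
    # With 1000 loci, a 1% threshold gives 10 candidates and 5% gives 50.
    n_candidates = max(1, int(round(tail_fraction * stat.size)))
    # Candidates are the loci with the smallest (most negative) values.
    candidates = np.argsort(stat)[:n_candidates]
    return float(is_selected[candidates].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_loci, frac_selected = 1000, 0.10
    is_selected = np.zeros(n_loci, dtype=bool)
    is_selected[: int(n_loci * frac_selected)] = True
    # Hypothetical toy statistic: selected loci are shifted toward negative
    # values. This is NOT a coalescent simulation.
    stat = rng.normal(loc=np.where(is_selected, -1.5, 0.0), scale=1.0)
    for tail in (0.01, 0.05):
        ppv = outlier_ppv(stat, is_selected, tail)
        print(f"threshold={tail:.2f}  PPV={ppv:.2f}  random baseline={frac_selected:.2f}")
```

The lower tail is used because an excess of low-frequency alleles, the signature examined here, pushes Tajima's D toward negative values.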
Figure 2.
The correlation between TDSeq and TDGen predicts the performance of a simple outlier approach in ascertained data sets. The correlation, r, between Tajima’s D derived from complete sequence (TDSeq) and genotype (TDGen) data was calculated from the data sets described in Figure 1. ND denotes the number of chromosomes used for SNP discovery. Discovered SNPs were then genotyped in the sample panel and used to calculate TDGen (see Fig. 1 legend). For each value of ND, there are eight points, which correspond to all combinations of simulation parameters: σ (20 and 200), fraction of positively selected loci (1% and 10%), and threshold used in defining candidate selection genes (1% and 5%). Note that for each value of ND the correlation between TDSeq and TDGen for the eight different parameter combinations differed by <1%, and thus for presentation purposes, the average correlation is shown. The gray shaded area helps to demark the range of (PPVG/PPVS) values for each value of ND and simulation parameters.
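To make the TDSeq and TDGen comparison concrete, the sketch below computes Tajima's D from a 0/1 haplotype matrix using the standard formula (Tajima 1989) and mimics SNP discovery by keeping only sites that are polymorphic among ND randomly chosen discovery chromosomes. The function names `tajimas_d` and `ascertain_sites` and the matrix layout (rows are chromosomes, columns are sites) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def tajimas_d(haplotypes):
    """Tajima's D for a 0/1 matrix of shape (n_chromosomes, n_sites)."""
    hap = np.asarray(haplotypes)
    n = hap.shape[0]
    counts = hap.sum(axis=0)
    counts = counts[(counts > 0) & (counts < n)]  # segregating sites only
    s = counts.size
    if s == 0 or n < 2:
        return float("nan")
    # Mean pairwise differences (pi), summed over segregating sites.
    pi = float(np.sum(2.0 * counts * (n - counts)) / (n * (n - 1)))
    i = np.arange(1, n)
    a1, a2 = np.sum(1.0 / i), np.sum(1.0 / i**2)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return float((pi - s / a1) / np.sqrt(e1 * s + e2 * s * (s - 1)))


def ascertain_sites(discovery_haplotypes, n_d, rng):
    """Mask of sites polymorphic among n_d random discovery chromosomes."""
    disc = np.asarray(discovery_haplotypes)
    rows = rng.choice(disc.shape[0], size=n_d, replace=False)
    counts = disc[rows].sum(axis=0)
    return (counts > 0) & (counts < n_d)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical random data, only to show the call pattern.
    discovery = rng.integers(0, 2, size=(24, 200))
    sample = rng.integers(0, 2, size=(48, 200))
    mask = ascertain_sites(discovery, n_d=8, rng=rng)
    print("TDSeq:", tajimas_d(sample))
    print("TDGen:", tajimas_d(sample[:, mask]))
```

Given per-locus arrays of the two statistics, the correlation r shown in the figure could then be obtained with `np.corrcoef(td_seq, td_gen)[0, 1]`, where `td_seq` and `td_gen` are hypothetical array names.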
Figure 3.
Comparing the observed distribution of TDGen to neutral expectations. The observed distribution of TDGen in genic regions compared to that observed for nongenic regions (A) and coalescent simulations (B) incorporating demographic perturbations and ascertainment bias. In modeling ascertainment bias, we considered the full range of discovery chromosomes from ND = 2 to NT, where NT = 48, 48, and 46 for the EA, CHN, and AA samples, respectively. Shown here are the simulated distributions that most closely match the observed distributions in each sample (ND = 11, 8, and 5 for the EA, CHN, and AA samples, respectively). The full details of the simulations are described in the Methods.
Figure 4.
Strong signatures of positive selection that extend over large genomic regions. Patterns of polymorphism from the two largest clusters of candidate selection genes are shown in A and B. The regions shown in A and B are located on chromosomes 2 and 10, respectively. The signature of selection extends for >500 kb in both regions. In each panel, a graphical representation of genotypes is shown for the AA, CHN, and EA samples. Rows correspond to individuals and columns denote SNPs. For each SNP, blue, yellow, and red boxes indicate whether the individual is homozygous for the common allele, heterozygous, or homozygous for the rare allele, respectively. White boxes indicate missing data. Horizontal black bars denote the location of each gene. The distribution of TDGen for each gene is shown above the graphical representation of genotypes and the chromosomal position in Mb is shown on the x-axis (not drawn to scale). Blue, green, and red circles denote AA, CHN, and EA samples, respectively. Genes located immediately upstream and downstream of the region where patterns of polymorphism begin to approach neutrality are also shown, which helps to demark the signature of selection. Note that in A, the gene MGC10701 (Entrez gene symbol) (located between GCC2 and LIMS1) does not have genotype data available in all three samples and is not included in the figure. Similarly, in B the genes MRPS16 (located between DNAJC9 and TTC18) and ZMYND17 (located between ANXA7 and PPP3CB) are not shown, as they do not contain genotype data in all three samples. For both regions, an excess of low frequency alleles is also observed between genes (data not shown).

