Comparative Study
Genome Res. 2008 Jul;18(7):1020-9.
doi: 10.1101/gr.074187.107. Epub 2008 Apr 14.

Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

Ines Hellmann et al. Genome Res. 2008 Jul.

Abstract

We introduce a simple, broadly applicable method for obtaining estimates of nucleotide diversity from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show that the elevated diversity in telomeric regions is mainly due to elevated mutation rates and not due to decreased levels of background selection. However, we find indications that telomeres as well as centromeres experience greater impact from natural selection than intrachromosomal regions. Finally, we identify a number of genomic regions with increased or reduced diversity compared with the local level of human-chimpanzee divergence and the local recombination rate.

Figures

Figure 1.
Schematic drawing of shotgun reads for one window. The colored bars represent the reads; each color corresponds to a different individual. For our analysis, the window is subdivided into v different segments, so that the sampling depth of reads is constant within a segment. For example, in segment r_i, we have sampled five reads, corresponding to three individuals, and two individuals have been sampled twice. Therefore, the minimal and maximal number of chromosomes sampled is n_min = 3 and n_max = 5, respectively.
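The counting rule in this caption can be illustrated with a minimal sketch. The function name `chromosome_bounds` and the `ploidy` cap are assumptions made for illustration and are not taken from the paper: the lower bound counts distinct individuals (each contributes at least one chromosome), and the upper bound counts reads, with each individual assumed to contribute at most `ploidy` chromosomes.

```python
from collections import Counter

def chromosome_bounds(read_individuals, ploidy=2):
    """Return (n_min, n_max) for one segment.

    read_individuals: one individual label per read covering the segment.
    ploidy: assumed maximum number of distinct chromosomes per individual
            (hypothetical parameter, not stated in the caption).
    """
    counts = Counter(read_individuals)
    # Every sampled individual contributes at least one chromosome.
    n_min = len(counts)
    # At most one chromosome per read, capped by the assumed ploidy.
    n_max = sum(min(c, ploidy) for c in counts.values())
    return n_min, n_max

# Caption example: five reads, three individuals, two of them sampled twice.
print(chromosome_bounds(["A", "A", "B", "B", "C"]))
```

For the caption's example this gives the same bounds as stated there, 3 and 5.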
Figure 2.
Relationship between recombination rate and diversity. Non-overlapping windows were ordered according to recombination rate and sorted into bins of 100 windows. (A) The average θ̂/d for the bins was plotted against the average recombination rate as log(ρ), where ρ is the number of recombinants per base per meiosis (red line). For these binned data, we also estimated the parameters for a simple hitchhiking model (HH, cyan line) and a simple background selection model (BS, purple line). θ̂/d vs. log(ρ) is also drawn for 100 simulated data sets under the BS-model (black lines). (B) For 100 data sets simulated under the BS-model, we estimated the parameters for the HH- and BS-models. Given these new estimates, we counted how often the BS-model fits better than the HH-model by using the sum of squares and bootstrapping over the bins. In most cases, the BS-model fit consistently better (bootstrap-value closer to 1); but for the real data, the HH-model gave a slightly better fit (cyan arrow).
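The binning step described in panel A can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `bin_by_recombination` and the tuple layout are hypothetical, the natural logarithm is assumed because the caption does not give a base, and an incomplete final bin is simply dropped.

```python
import math

def bin_by_recombination(windows, bin_size=100):
    """Bin windows by recombination rate and average within each bin.

    windows: list of (rho, theta, d) tuples, one per non-overlapping window,
             with rho > 0 and d > 0 (hypothetical input layout).
    Returns a list of (log of mean rho, mean theta/d), one entry per full bin.
    """
    ordered = sorted(windows, key=lambda w: w[0])  # order by recombination rate
    bins = []
    # Step through full bins only; leftover windows are dropped (assumption).
    for start in range(0, len(ordered) - bin_size + 1, bin_size):
        chunk = ordered[start:start + bin_size]
        mean_rho = sum(w[0] for w in chunk) / bin_size
        mean_ratio = sum(w[1] / w[2] for w in chunk) / bin_size
        bins.append((math.log(mean_rho), mean_ratio))
    return bins

# Toy input: 250 windows with increasing rho and a constant ratio of 0.1.
toy = [(1e-8 * (i + 1), 0.001, 0.01) for i in range(250)]
for log_rho, ratio in bin_by_recombination(toy):
    print(round(log_rho, 3), round(ratio, 3))
```

Each returned pair corresponds to one point of the binned curve: the diversity-to-divergence ratio averaged over a bin against the log of that bin's average recombination rate.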
Figure 3.
The area around the second lowest value of θ̂ − θE on chromosome 3. Vertical bars represent the values for a 100-kb window positioned at their midpoint. (A) Plotted is θ̂ − θE, where the expected value is computed under either a neutral or a background selection model, whichever was more conservative. The colors of the bars indicate whether θ̂ − θE lies outside of a 95%, 96%, etc., confidence interval obtained through simulations. (B) Pink triangles mark recombination hotspots; blue lines correspond to genes from the RefSeq gene track of the UCSC genome browser. The closest gene is EPHA6. (C–E) The values for θ̂, human–chimpanzee divergence, and the recombination rate in cM/Mb are plotted.
Figure 4.
Centromeres (cen) and telomeres (tel) behave differently from intrachromosomal (ic) regions in their recombination rates (A), diversity θ̂ (B), human–chimpanzee divergence (C), and hence, also in the predicted value θE (D). However, θ̂ for centromeres and telomeres is smaller than θE.
