Proc Natl Acad Sci U S A. 2013 Sep 3;110(36):14699-704.
doi: 10.1073/pnas.1221792110. Epub 2013 Aug 19.

Segmenting the human genome based on states of neutral genetic divergence


Prabhani Kuruppumullage Don et al. Proc Natl Acad Sci U S A. 2013.

Abstract

Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states, each uniquely characterized by a specific combination of divergence levels. We then parse the mutagenic contributions of various biochemical processes by associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with the lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants (including those associated with cancer and other diseases) and to improve computational predictions of noncoding functional elements.
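As a rough illustration of how a hidden Markov model turns per-window divergence estimates into a segmentation, the sketch below runs Viterbi decoding over windows that each carry four rates (insertion, deletion, substitution, microsatellite). It is a minimal sketch under stated assumptions, not the authors' model: the two states, the independent Gaussian emissions, and all parameter values (`STATES`, `START`, `TRANS`, `EMIT`) are hypothetical and hand-set rather than fitted, whereas the study's model has six states.

```python
import math

# Hypothetical two-state model with hand-set parameters (illustration only).
# Each window carries four divergence rates: insertion, deletion,
# substitution, microsatellite.
STATES = ["low", "high"]
START = {"low": 0.5, "high": 0.5}
TRANS = {
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.1, "high": 0.9},
}
# Per-state (mean, standard deviation) for each of the four rates.
EMIT = {
    "low": [(0.8, 0.2)] * 4,
    "high": [(1.2, 0.2)] * 4,
}


def log_gauss(x, mean, sd):
    """Log density of a univariate normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_emission(state, window):
    """Log-likelihood of one window's rates, assuming the rates are independent."""
    return sum(log_gauss(x, m, s) for x, (m, s) in zip(window, EMIT[state]))


def viterbi(windows):
    """Most likely state path over consecutive windows, computed in log space."""
    # score[s]: best log-probability of any path ending in state s at the current window
    score = {s: math.log(START[s]) + log_emission(s, windows[0]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at window t + 1
    for window in windows[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + log_emission(s, window)
        back.append(pointers)
        score = new_score
    # Trace back from the best-scoring final state.
    path = [max(STATES, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


windows = [
    (0.75, 0.80, 0.85, 0.80),
    (0.80, 0.70, 0.90, 0.80),
    (1.25, 1.20, 1.10, 1.30),
    (1.20, 1.30, 1.20, 1.15),
]
print(viterbi(windows))  # ['low', 'low', 'high', 'high']
```

With these toy parameters the strong self-transition probability (0.9) keeps neighboring windows in the same state unless the emissions clearly favor a switch, which is what yields contiguous segments rather than an independent window-by-window classification.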


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic overview of the data generation and filtering steps for AR divergence estimation in 1-Mb windows. Dashed arrows point to the number of windows discarded at each filtering step.
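The filtering cascade in Fig. 1 can be sketched as a sequence of named predicates applied in order, recording how many windows each step discards. The record fields (`gap_fraction`, `aligned_bases`), the thresholds, and the helper `apply_filters` are hypothetical; the caption does not state the actual criteria.

```python
# Hypothetical window records; field names and thresholds are illustrative
# assumptions, not the criteria used in the study.
windows = [
    {"id": 1, "gap_fraction": 0.0, "aligned_bases": 50_000},
    {"id": 2, "gap_fraction": 0.9, "aligned_bases": 0},
    {"id": 3, "gap_fraction": 0.1, "aligned_bases": 2_000},
    {"id": 4, "gap_fraction": 0.0, "aligned_bases": 30_000},
]

FILTERS = [
    ("assembly gaps", lambda w: w["gap_fraction"] < 0.5),
    ("too few aligned bases", lambda w: w["aligned_bases"] >= 10_000),
]


def apply_filters(windows, filters):
    """Apply (name, keep) filters in order; return kept windows and per-step discard counts."""
    kept = list(windows)
    discarded = []
    for name, keep in filters:
        passing = [w for w in kept if keep(w)]
        discarded.append((name, len(kept) - len(passing)))
        kept = passing
    return kept, discarded


kept, discarded = apply_filters(windows, FILTERS)
print([w["id"] for w in kept])  # [1, 4]
print(discarded)  # [('assembly gaps', 1), ('too few aligned bases', 1)]
```

Because filters run in sequence, a window is counted only at the first step that rejects it, which matches reporting one discard number per arrow.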
Fig. 2.
Heatmaps of (A) divergence profiles and (B) genomic landscapes for the states of the six-state AR model. (A) Differences between state and genome-wide medians for each of the four divergence rates (blue, depression; yellow, enhancement) on a scale from −1.5 to +1.5. Differences outside this range are encoded as −1.5 and +1.5. (B) Significance [−log_e(P value)], computed via simulations, and sign (blue, depression; yellow, enhancement) for each genomic landscape feature within each state relative to genome-wide medians, on a scale from −10 to +10. Significances outside this range are encoded as −10 and +10. Gray cells correspond to nonsignificant associations (P values > 0.1). Significance values are reported in SI Appendix, Table S6.
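The two kinds of heatmap cells described in this caption can be computed roughly as follows. This sketch assumes natural-log significance, clipping at the stated scale limits, and a 0.1 cutoff above which a cell is left gray (matching the significance threshold given for Fig. 4). The function names and the explicit `direction` argument (+1 for enhancement, -1 for depletion) are illustrative assumptions, and the simulation that produces the P values is not shown.

```python
import math
from statistics import median


def divergence_cell(state_values, genome_values, limit=1.5):
    """Panel A: state median minus genome-wide median, clipped to [-limit, +limit]."""
    diff = median(state_values) - median(genome_values)
    return max(-limit, min(limit, diff))


def significance_cell(p_value, direction, limit=10.0, alpha=0.1):
    """Panel B: signed -ln(P), clipped to [-limit, +limit].

    direction is +1 for enhancement and -1 for depletion. Returns None
    (a gray cell) when p_value > alpha. p_value must be positive.
    """
    if p_value > alpha:
        return None
    score = direction * -math.log(p_value)
    return max(-limit, min(limit, score))


print(divergence_cell([3.0, 3.5, 4.0], [1.0, 1.2, 1.4]))  # 1.5 (clipped from 2.3)
print(significance_cell(1e-6, +1))  # 10.0 (clipped from about 13.8)
print(significance_cell(0.3, -1))  # None: nonsignificant, drawn gray
```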
Fig. 3.
Genomic locations of segments belonging to the divergence states of the six-state AR model. Bars represent human chromosomes, drawn to scale, with positions indicated on the vertical axis. Gray regions correspond to windows excluded from the analysis because of assembly gaps or data preprocessing filters.
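Turning a per-window state path into plotted segments amounts to merging runs of adjacent windows that share a state, while letting excluded windows break a run. The sketch below assumes 1-Mb windows indexed from position 0 on a single chromosome, with `None` marking excluded windows; the helpers `windows_to_segments` and `state_at` and the state labels are hypothetical. `state_at` shows one way such a segmentation could be queried for a variant position, in the spirit of the variant screening the abstract mentions.

```python
from bisect import bisect_right

WINDOW_SIZE = 1_000_000  # 1-Mb windows, as in Fig. 1


def windows_to_segments(states, window_size=WINDOW_SIZE):
    """Merge adjacent windows sharing a state into (start, end, state) segments.

    states[i] is the state of window i on one chromosome, or None if the
    window was excluded. Coordinates are 0-based and half-open.
    """
    segments = []
    for i, state in enumerate(states):
        if state is None:
            continue  # excluded windows break a run (gray regions)
        start, end = i * window_size, (i + 1) * window_size
        if segments and segments[-1][2] == state and segments[-1][1] == start:
            segments[-1] = (segments[-1][0], end, state)  # extend the current segment
        else:
            segments.append((start, end, state))
    return segments


def state_at(segments, position):
    """State of the segment containing position, or None if it falls in an excluded window."""
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < segments[i][1]:
        return segments[i][2]
    return None


segments = windows_to_segments(["cold", "cold", None, "hot", "hot", "cold"])
print(segments)
# [(0, 2000000, 'cold'), (3000000, 5000000, 'hot'), (5000000, 6000000, 'cold')]
print(state_at(segments, 4_200_000))  # hot
print(state_at(segments, 2_500_000))  # None
```

For many queries, the `starts` list would normally be built once and reused rather than rebuilt on every call.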
Fig. 4.
Genes and functional marks in the divergence states of the six-state AR model. Features are measured as counts or base coverage (the latter denoted by an asterisk). Only statistically significant P values (≤0.1) are shown, with yellow and blue denoting enrichment and depletion, respectively; NS, nonsignificant. SI Appendix, Table S5 A and B contain P values and observed and expected counts for a broader set of features. The cold X-chromosomal state is excluded from this analysis because of limited functional annotation on X.
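The caption reports observed and expected counts with P values but does not describe the test. One simple way to produce such numbers, shown as an assumption rather than the study's procedure, is to take the expected count as proportional to the number of windows in a state and to obtain a one-sided P value by shuffling state labels across windows. This null ignores spatial autocorrelation along chromosomes, so it is only a sketch; `enrichment` and its parameters are hypothetical.

```python
import random


def enrichment(feature_counts, states, target, n_sims=1000, seed=0):
    """Observed vs expected feature count in windows of one state, with a permutation P value.

    feature_counts[i] is the number of features (e.g. genes) in window i and
    states[i] its state. Expected counts assume features are spread in
    proportion to the number of windows per state. The one-sided P value
    shuffles state labels across windows, which ignores spatial
    autocorrelation along chromosomes.
    """
    in_target = [s == target for s in states]
    observed = sum(c for c, t in zip(feature_counts, in_target) if t)
    expected = sum(feature_counts) * sum(in_target) / len(states)
    rng = random.Random(seed)
    labels = list(in_target)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(labels)
        simulated = sum(c for c, t in zip(feature_counts, labels) if t)
        # Count simulations at least as extreme in the observed direction.
        if observed >= expected:
            extreme += int(simulated >= observed)
        else:
            extreme += int(simulated <= observed)
    p_value = (extreme + 1) / (n_sims + 1)
    return observed, expected, p_value


counts = [9, 8, 10, 1, 0, 2, 1, 0]
states = ["hot"] * 3 + ["cold"] * 5
observed, expected, p = enrichment(counts, states, "hot")
print(observed, expected)  # 27 11.625
print(p)
```

Adding one to both numerator and denominator keeps the simulated P value strictly positive, which also keeps a −log(P) transform finite.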
