Genome Res. 2008 Dec;18(12):1865-74.
doi: 10.1101/gr.081422.108. Epub 2008 Oct 8.

Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history

Philip M Kim et al. Genome Res. 2008 Dec.

Abstract

Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which decreases for younger SDs and is entirely absent for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in the predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.

Figures

Figure 1.
Schematic representation of the overall analysis methodology. For the coarse-grained analysis, genomic features are surveyed. First, the number of features in each genomic bin is counted. Then the overall pairwise correlation is measured (using Spearman rank correlation or Wilcoxon rank-sum tests).
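The two steps in this caption (count features per genomic bin, then correlate the counts) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline; the function names, the bin size, and the use of plain coordinate lists are assumptions made for the example.

```python
from collections import Counter


def bin_counts(positions, bin_size, n_bins):
    """Count features per fixed-size genomic bin (positions are 0-based bp)."""
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, one would call `bin_counts` once per feature type (for example SD coordinates and Alu coordinates, both hypothetical inputs here) with the same `bin_size` and `n_bins`, then pass the two count vectors to `spearman`.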
Figure 2.
Segmental duplications are distributed according to a power law in the human genome. Most regions of the genome are relatively poor in SDs, while a small number of regions have much higher SD occurrence [p(x) ∼ x^(−0.31)]. This is indicative of a preferential attachment (“rich get richer”) mechanism.
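A power-law exponent such as the one quoted in this caption can be estimated roughly from a straight-line fit on log-log axes. The sketch below assumes that approach; the caption does not say how the exponent was obtained, and `loglog_slope` is a hypothetical helper name.

```python
import math


def loglog_slope(xs, ps):
    """Least-squares slope of log(p) against log(x); for p ~ x^a this estimates a."""
    # Non-positive values have no logarithm, so they are skipped.
    pts = [(math.log(x), math.log(p)) for x, p in zip(xs, ps) if x > 0 and p > 0]
    n = len(pts)
    mx = sum(u for u, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxy = sum((u - mx) * (v - my) for u, v in pts)
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    return sxy / sxx
```

Here `xs` would be SD counts per bin and `ps` the fraction of bins with that count. Log-log regression is known to be a crude estimator for heavy-tailed data, so the result is best read as a rough check.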
Figure 3.
Heatmap of associations of SDs in different sequence identity bins. SDs co-occur best with pre-existing SDs of similar age, and this trend appears to be stronger for older SDs. Associations are given as Spearman rank correlations of the number of occurrences in genomic bins. All correlations are highly significant (P-value ≪ 0.00001).
Figure 4.
(A) Alu-mediated NAHR and preferential attachment are two complementary mechanisms for SD formation. In Alu-rich regions (>10 Alu elements per 10 kb), the association of SDs and pre-existing SDs is much lower than in Alu-poor regions (no Alu elements per 100 kb). Associations are given as Spearman rank correlations of the number of occurrences in genomic bins. All correlations are highly significant (P-value ≪ 0.00001). (B) Association of Alu elements and SDs is highest for the oldest (∼40 Mya) SDs and drops significantly for recent SDs. At the same time, preference for subtelomeric regions and a presumed NHEJ mechanism rises. Associations are given as Spearman rank correlations of the number of occurrences in genomic bins. All correlations are highly significant (P-value ≪ 0.00001).
Figure 5.
Sequence divergence of repeat elements in the human genome. Used as a proxy for approximate age, sequence divergence shows a burst of Alu activity ∼40 Mya and a marked decrease afterward. The distribution of (active) LINE elements is somewhat more even. The relative number of SDs decreases in a fashion similar to the Alu elements.
Figure 6.
(A) Pseudogene association with SDs. Just like Alu elements, pseudogenes colocalize very strongly with old SDs and less so with younger SDs. All correlations are highly significant (P-value ≪ 0.00001). (B) Detailed SD junction analysis. A total of 144 SDs showed matching processed pseudogenes at both junctions, that is, both pseudogenes have the same parent gene and show high homology. When picking random genomic regions of the same size and number as SDs, no matching pseudogenes were ever found to overlap both SD junctions. When using a randomized offset of ±5 kb to account for potential sequence biases, an average of 40 matching pseudogenes were found, but in 1000 trials, never more than 43. (C) Schematic of matching processed pseudogenes at SD junctions. The processed pseudogenes overlap matching SD junctions at both duplicated segments, making them likely candidates for having mediated NAHR.
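The randomized-offset control in panel B can be illustrated with a simplified sketch. It assumes junctions are single coordinates and features are `(start, end)` intervals on one chromosome, and it counts individual junction hits instead of requiring matching pseudogenes at both junctions of an SD, so it is a reduced version of the test described, not a reimplementation. All function names are hypothetical.

```python
import random


def junctions_hit(junctions, features):
    """Number of junction coordinates that fall inside any (start, end) feature."""
    return sum(any(s <= j <= e for s, e in features) for j in junctions)


def offset_null(junctions, features, max_offset=5000, trials=1000, seed=0):
    """Null distribution: shift every junction by a random offset within ±max_offset."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    null = []
    for _ in range(trials):
        shifted = [j + rng.randint(-max_offset, max_offset) for j in junctions]
        null.append(junctions_hit(shifted, features))
    return null


def empirical_p(observed, null):
    """Fraction of null trials at least as large as observed, with a +1 correction."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))
```

With 1000 trials and no null value reaching the observed count, `empirical_p` returns 1/1001, which is the smallest P-value this design can report.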
Figure 7.
(A) Association of SDs and CNVs. Shown is the association of SDs (90%–99% sequence identity) with (left bar) “young” SDs (>99% sequence identity) and (right bar) CNVs. CNVs colocalize with SDs, but much more weakly than very young SDs do. Associations are given as Spearman rank correlations of the number of occurrences in genomic bins. All correlations are highly significant (P-value ≪ 0.00001). (B) CNV association with different human repeat elements. CNVs associate weakly with L1 elements and microsatellites, but show no association with Alu elements. (C) CNV association with human repeat elements after correcting for SD content. There is almost no significant association; the observed depletion in Alu elements may be due to a preference of CNVs for subtelomeric regions. Associations are given as Spearman rank correlations of the number of occurrences in genomic bins. P-values of the correlations are given in the bubbles.
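Panel C reports associations "after correcting for SD content" without stating the procedure. One common way to control for a third variable is a first-order partial correlation, sketched below purely as an assumption about what such a correction could look like; `partial_corr` is a hypothetical helper that takes three pairwise (for example Spearman) correlations as input.

```python
def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held fixed (first-order partial correlation).

    r_xy: CNV vs. repeat, r_xz: CNV vs. SD, r_yz: repeat vs. SD (illustrative roles).
    Raises ZeroDivisionError if x or y is perfectly correlated with z.
    """
    denom = ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
    return (r_xy - r_xz * r_yz) / denom
```

If the CNV-repeat correlation is fully explained by shared SD content (r_xy equal to r_xz times r_yz), the partial correlation is zero, which matches the caption's observation that almost no association remains.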
Figure 8.
A schematic of the change of formation mechanism over the last 40 million years in the mammalian lineage.
