Comparative Study
Nature. 2008 May 1;453(7191):56-64.
doi: 10.1038/nature06862.

Mapping and sequencing of structural variation from eight human genomes

Jeffrey M Kidd et al. Nature. 2008.

Abstract

Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale, particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation: a standard for genotyping platforms and a prelude to future individual genome sequencing projects.


Figures

Figure 1. Map of structural variation in the human genome
The locations of 724 insertions (blue), 747 deletions (red) and 224 inversions (green) that have been experimentally validated are mapped onto the human genome (build35). Sites are arranged according to individuals in rows above each chromosome, in order of the nine individual genomic libraries (G248 (first row), then ABC7–ABC14); the Coriell IDs are listed in Table 1. All sites have been validated by array CGH, MCD analysis, or sequencing in at least one reference individual. The locations of 525 novel sequence loci are depicted as arrows below each chromosome. Those mapping to gaps (black) are distinguished from those mapping to regions not associated with gaps (orange). The Y chromosome is not shown because samples were primarily from females.
Figure 2. Frequency distribution
Plot showing the number of times that a particular structural variant was detected on the basis of ESP analysis for nine fosmid libraries (eight HapMap, plus G248): 15% (261 of 1,695) of the sites seem to represent a more common sequence configuration (major allele) with respect to the human reference genome; 49% (839 of 1,695) of the validated sites are observed once, suggesting that saturation has not been achieved. The numbers above the columns report the total number of events for each frequency class.
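The percentages quoted in this caption can be checked against the counts given in the Figure 1 and Figure 2 captions. The following is a minimal sketch; the variable and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the Figure 1 and Figure 2 captions.
insertions, deletions, inversions = 724, 747, 224
total_sites = insertions + deletions + inversions  # all validated sites

def percent_of_sites(count, total=total_sites):
    """Return `count` as a percentage of all validated sites."""
    return 100.0 * count / total

major_allele_sites = 261  # sites described as the more common configuration
singleton_sites = 839     # sites observed once

print(total_sites)
print(round(percent_of_sites(major_allele_sites), 1))
print(round(percent_of_sites(singleton_sites), 1))
```

The three variant classes sum to the 1,695 sites used as the denominator, and the two ratios round to the 15% and 49% stated in the caption.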
Figure 3. Discovery of novel human sequences that are CNV
a, Clusters of clones where one end is mapped to the genome (build35) but the other does not map are shown schematically on the basis of their orientation (blue and yellow lines). Three categories are distinguished: clones mapping around a site already spanned by a discordant fosmid ESP (spanned), regions where no discordant clones are identified (unspanned), and clones mapping adjacent to sequence gaps (gap). b, Array CGH experiment based on an oligonucleotide microarray designed to a sequence assembly of these novel sequences (525 distinct loci). Of the spanned and unspanned loci, 45% show copy-number variation (gains, orange; losses, blue) in comparison with a reference sample (NA15510). Each data point represents the average log2 intensity values for all of the probes from a single contig. Within each of the three categories, contigs are ordered on the basis of their chromosomal anchored positions. The bottom row represents the results of one of three self-versus-self hybridizations with sample NA15510. c, A novel insertion of 130 kbp on chromosome 6 identified by OEA fosmid clones (blue and gold arrows) and confirmed by optical mapping of DNA from the GM15510 cell line. Optical images of SwaI-restricted DNA are aligned to the reference (build35) genome. This large insertion maps intergenically to a region rich in conserved sequence elements and is confirmed in all eight libraries. This region does not correspond to a known gap in the human genome and does not appear CNV in our eight samples. d, Validation of a CNV region by fluorescence in situ hybridization. Hemizygous signals are detected by fluorescence in situ hybridization on metaphase chromosomes (with OEA clone ABC7_42397600_G7 as probe), corresponding to samples where no signal intensity difference was observed with respect to the reference by array CGH.
Figure 4. Sequence resolution of human structural variation
Two different deletions within the SIRPB1 gene (exons, red) provide evidence for an independently recurrent deletion event. Both structural variants are probably mediated by non-allelic homologous recombination between segmental duplications (blue bars, arching lines) in direct orientation. Deletion alleles from four different individuals (G248 and ABC8–10) are depicted; deletion 2 (del2, minimal region chromosome 20: 1509210–1542041) eliminates exon 2, whereas deletion 1 (del1, minimal region chromosome 20: 1502353–1533914) does not. Repeat content and orientation are depicted as coloured arrows (green, long interspersed transposable element; purple, short interspersed transposable element; orange, transposon). Predicted and annotated segmental duplications are depicted as indicated.
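The two minimal regions quoted in this caption overlap but are not identical, which is what lets one deletion remove exon 2 while the other does not. The sketch below computes the spans and the shared interval from the quoted chromosome 20 coordinates. It assumes the span is simply end minus start; the caption does not say whether the coordinates are inclusive. The `Interval` type and `overlap` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    """A genomic interval on one chromosome (coordinate convention assumed)."""
    start: int
    end: int

    @property
    def span(self) -> int:
        # Assumption: length is end - start (half-open style).
        return self.end - self.start

def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the interval shared by a and b, or None if they are disjoint."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return Interval(start, end) if start < end else None

# Minimal regions on chromosome 20 as quoted in the caption.
DEL1 = Interval(1502353, 1533914)
DEL2 = Interval(1509210, 1542041)

shared = overlap(DEL1, DEL2)
print(DEL1.span, DEL2.span, shared.span if shared else 0)
```

Under that assumption, each minimal region is a little over 30 kbp and the two share roughly 25 kbp.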
Figure 5. Regions of enriched SNP density
Regions of increased single nucleotide variant density identified in eight individuals are shown (ABC7–ABC14 samples ordered from bottom to top). Heterozygosity was calculated in 100-kbp windows, and those windows having a heterozygosity 2 s.d. above the mean are plotted for four chromosomes. Regions of increased heterozygosity, over 1 Mb of genome sequence, are highlighted by red bars.
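The windowing procedure described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: heterozygosity is taken as heterozygous sites per base pair in each window, positions are 0-based, the mean and standard deviation are computed over the windows of a single chromosome, and the population standard deviation is used. The caption specifies none of these details, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev

WINDOW_SIZE = 100_000  # 100-kbp windows, as stated in the caption

def window_heterozygosity(het_positions, chrom_length, window_size=WINDOW_SIZE):
    """Heterozygous sites per bp for each consecutive window (assumed definition)."""
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in het_positions:  # 0-based positions (assumption)
        counts[pos // window_size] += 1
    rates = []
    for i, count in enumerate(counts):
        start = i * window_size
        length = min(window_size, chrom_length - start)  # last window may be short
        rates.append(count / length)
    return rates

def enriched_windows(rates, n_sd=2.0):
    """Indices of windows whose rate exceeds mean + n_sd * s.d."""
    threshold = mean(rates) + n_sd * pstdev(rates)
    return [i for i, rate in enumerate(rates) if rate > threshold]

# Toy data: ten windows with 5 heterozygous sites each, except window 3 with 50.
toy_positions = [
    w * WINDOW_SIZE + k * 1_000
    for w in range(10)
    for k in range(50 if w == 3 else 5)
]
rates = window_heterozygosity(toy_positions, chrom_length=1_000_000)
print(enriched_windows(rates))
```

In the toy data only window 3 exceeds the threshold. With real data, the choice between a per-chromosome and a genome-wide mean would change which windows are flagged.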
