Comparative Study
Bioinformatics. 2009 Jun 15;25(12):i222-30.
doi: 10.1093/bioinformatics/btp208.

A geometric approach for classification and comparison of structural variants

Suzanne Sindi et al. Bioinformatics.

Abstract

Motivation: Structural variants, including duplications, insertions, deletions and inversions of large blocks of DNA sequence, are an important contributor to human genome variation. Measuring structural variants in a genome sequence is typically more challenging than measuring single nucleotide changes. Current approaches for structural variant identification, including paired-end DNA sequencing/mapping and array comparative genomic hybridization (aCGH), do not identify the boundaries of variants precisely. Consequently, most reported human structural variants are poorly defined and not readily compared across different studies and measurement techniques.

Results: We introduce Geometric Analysis of Structural Variants (GASV), a geometric approach for identification, classification and comparison of structural variants. This approach represents the uncertainty in measurement of a structural variant as a polygon in the plane, and identifies measurements supporting the same variant by computing intersections of polygons. We derive a computational geometry algorithm to efficiently identify all such intersections. We apply GASV to sequencing data from nine individual human genomes and several cancer genomes. We obtain better localization of the boundaries of structural variants, distinguish genetic from putative somatic structural variants in cancer genomes, and integrate aCGH and paired-end sequencing measurements of structural variants. This work presents the first general framework for comparing structural variants across multiple samples and measurement techniques, and will be useful for studies of both genetic structural variants and somatic rearrangements in cancer.
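
The core idea of intersecting polygonal breakpoint regions can be illustrated with a minimal sketch. This is not the GASV implementation: the abstract describes an algorithm that finds all intersections efficiently, whereas the code below only clips one convex polygon against another (Sutherland-Hodgman clipping). The function names `clip_convex` and `polygon_area` and all coordinates are hypothetical, chosen only for illustration; both polygons are assumed to be convex with vertices listed counter-clockwise.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # a candidate breakpoint pair (a, b)


def clip_convex(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Intersect two convex polygons given as counter-clockwise vertex lists.

    Returns the vertices of the intersection, or [] if the polygons are disjoint.
    """
    def inside(p: Point, a: Point, b: Point) -> bool:
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def crossing(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Point where segment p-q crosses the line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # nothing left to clip: the polygons do not overlap
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(crossing(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(crossing(p, q, a, b))
    return output


def polygon_area(poly: List[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


# Hypothetical trapezoid standing in for a paired-end breakpoint region.
trapezoid = [(100, 200), (200, 300), (150, 300), (100, 250)]
# Hypothetical rectangle standing in for an aCGH region between adjacent probes.
rectangle = [(120, 240), (160, 240), (160, 290), (120, 290)]

refined = clip_convex(trapezoid, rectangle)
print(len(refined), polygon_area(trapezoid), polygon_area(refined))
```

In this toy example the refined region is smaller than either input region, which is the sense in which intersecting measurements narrows the possible breakpoint locations. An empty result means the two measurements cannot support the same variant.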

Availability: http://cs.brown.edu/people/braphael/software.html

Figures

Fig. 1.
Derivation of regions of uncertainty (breakpoint regions) from ESP and aCGH data. (A) (Top panel) In ESP, or paired-end mapping, both ends of a fragment of a test genome are sequenced and aligned to the reference genome. Here, alignment of ends of fragments C and D yields ES pairs (x_C, y_C) and (x_D, y_D) on the reference genome that suggest an inversion. (Bottom panel) The intersection of breakpoint regions defined by Equation (1) indicates the possible locations of inversion breakpoints a and b that are consistent with the ES pairs. (B) (Top panel) In an aCGH experiment, the reference genome is segmented into regions of equal copy number according to measurements at genomic probes (boxes). A deletion with breakpoints a and b is identified as a change in copy number between probes p_i and p_{i+1} and between probes p_j and p_{j+1}. (Bottom panel) The intervals [p_i, p_{i+1}] and [p_j, p_{j+1}] define a rectangular breakpoint region. This region is intersected with the breakpoint region defined by an ES pair (x_C, y_C) to refine the locations a and b of the deletion.
Fig. 2.
Breakpoint regions determined by fragments from Kidd et al. (2008) whose orientations suggest one or more inversion variants. Breakpoint region 2 has distinct intersections with regions 1 and 3, and thus iterative merging of breakpoint regions will not identify all intersections.
Fig. 3.
Examples of the three events of the plane sweep: (A) addition, (B) intersection and (C) removal. In each case, black dots label the points recorded in the cyclic lists a and b (indicated as dashed paths) that form ℛ. The labels assigned to the intersecting breakpoint regions are shown in braces.
Fig. 4.
Geometric analysis of inversion polymorphisms from Kidd et al. (2008) reveals disparities between the reported boundary of variants (black dots) and the intersections of breakpoint regions. (A) An inversion on chr1 with 79 reported supporting clones from all nine individuals has no point in common to all breakpoint regions. The number x next to each of the three regions indicates that a clone from the individual labeled ABCx in Kidd et al. (2008) is present in the cluster; a ‘G’ indicates the G95 individual from Tuzun et al. (2005). The bottom right region contains clones from all nine individuals, while individual ABC13 has clones from all three regions, suggesting multiple distinct structural variants or mapping difficulties at this locus. (B) An inversion from chr3 with 22 supporting clones from all eight HapMap individuals. We examined one fully sequenced clone (dashed trapezoid) from individual ABC7 and found two possible inversion breakpoints (black squares). Both of these lie in the intersection of all breakpoint regions but are ∼37 kb from the reported boundary.
Fig. 5.
The intersection of 33 inversion breakpoint regions (blue) and 4 deletion breakpoint regions (red) indicates a common genomic location of two structural variants.
Fig. 6.
Intersection between six breakpoint regions from ESP data (blue trapezoids) and two breakpoint regions determined by aCGH (red rectangle) on chr17 in the BT474 breast cancer cell line. In this case, the spacing between aCGH probes provides a more precise localization of the breakpoint region than the paired-end sequencing data.
