Review
Nat Rev Genet. 2020 Mar;21(3):171-189.
doi: 10.1038/s41576-019-0180-9. Epub 2019 Nov 15.

Structural variation in the sequencing era

Steve S Ho et al. Nat Rev Genet. 2020 Mar.

Abstract

Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1 | Overview of ensemble algorithms.
This flowchart outlines the major steps in an ensemble algorithm. Step 1, discordantly mapped reads produce signatures that are used to infer SVs. Step 2, multiple independent algorithms detect SVs in parallel. Step 3, filters and heuristics based on the project aims are applied to remove false positives and merge calls (see BOX 2 for details). Step 4, final decisions are made to designate and preserve high-confidence calls, which are output as a consolidated list of putative variants.
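The merge-and-filter logic of steps 2 to 4 can be illustrated with a minimal sketch. It is not the method of any specific tool discussed in the review: the `SVCall` record, the 50% reciprocal-overlap rule, and the requirement of support from at least two callers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SVCall:
    """Hypothetical record for one call from one SV detection algorithm."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def ensemble_merge(calls, min_overlap=0.5, min_callers=2):
    """Cluster calls greedily, then keep clusters backed by enough callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            # Compare against the first call of the cluster (its seed).
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])

    consensus = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            # Report median breakpoints as the consolidated call.
            start = int(median(c.start for c in cluster))
            end = int(median(c.end for c in cluster))
            consensus.append(
                (cluster[0].chrom, start, end, cluster[0].svtype, callers)
            )
    return consensus


calls = [
    SVCall("callerA", "chr1", 1000, 2000, "DEL"),
    SVCall("callerB", "chr1", 1050, 2050, "DEL"),
    SVCall("callerA", "chr2", 500, 900, "DUP"),
]
for merged in ensemble_merge(calls):
    print(merged)
```

In practice the overlap threshold and the number of supporting callers would be tuned to the project aims, trading sensitivity against the false-positive rate.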
Figure 2 | Structural variation signatures in single-molecule and connected-molecule strategies.
Emerging technologies vary in how they detect SVs. 10x Genomics linked-reads detect SVs based on barcode overlap between genomic loci. Split-molecule approaches infer SVs from splitting of linked-reads, examples of which are displayed below each barcode matrix (each color represents a shared barcode and linked-molecules are separated by haplotype; only homozygous variants are shown for simplicity). Strand-seq determines SVs based on read depth or sudden changes in mapping orientation. For deletions and duplications, only two of the possible daughter cell configurations are shown for simplicity (Watson-Watson and Watson-Crick; Crick-Crick not shown). For inversions, only a homozygous inversion in Watson-Watson and Crick-Crick daughter cells is shown, because Watson-Crick daughter cells mask homozygous inversions (homozygous for simplicity; for more detail on inversion detection see REF.). Hi-C detects SVs by looking for unusually high-frequency contacts between genomic loci. Underneath each interaction matrix is a schematic of the expected chromosomal contacts resulting from each SV. Single-molecule sequencing methods infer SVs based on discordant mapping signatures that can involve one (intra) or many (inter) reads. SVs are inferred from intra-read signatures, which result from reads that span an entire SV, or from inter-read signatures, which require multiple reads to cover the event. Insertions differ from deletions by an increase in the expected distance between the two split pairs, marked by the white soft-clip between the reads. Inversions involve reads that map best to the complementary strand. Optical maps detect SVs based on the increased presence, absence or change in orientation of restriction enzyme sites compared with a reference (blue: sample; green: reference). Resolution depends on the distribution of restriction enzyme sites.
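The barcode-overlap signature used by linked-reads can be sketched as follows. This is an illustrative toy, not the algorithm of any linked-read SV caller: the window keys, the `min_gap` and `min_fraction` thresholds, and both function names are hypothetical, and real tools model molecule length and background sharing statistically.

```python
from itertools import combinations


def shared_barcode_fraction(barcodes_a, barcodes_b):
    """Shared barcodes as a fraction of the smaller barcode set."""
    a, b = set(barcodes_a), set(barcodes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def flag_candidate_pairs(window_barcodes, min_fraction=0.5, min_gap=10):
    """Flag distant window pairs that share unexpectedly many barcodes.

    window_barcodes maps (chrom, window_index) -> set of barcodes.
    Nearby windows on one chromosome are skipped, because they are
    expected to share barcodes from the same long molecules.
    """
    candidates = []
    for w1, w2 in combinations(sorted(window_barcodes), 2):
        same_chrom = w1[0] == w2[0]
        if same_chrom and abs(w1[1] - w2[1]) < min_gap:
            continue
        fraction = shared_barcode_fraction(
            window_barcodes[w1], window_barcodes[w2]
        )
        if fraction >= min_fraction:
            candidates.append((w1, w2, fraction))
    return candidates


windows = {
    ("chr1", 0): {"b1", "b2", "b3"},
    ("chr1", 1): {"b1", "b2", "b4"},
    ("chr1", 50): {"b2", "b3", "b9"},
    ("chr2", 7): {"b7", "b8"},
}
for pair in flag_candidate_pairs(windows):
    print(pair)
```

Here the two distant windows on chr1 are flagged as a candidate SV junction, while the adjacent windows are ignored even though they share barcodes.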
Figure 3 | Resolving the molecular context behind structural variants by integrating multimodal information.
a | Layers of biological data that can be integrated with SV calls to interpret a possible mechanistic chain of events. Each layer possesses quantifiable readouts that can be tested for association with genomic variants. Studies have focused less on integration with more distal layers, such as the proteome, metabolome and microbiome (latter two not shown), but future efforts focused here should have just as much potential to be informative. b | Linked-reads detect tandem duplications upstream of AR. Previous studies showed that this region contains an enhancer for AR (green boxes), which is consistent with DNase hypersensitivity peaks. Hi-C analysis shows that both the enhancer and the gene body are located within the same topologically associating domain, further suggesting their interaction. Paired expression data from multiple samples show that duplication of the enhancer leads to increased AR expression compared with cases without the duplication. Integration of layered data suggests that tandem duplications cause gain of an enhancer element that drives AR expression in castration-resistant prostate cancer. c | A 3.4 kb deletion was detected by optical mapping (OM) and by read depth from short-read HTS. The authors use H3K27ac ChIP-seq data to determine that the deletion overlapped an enhancer element (green) and Hi-C data to determine that the enhancer interacts with an upstream promoter (yellow oval) to regulate GNB4. Comparison of expression data against HMEC reveals that nearby genes show increased expression but GNB4 expression is notably decreased. Taken together, this information illustrates that decreased expression of GNB4 may result from deletion of a downstream enhancer in spite of amplification of the gene body.
