Methods Enzymol. 2018:601:111-144.
doi: 10.1016/bs.mie.2017.11.028. Epub 2018 Feb 21.

High-Throughput Analysis of DNA Break-Induced Chromosome Rearrangements by Amplicon Sequencing

Alexander J Brown et al. Methods Enzymol. 2018.

Abstract

The mechanistic understanding of how DNA double-strand breaks (DSB) are repaired is rapidly advancing, in part due to the advent of inducible site-specific break model systems and the use of next-generation sequencing (NGS) technologies to sequence repair junctions at high depth. Unfortunately, the sheer volume of data produced by these methods makes it difficult to analyze the structure of repair junctions manually or with general-purpose software. Here, we describe methods to produce amplicon libraries of DSB repair junctions for sequencing, to map the sequencing reads, and then to analyze the sequence structure of the mapped reads with Hi-FiBR, a robust, custom Python script. The Hi-FiBR analysis processes large data sets quickly and reports the number and type of repair events, deletion size, insertion size and inserted sequence, microhomology usage, and whether mismatches are due to sequencing error or a biological effect. The analysis also corrects for common alignment errors generated by read-mapping tools, allowing high-throughput analysis of DSB repair fidelity to be conducted accurately regardless of which suite of NGS analysis software is available.
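The classification of mapped reads into repair classes can be illustrated with a minimal sketch. It assumes each read has already been mapped and is described by a SAM-style CIGAR string; the function name classify_cigar and its rules (only I and D operations are counted, and a read carrying both is called complex) are illustrative assumptions, not Hi-FiBR's actual code.

```python
from __future__ import annotations

import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_cigar(cigar: str) -> tuple[str, int, int]:
    """Classify a mapped amplicon read as exact, deletion, insertion, or complex.

    Hypothetical rule set: only I (insertion) and D (deletion) operations are
    counted; all other operations (M, =, X, S, H, N, P) are ignored.
    Returns (repair_class, deleted_bp, inserted_bp).
    """
    deleted = inserted = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "D":
            deleted += int(length)
        elif op == "I":
            inserted += int(length)
    if deleted and inserted:
        return ("complex", deleted, inserted)
    if deleted:
        return ("deletion", deleted, 0)
    if inserted:
        return ("insertion", 0, inserted)
    return ("exact", 0, 0)
```

For example, classify_cigar("70M3D2I75M") returns ("complex", 3, 2). Because the CIGAR M operation covers both matches and mismatches, a base substitution at the break site is reported as exact by this rule; that is the kind of alignment artifact the processing-window correction in Fig. 2 is meant to catch.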

Keywords: Alternative end joining; Amplicon; DNA double-strand break; Hi-FiBR; High-throughput sequencing; Homologous recombination; Microhomology; Nonhomologous end joining; Read alignment; Rearrangement; Repair junction.


Figures

Fig. 1
Steps in building and sequencing an amplicon library of DSB repair events. First, linear or circular DNA with an inducible break site must undergo a break and be given time for repair. The sequence surrounding the break site (shown in green) may acquire an error such as an insertion, deletion, or base substitution (shown in red). To capture the frequency and variety of these errors, the sequence surrounding the break site is PCR amplified, optionally with barcoded primers. The amplicon products are then sequenced, and the reads are mapped to the reference sequence. Finally, the mapped reads are analyzed by Hi-FiBR.
Fig. 2
Determining the processing window of a DSB repair event. Standard alignment tools will often treat base substitutions as sequencing error, which causes them to be called matches to the reference sequence. The Hi-FiBR analysis can recognize mismatches in a user-defined window around the break site (here shown as 4 bp on either side) and adjust the matching sequences. This also changes what the program considers as the processing window for repair event classification.
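The window search can be sketched as follows, assuming a gap-free read of the same length as the reference and a break located between positions break_pos - 1 and break_pos. The function names are hypothetical, and the default 4 bp window simply mirrors the example in the figure; this is not Hi-FiBR's interface.

```python
from __future__ import annotations


def mismatches_near_break(read: str, ref: str, break_pos: int, window: int = 4) -> list[int]:
    """Return 0-based positions of mismatches within `window` bp on either side of the break.

    Assumes `read` is a gap-free alignment to `ref` (equal lengths, same coordinates)
    and that the break lies between ref[break_pos - 1] and ref[break_pos].
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must be the same length")
    start = max(0, break_pos - window)
    end = min(len(ref), break_pos + window)
    return [i for i in range(start, end) if read[i] != ref[i]]


def processing_window(mismatches: list[int]) -> tuple[int, int] | None:
    """Half-open span (start, end) covering all mismatches near the break, or None."""
    if not mismatches:
        return None
    return (min(mismatches), max(mismatches) + 1)
```

A mismatch far from the break (for example, at the first base of the read) falls outside the window and is left for the sequencing-error tally, while one at the break site widens the processing window used for classification.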
Fig. 3
Distribution of base substitutions in Illumina sequencing of DSB repair amplicons. The distribution of mismatched bases is shown for reads that Geneious called perfect repair events, before and after correction by the 8 bp search for base substitutions. Most mismatches are distributed homogeneously across the read, as expected of sequencing error, with counts of ~10–100 per position. However, mismatches peak around the break site, with a maximum of 11,982 immediately at the break site (~120–1200-fold higher than the surrounding sequence). The 8 bp processing-window correction recategorizes the mismatches near the break site and leaves the remainder, which are likely sequencing error. No mismatches occur at the ends of the reads because of the “Trim_and_Pad” script used to filter reads.
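Per-position mismatch counts like those plotted here can be tallied with a short sketch. It assumes the reads are already aligned without indels and trimmed to the reference length; mismatch_profile is a hypothetical name, and skipping N calls is an added assumption rather than a documented Hi-FiBR behavior.

```python
from __future__ import annotations


def mismatch_profile(reads: list[str], ref: str, skip_n: bool = True) -> list[int]:
    """Count, at each reference position, how many reads carry a mismatched base.

    Assumes every read is a gap-free alignment already trimmed to the reference
    length. Ambiguous N calls are skipped by default (an assumption of this sketch).
    """
    counts = [0] * len(ref)
    for read in reads:
        if len(read) != len(ref):
            raise ValueError("every read must match the reference length")
        for i, (ref_base, read_base) in enumerate(zip(ref, read)):
            if skip_n and read_base == "N":
                continue
            if read_base != ref_base:
                counts[i] += 1
    return counts
```

In a profile like the one in this figure, counts.index(max(counts)) would point at the break site, while the remaining positions form the low background attributed to sequencing error.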
Fig. 4
Removing sequencing error by generating reconstructed reads. Sequencing error in the flanks of reads around the break site causes the same type of repaired read to appear as a different repair event. To correct for this discrepancy the reads are reconstructed in the regions of matching sequence to become the same as the reference sequence, thus creating a common, reconstructed read.
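Reconstruction can be sketched as rebuilding each read from the reference flanks plus the observed event, so reads that share an event but differ by sequencing errors in the flanks collapse to one string. The (start, end, inserted) event representation and the function names are assumptions for illustration only.

```python
from __future__ import annotations

from collections import Counter


def reconstruct_read(ref: str, start: int, end: int, inserted: str = "") -> str:
    """Rebuild a read as reference flanks around an event at ref[start:end].

    ref[start:end] is deleted (nothing is deleted when start == end) and
    `inserted` is placed at the junction. Everything outside the event is taken
    from the reference, so sequencing errors in the flanks are discarded.
    """
    if not 0 <= start <= end <= len(ref):
        raise ValueError("event coordinates must lie within the reference")
    return ref[:start] + inserted + ref[end:]


def count_reconstructed(ref: str, events: list[tuple[int, int, str]]) -> Counter:
    """Tally how many reads collapse onto each reconstructed sequence."""
    return Counter(reconstruct_read(ref, s, e, ins) for s, e, ins in events)
```

Because the output is a sequence rather than a pair of coordinates, two descriptions of the same deletion that differ only in where an aligner placed it within a microhomology also collapse to one string: with ref "TTTACGGACTTT", deleting [4, 8) or [3, 7) both yield "TTTACTTT".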
Fig. 5
Determination of microhomology usage. For reads classified as deletions, sequence microhomologies may mediate the repair event. These microhomologies are detected by examining the sequence to the left of the deletion in the read, and the sequence inside the deletion on the right side, as shown here. The same process is repeated for the sequence to the right of the deletion in the read (not shown here). The microhomologies detected are then combined into a single sequence.
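The two-sided search described here can be sketched as follows, assuming a deletion of ref[del_start:del_end] in reference coordinates. The function name, and the cap that keeps the combined microhomology no longer than the deletion itself, are assumptions of this sketch rather than details taken from Hi-FiBR.

```python
from __future__ import annotations


def microhomology(ref: str, del_start: int, del_end: int) -> str:
    """Return the microhomology at a deletion of ref[del_start:del_end].

    Left search: bases just left of the deletion are compared with the right end
    of the deleted segment. Right search: bases just right of the deletion are
    compared with the left end of the deleted segment. The two matches are joined
    into one sequence, capped (by assumption) at the deletion length.
    """
    if not 0 <= del_start < del_end <= len(ref):
        raise ValueError("deletion must be a non-empty span of the reference")
    del_len = del_end - del_start

    # Walk leftward while the flank base equals the corresponding base inside the deletion.
    k_left = 0
    while (k_left < del_len
           and del_start - k_left - 1 >= 0
           and ref[del_start - k_left - 1] == ref[del_end - k_left - 1]):
        k_left += 1

    # Walk rightward while the flank base equals the corresponding base inside the deletion.
    k_right = 0
    while (k_left + k_right < del_len
           and del_end + k_right < len(ref)
           and ref[del_end + k_right] == ref[del_start + k_right]):
        k_right += 1

    # In the repaired read these two pieces are adjacent, so they form one sequence.
    return ref[del_start - k_left:del_start] + ref[del_end:del_end + k_right]
```

For ref "TTTACGGACTTT" with ref[4:8] deleted, the left search finds "A" and the right search finds "C", which combine into the 2 bp microhomology "AC".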
Fig. 6
Overview of the data set’s complexity. The data set used here consisted mostly of deletion events, followed by insertion events, complex events, and finally exact matches (i.e., error-free repair). A cursory examination shows that each class contains a diverse range of repair events (based on the size of the insertion or deletion) and that the sizes tend to follow a normal distribution from smallest to largest (although, because the scale is logarithmic, the normal shape is not immediately obvious).
Fig. 7
Comparison of NGS alignment software for analyzing DSB repair junctions. Example Pearson correlations are given for Geneious, CLC, Bowtie2, and BWA. After processing-window correction, the alignment tools agree more closely. However, many reads called by Geneious are still missed by other tools such as CLC, as shown by the many events with 0 reads. This is unsurprising, as CLC retained fewer reads than Geneious.
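A minimal sketch of such a pairwise comparison, assuming each aligner's output has been reduced to a count of reads per repair event (for example, per reconstructed read). Events found by only one aligner are scored as 0 for the other, which is one way points at 0 reads can arise; the function name and this zero-filling choice are assumptions, not the authors' documented procedure.

```python
from __future__ import annotations

import math


def pearson_on_events(counts_a: dict[str, int], counts_b: dict[str, int]) -> float:
    """Pearson r between two aligners' read counts per repair event.

    Events seen by only one aligner are given a count of 0 for the other.
    Keys are hypothetical event labels, e.g. reconstructed read sequences.
    """
    events = sorted(set(counts_a) | set(counts_b))
    if len(events) < 2:
        raise ValueError("need at least two repair events")
    xs = [counts_a.get(e, 0) for e in events]
    ys = [counts_b.get(e, 0) for e in events]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one aligner's counts are constant")
    return cov / math.sqrt(var_x * var_y)
```

Zero-filling keeps events that one aligner missed in the comparison, so an aligner that retains fewer reads lowers the correlation instead of silently dropping those events.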
Fig. 8
Distribution of repair classes identified by Hi-FiBR analysis of an example data set. The different aligners have the same distribution of repair class types (P = 0.9778). For our data set, deletion events occurred the most frequently, followed by insertions, complex events, and exact matches (i.e., error-free repair).
Fig. 9
Distribution of observed deletion events by deletion size. The different aligners produced different distributions of deletion sizes (P < 0.0001). Geneious and BWA called larger deletions; however, these make up a small percentage of the total deletions.
Fig. 10
Distribution of observed insertion events by insertion size. The different aligners produce statistically different distributions of insertion sizes (P = 0.0098). Manual inspection shows that Geneious, BWA, and Bowtie2 called larger insertions, but only a small percentage of the reads carry these larger insertions.
Fig. 11
Distribution of observed complex repair events based on insertion size. The complex insertion sizes are the same for all alignment tools (P = 0.4905), unlike the deletion and insertion sizes.
Fig. 12
Distribution of microhomology size observed in deletion events. The size of the microhomologies that mediate deletion events was the same for all aligners (P = 0.9207). The sizes tended to be small (0–3 bp), but this would likely shift depending on the sequence context around the break site.
Fig. 13
Microhomology sequence usage. The sequence of the microhomologies used differs among the alignment tools (P = 0.0108). Because Geneious and BWA detect events with larger deletion sizes, the differences in microhomology usage are likely linked to these larger deletions.
