Mol Ther Nucleic Acids. 2015 Oct 27;4(10):e260.
doi: 10.1038/mtna.2015.32.

Advanced Characterization of DNA Molecules in rAAV Vector Preparations by Single-stranded Virus Next-generation Sequencing

Emilie Lecomte et al. Mol Ther Nucleic Acids. 2015.

Abstract

Recent successful clinical trials with recombinant adeno-associated viral vectors (rAAVs) have led to a renewed interest in gene therapy. However, despite extensive developments to improve vector-manufacturing processes, undesirable DNA contaminants in rAAV preparations remain a major safety concern. Indeed, the presence of DNA fragments containing antibiotic resistance genes, wild-type AAV, and packaging cell genomes has been found in previous studies using quantitative polymerase chain reaction (qPCR) analyses. However, because qPCR only provides a partial view of the DNA molecules in rAAV preparations, we developed a method based on next-generation sequencing (NGS) to extensively characterize single-stranded DNA virus preparations (SSV-Seq). In order to validate SSV-Seq, we analyzed three rAAV vector preparations produced by transient transfection of mammalian cells. Our data were consistent with qPCR results and showed a quasi-random distribution of contaminants originating from the packaging cells genome. Finally, we found single-nucleotide variants (SNVs) along the vector genome but no evidence of large deletions. Altogether, SSV-Seq could provide a characterization of DNA contaminants and a map of the rAAV genome with unprecedented resolution and exhaustiveness. We expect SSV-Seq to pave the way for a new generation of quality controls, guiding process development toward rAAV preparations of higher potency and with improved safety profiles.

Figures

Figure 1
Overview of SSV-Seq protocol. (a) A quantity of 2 × 10¹¹ vector genomes of a purified rAAV preparation is required as input. (b) To reduce the amount of nonencapsidated DNA, an optional DNase digestion step can be performed with a mix of highly efficient DNases (Baseline-ZERO and Plasmid-Safe DNases). (c) After DNA extraction, total DNA is denatured, and the second strand is synthesized by random priming with the high-fidelity Escherichia coli DNA Pol I, followed by a purification step. (d) The dsDNA template is sheared by sonication into 200–300 bp fragments, which are subsequently end-repaired and A-tailed to allow for the ligation of adaptors compatible with Illumina sequencing. One of the two adaptors contains a short DNA barcode, also called an “index”, which is different for each experimental sample. Finally, an optimized PCR amplification is performed. All of the library preparation steps are checked by chip electrophoresis. (e) A qPCR-based quantification of next-generation sequencing libraries is performed prior to pooling for cluster generation on a flow cell in the presence of nonindexed ϕ-X DNA. High-throughput sequencing is performed on an Illumina HiSeq platform (Rapid Run 2 × 101 bp). Finally, ContaVect performs automated bioinformatic analyses, including preprocessing of references and sequencing reads, attribution of reads to a reference sequence, and postprocessing, resulting in several simple reports.
Figure 2
Correlation of the percentages of DNA populations in rAAV preparations obtained by next-generation sequencing (NGS) and inferred from quantitative polymerase chain reaction (qPCR) data. The data obtained for the three references (rAAV genome, backbone of the vector plasmid, and helper plasmid) detected in both NGS and qPCR are clustered in gray squares crossed by diagonal lines, indicating the perfect correlation between the two methods. The x- and y-axes are symmetrical, represented on log scales and truncated between 0.1–0.2 and 10–90 to highlight the intersample variability. Each point corresponds to the average percentage of two technical replicates for both NGS and qPCR. The correlation between the methods was evaluated with the nonparametric Spearman's test because of the small sample size (18 pairs) and the absence of information about the distribution of the measured variables (Spearman's correlation coefficient, two-tailed P value, 95% confidence interval).
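The rank-based comparison described in this caption can be illustrated with a minimal Python sketch of Spearman's correlation coefficient. The function names are hypothetical, the inputs are placeholders rather than the study's data, and the two-tailed P value and confidence interval mentioned in the caption are not computed here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5
```

In this setting, `x` and `y` would hold the paired NGS and qPCR percentages (18 pairs according to the caption). Working on ranks is what makes the test nonparametric: only the ordering of the percentages matters, not their distribution.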
Figure 3
Distribution of DNA contaminants from human chromosomes. (a) Density of reads mapped per chromosome and mitochondrial DNA (mtDNA) obtained after normalization to the read count of the internal normalizer, which contained sonicated DNA extracted from HEK-293 cells. A value of 1 indicates a random distribution, 2 indicates twofold enrichment, and 0.5 indicates twofold depletion. Each point is the average value of the two technical replicates. The “Other” category aggregates results obtained for 169 regions of the GRCh38 primary assembly that are not assembled into chromosomes. (b) Depth of coverage along the mitochondrial D-loop (human genome GRCh38 MT: 16,078–16,561) for rAAV purified by CsCl and for the negative control. These data were confirmed by a D-loop-specific qPCR. Values in copies/ng of DNA are indicated for the corresponding samples on the right of the coverage graph, and the positions of the qPCR primers are represented below the graphs by black arrows. (c) Depth of coverage over a gene locus from chromosome 15 for rAAV purified with AVB columns and for the negative control. The locus is not disclosed due to confidentiality concerns.
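One way to read the normalization in panel (a) is as a ratio of read fractions: the share of a sample's human reads falling on a chromosome, divided by the same share in the internal normalizer. The caption does not give the exact formula, so the Python sketch below is an assumption, and the function name and input dictionaries are hypothetical.

```python
def enrichment_vs_normalizer(sample_counts, normalizer_counts):
    """Per-chromosome read density relative to the internal normalizer.

    Assumed formula (not stated in the caption):
        (sample reads on chrom / all sample reads)
        / (normalizer reads on chrom / all normalizer reads)
    A value of 1 means the sample matches the normalizer, 2 means twofold
    enrichment, and 0.5 means twofold depletion.
    """
    sample_total = sum(sample_counts.values())
    normalizer_total = sum(normalizer_counts.values())
    if sample_total == 0 or normalizer_total == 0:
        raise ValueError("both read-count tables must contain reads")
    ratios = {}
    for chrom, normalizer_reads in normalizer_counts.items():
        if normalizer_reads == 0:
            continue  # ratio undefined without normalizer reads
        sample_fraction = sample_counts.get(chrom, 0) / sample_total
        normalizer_fraction = normalizer_reads / normalizer_total
        ratios[chrom] = sample_fraction / normalizer_fraction
    return ratios
```

Dividing by the normalizer fraction, rather than by chromosome length, would also absorb mappability and library-preparation biases shared by both samples, which may be why a sonicated HEK-293 DNA control is used as the reference.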
Figure 4
Sequencing coverage and percentage of single nucleotide variants along the rAAV genome. (a) Sequencing coverage along each base of the rAAV CMVp-eGFP-hygroTK-bGHpA genome. To compare samples independently of their sequencing depth, a normalized depth of coverage was computed by counting the number of reads aligned to each base (×1,000), divided by the sum of coverage for all bases mapped along the rAAV genome. Lines correspond to the average normalized coverage of the two technical replicates for the rAAV preparations, purified by CsCl (red), IEX (green), AVB (blue), and the internal normalizer control (black), without DNase treatment. The gray area below the graph represents the normalized coverage of the in silico-generated control. The shoulders at the extremities correspond to the range of artificial fragmentation specified in the program that generates the artificial Fastq datasets (250–450 bases). (b) Cumulative percentage of alternative base A (red), C (blue), T (green), and G (brown) compared with the reference sequence, i.e., single-nucleotide variants. When several variants were found at the same nucleotide position, variant contributions were stacked. SNVs are represented on the graph if they were found in at least half of all of the experimental samples. (c) Map and length of the rAAV genome, represented to scale below the graphs, with coordinates in base pairs. CMVp, cytomegalovirus promoter; eGFP, enhanced green fluorescent protein CDS; HygroTK, hygromycin-thymidine kinase fusion CDS; IRES, internal ribosome entry site; ITR, inverted terminal repeat; pA, bovine growth hormone polyadenylation signal.
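The normalized depth of coverage defined in panel (a) can be written as a short Python sketch: each per-base read count is multiplied by 1,000 and divided by the summed coverage of all bases along the rAAV genome. The function name and the list input are hypothetical; how the per-base counts are obtained from the alignments is outside this sketch.

```python
def normalized_coverage(depth_per_base, scale=1000):
    """Scale per-base depth so that samples of different depth are comparable.

    depth_per_base: number of reads aligned to each base of the reference.
    Returns depth * scale / (sum of depth over all bases) for each base.
    """
    total = sum(depth_per_base)
    if total == 0:
        raise ValueError("no reads aligned to the reference")
    return [depth * scale / total for depth in depth_per_base]
```

With this definition the normalized values of every sample sum to the scale factor, so curves from libraries of different sequencing depth can be overlaid on the same axis.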
Figure 5
Sequencing depth along the rAAV vector plasmid and visualization of ITR/backbone junctions. All of the sequencing reads were realigned on a single vector plasmid reference sequence composed of the rAAV genome and the plasmid backbone, as indicated in the inner circle. Each lane of the circular histograms corresponds to the average values obtained for the two technical replicates of the in silico control (1, black) and for the rAAV preparations purified by IEX (2, green), AVB (3, blue), and CsCl (4, red), without DNase treatment. (a) The depth of coverage for each position was normalized to the total number of nucleotides aligned along the full plasmid reference. Panels (b) and (c) represent enlarged images of the left and right junctions between the rAAV genome and plasmid backbone, respectively. The number of reads overlapping at least 20 nt on each side of both junctions was evaluated for each sample using a dedicated bioinformatic tool. The numbers of reads and corresponding proportions are indicated for both technical replicates.
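The junction criterion in this caption (reads overlapping at least 20 nt on each side of a junction) can be sketched in Python as follows. This is not the dedicated tool mentioned in the caption, which is not described here; the function names and the coordinate convention (0-based, half-open alignment intervals on a linear reference) are assumptions, and reads wrapping around the circular plasmid origin are not handled.

```python
def spans_junction(read_start, read_end, junction, min_flank=20):
    """True if an alignment covers at least min_flank bases on both sides.

    read_start, read_end: 0-based, half-open alignment interval.
    junction: index of the first base after the ITR/backbone boundary.
    """
    left_overlap = junction - read_start
    right_overlap = read_end - junction
    return left_overlap >= min_flank and right_overlap >= min_flank


def count_junction_reads(alignments, junction, min_flank=20):
    """Count (start, end) alignments that span the junction."""
    return sum(
        1 for start, end in alignments
        if spans_junction(start, end, junction, min_flank)
    )
```

Requiring a minimum flank on both sides excludes reads that merely touch the boundary, so only reads that physically join rAAV genome and backbone sequence are counted.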
