Nat Commun. 2021 Jun 11;12(1):3586.
doi: 10.1038/s41467-021-23918-y.

Cas9 targeted enrichment of mobile elements using nanopore sequencing


Torrin L McDonald et al. Nat Commun. 2021.

Abstract

Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the short read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methods to capture active MEIs in human genomes. We demonstrate parallel enrichment of distinct classes of MEIs, with an average of 44% of reads carrying on-target signals and a 13.4- to 54-fold enrichment over whole-genome approaches. We show that an individual flow cell can recover most MEIs (97% of L1Hs, 93% of AluYb, 51% of AluYa, 99% of SVA_F, and 65% of SVA_E). We identify seventeen non-reference MEIs in GM12878, primarily in repetitive genomic regions, that modern long-read analysis pipelines overlooked. This work demonstrates the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.
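The on-target fraction and fold enrichment quoted above are ratios of read counts. The short sketch below shows that arithmetic under an assumption not stated in this abstract: enrichment is taken as the on-target read fraction of the Cas9 run divided by the on-target fraction of a whole-genome run. The function names and the counts in the example are hypothetical and only illustrate the calculation; the paper's exact definition may differ.

```python
# Minimal sketch of on-target fraction and fold enrichment.
# ASSUMPTION: fold enrichment = (on-target fraction, Cas9 run) /
# (on-target fraction, whole-genome run). Names and counts are hypothetical.

def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that carry an on-target MEI signal."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


def fold_enrichment(cas9_fraction: float, wgs_fraction: float) -> float:
    """Ratio of on-target fractions: Cas9-enriched run vs. whole-genome run."""
    if wgs_fraction <= 0:
        raise ValueError("wgs_fraction must be positive")
    return cas9_fraction / wgs_fraction


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    cas9 = on_target_fraction(4_400, 10_000)  # 0.44
    wgs = on_target_fraction(200, 10_000)     # 0.02
    print(f"on-target: {cas9:.0%}, enrichment: {fold_enrichment(cas9, wgs):.1f}x")
```

Keeping the two steps separate makes it easy to swap in a coverage-based denominator if that is how the whole-genome baseline is defined.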


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic of Cas9-targeted enrichment and the Nano-Pal pipeline for mobile elements using nanopore sequencing.
a Genomic DNA (gDNA) is isolated by salting out and then extensively dephosphorylated. Dephosphorylated gDNA is incubated with the Cas9 ribonucleoprotein, which is targeted to MEI subfamily-specific sequences near the 3′ end of the element. Taq polymerase and dATP (not shown) then monoadenylate the DNA ends. b Cas9-cleaved sites are ligated to Oxford Nanopore Technologies (ONT) sequencing adapters and sequenced on a flow cell. Sequencing proceeds bidirectionally from the cleavage site. c After Cas9 enrichment, Nano-Pal scans the nanopore sequencing reads (black bars) for MEI signal on one or both ends. The yellow bar represents the MEI consensus sequence or MEI signal in Nano-Pal's pairwise comparison. d All reads, with or without annotated MEI signal, are passed to the downstream pipeline, which performs alignment, classification, and clustering in sequence. Nano-Pal identifies reference and non-reference MEIs and then inspects nanopore-specific non-reference MEIs (see “Methods” section). e Examples illustrating capture and alignment of reads containing non-reference L1Hs signal (top) and reference L1Hs signal (bottom). Aligned reads display a non-reference insertion (top) with L1Hs signal (yellow bar) and flanking genomic sequence (black bar). The MEI portions of reads at non-reference insertions are displayed as soft-clipped segments because no MEI is annotated at that position in the reference genome (gray bar). Aligned reads display an annotated reference L1Hs (bottom, yellow bar) flanked by surrounding genomic sequence (black bar) and separated by the Cas9 cleavage site (red triangle). PALMER and RepeatMasker tracks are shown in red and blue, respectively.
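Panel c describes scanning each read for MEI signal at one or both ends, and Fig. 4d later groups reads by signal on the 3′ end, the 5′ end, the middle, or none. Nano-Pal's actual rules are not given in this caption, so the following is only a hypothetical sketch: it assumes MEI hits have already been found as intervals on each read (for example from a pairwise alignment to the consensus), and the names `classify_read` and `end_window` (100 bp here) are illustrative choices, not Nano-Pal parameters.

```python
from typing import List, Tuple

# Half-open [start, end) interval of MEI signal in read coordinates,
# ASSUMED to come from a prior alignment of the read to an MEI consensus.
Interval = Tuple[int, int]


def classify_read(read_len: int, mei_hits: List[Interval], end_window: int = 100) -> str:
    """Hypothetical binning of a read by where its MEI signal lies.

    Returns 'none', '5_end', '3_end', 'both_ends', or 'middle'.
    `end_window` is an illustrative tolerance (bp) for "at the end of the read".
    """
    if not mei_hits:
        return "none"
    # Signal touches the start (5' end) of the read.
    at_5 = any(start < end_window for start, _ in mei_hits)
    # Signal touches the end (3' end) of the read.
    at_3 = any(end > read_len - end_window for _, end in mei_hits)
    if at_5 and at_3:
        return "both_ends"
    if at_5:
        return "5_end"
    if at_3:
        return "3_end"
    return "middle"


if __name__ == "__main__":
    print(classify_read(1000, [(0, 300)]))    # signal at the read start
    print(classify_read(1000, [(400, 600)]))  # signal inside the read
```

A fixed window is the simplest choice; a real pipeline would likely scale it with read quality or alignment clipping.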
Fig. 2. Guide RNA design for MEIs and guide RNA cleavage-site distribution.
a Distribution of candidate guide RNAs (left y-axis, histogram) along the L1Hs consensus sequence, with its structural features. The right y-axis and the line indicate how often each candidate occurs in the reference genome sequence. b Distributions for AluYb (upper panel) and AluYa (lower panel). c Distributions for SVA_F (upper panel) and SVA_E (lower panel). Red arrows in a–c indicate the selected guides. d Cleavage-site distribution of all guide RNAs used in this project. The x-axis indicates the position where a read ends or begins, numbered as the base distance from the PAM site (NGG). The PAM site is colored blue, guide RNA bases are outlined by a rectangle, and bases outside the guide RNA and PAM site are colored gray. The y-axis is the number of nanopore reads. The upper bar represents reads sequenced on the forward strand outward from the 3′ end of the guide RNA (rose arrow), and the lower bar represents reads sequenced on the reverse strand outward from the 5′ end of the guide RNA (purple arrow).
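Panels a–d rest on the standard SpCas9 rules: a 20-nt protospacer immediately 5′ of an NGG PAM, with a blunt cut 3 bp upstream of the PAM. The sketch below enumerates such candidates on both strands of a sequence and reports the cut position in forward-strand coordinates. It does not reproduce the paper's selection criteria (for example, how often each candidate occurs in the reference genome); `find_guides` and `Guide` are hypothetical names.

```python
import re
from typing import List, NamedTuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Guide(NamedTuple):
    protospacer: str  # 20-nt guide sequence, 5'->3' on its own strand
    pam: str          # NGG
    strand: str       # '+' or '-' relative to the input sequence
    cut: int          # cut boundary in forward coordinates (between cut-1 and cut)


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq: str, length: int = 20) -> List[Guide]:
    """Enumerate SpCas9 candidates (protospacer + NGG PAM) on both strands."""
    seq = seq.upper()
    n = len(seq)
    # Zero-width lookahead so overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})([ACGT]GG))")
    guides: List[Guide] = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            cut_local = m.start(2) - 3  # blunt cut 3 bp 5' of the PAM
            # A boundary k on the reverse strand is boundary n - k on the forward strand.
            cut = cut_local if strand == "+" else n - cut_local
            guides.append(Guide(m.group(1), m.group(2), strand, cut))
    return guides


if __name__ == "__main__":
    for g in find_guides("A" * 20 + "TGG"):
        print(g)
```

Mapping read start and end positions to offsets from `cut` would give a histogram like panel d.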
Fig. 3. Systematic evaluation of known MEIs captured by the nanopore Cas9 enrichment approach in different flow cells.
a Known L1Hs in GM12878 recovered by Cas9-targeted enrichment on the individual MinION flow cell (FAL11389), the pooled-MEI MinION flow cell (FAO84736), and the individual Flongle flow cell (ABB607), displayed as a proportion of the upper bound of known reference L1Hs, L1Pa, and other L1, as well as non-reference (non-ref.) L1Hs from the PacBio-MEI set. Non-reference L1Hs were divided into subfamilies (L1Ta, L1PreTa, and L1Hs of ambiguous subfamily). The dotted gray line represents the intermediate values (as a proportion) of MEIs that the guide RNA binds when allowing a ≤ 3 bp mismatch or gap. b Number of supporting reads for each captured L1 in a. c Known AluY elements in GM12878 recovered by Cas9 enrichment on one pooled MinION flow cell (FAO84736), one individual AluYb Flongle flow cell (ACK645), and one individual AluYa Flongle flow cell (ACK655). d Number of supporting reads for each captured Alu element in c. e Known SVA elements in GM12878 recovered by Cas9 enrichment on one pooled MinION flow cell (FAO84736), one individual SVA_F Flongle flow cell (ACK629), and one individual SVA_E Flongle flow cell (ACK395). f Number of supporting reads for each captured SVA element in e. g Known L1Hs captured in the GM12878 trio by Cas9 enrichment on one pooled MinION flow cell (FAL15177). h Number of supporting reads for each captured non-reference L1Hs, grouped by transmission in the GM12878 trio. Non-reference L1Hs in the parents (GM12892 and GM12891) were categorized as transmitted or not transmitted. Non-reference L1Hs in the child (GM12878) were categorized as inherited from GM12892, from GM12891, or from either parent (unknown parental lineage).
In b, d, f, and h, the numbers of captured MEIs in each subfamily, with their means and standard deviations, are given in Supplementary Data 6. Boxplot whiskers range from Q1 − 1.5 IQR to Q3 + 1.5 IQR (IQR, interquartile range); outliers are not shown.
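The dotted gray lines count MEIs that a guide RNA binds "when allowing a ≤ 3 bp mismatch or gap." The caption does not say which tool was used. One generic way to express that criterion is the smallest edit distance (mismatches plus single-base gaps) between the guide and any substring of the element, computed by semi-global alignment. The sketch below implements that approach. It ignores the PAM and any position-dependent mismatch tolerance, and `min_edit_distance_in` and `guide_binds` are hypothetical names.

```python
def min_edit_distance_in(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Semi-global alignment: the alignment may start and end anywhere in
    `text` for free, but the whole of `pattern` must be aligned.
    """
    m = len(pattern)
    # prev[i] = best cost of aligning pattern[:i] ending just before the current text base.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        cur = [0] * (m + 1)  # cur[0] = 0: the match may begin at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or mismatch
                prev[i] + 1,         # extra base in text (gap in pattern)
                cur[i - 1] + 1,      # pattern base with no text base (gap in text)
            )
        best = min(best, cur[m])
        prev = cur
    return best


def guide_binds(guide: str, target: str, max_edits: int = 3) -> bool:
    """ASSUMED binding criterion: at most `max_edits` mismatches/gaps."""
    return min_edit_distance_in(guide.upper(), target.upper()) <= max_edits


if __name__ == "__main__":
    print(min_edit_distance_in("ACGT", "TTAGGTTT"))  # one mismatch against "AGGT"
```

The dynamic program runs in O(len(guide) × len(element)) time, which is fine for per-element checks. A genome-wide search would use a dedicated off-target tool instead.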
Fig. 4. Non-reference MEIs captured by the nanopore Cas9 enrichment approach.
a Number of non-reference L1Hs captured by nanopore Cas9 enrichment at different on-target read coverages for different supporting-read cutoffs. The dotted gray line with an italic number represents the theoretical number of MEIs in the PacBio-MEI set that the guide RNA binds when allowing a ≤ 3 bp mismatch or gap. b, c Number of non-reference AluYb and AluYa (b) and SVA_F and SVA_E (c) captured by nanopore Cas9 enrichment at different on-target read coverages. Axis labels and theoretical numbers as in a. d An example of a non-reference L1Hs captured only by nanopore sequencing at chrX:121,709,076. Tracks from top to bottom: reference coordinates, with a red triangle marking the insertion site; gene track; RepeatMasker track (blue bars) with reference element annotation; PacBio contig assemblies for the two haplotypes; and four nanopore contigs locally assembled by CANU from different classes of nanopore reads based on insertion signal (contig1, signal on the 3′ end; contig2, signal on the 5′ end; contig3, signal in the middle of the read; contig4, no signal). e Recurrence (dot) plots of nanopore contigs versus the reference region chrX:121,708,576-121,709,576. The left panel shows the 3′-most end of contig1 and the 5′-most end of contig2 versus the reference sequence. The yellow bar represents the non-reference L1Hs sequence contained in the contig. The red bar represents one side of the target site duplication motif of the non-reference L1Hs contained in the contig. The upper part of this panel shows the sequences at the ends of the two contigs relative to the cleavage site, aligned to the guide RNA sequence. Blue bars in the middle panel represent the RepeatMasker track with reference L1 annotation, and the red triangle marks the insertion site within the reference L1 region. The right panel shows contig3 versus the reference sequence.
Details of this non-reference L1Hs, including its length, strand, empty site, and endonuclease (EN) cleavage site sequence, are given in the panel.
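Panels a–c count how many non-reference insertions remain detectable at different on-target coverages and supporting-read cutoffs. Below is a hypothetical sketch of that tally. It assumes each on-target read has already been assigned to an insertion ID. Lower coverage is approximated by randomly keeping a fraction of reads, which may not match the paper's downsampling scheme. `captured_counts` and the insertion labels are illustrative names.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List


def captured_counts(read_to_insertion: List[str],
                    cutoffs: Iterable[int],
                    fraction: float = 1.0,
                    seed: int = 0) -> Dict[int, int]:
    """For each cutoff k, count insertions supported by >= k reads.

    `read_to_insertion` holds one (hypothetical) insertion ID per on-target read.
    `fraction` randomly keeps that share of reads to mimic lower coverage.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    kept = [ins for ins in read_to_insertion if rng.random() < fraction]
    support = Counter(kept)  # supporting reads per insertion
    return {k: sum(1 for n in support.values() if n >= k) for k in cutoffs}


if __name__ == "__main__":
    reads = ["ins_a"] * 5 + ["ins_b"] * 2 + ["ins_c"]
    for frac in (1.0, 0.5):
        print(frac, captured_counts(reads, cutoffs=[1, 2, 3], fraction=frac))
```

Repeating the subsampling with several seeds would show how much the counts vary at each coverage level.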
