Patterns (N Y). 2021 May 3;2(5):100248.
doi: 10.1016/j.patter.2021.100248. eCollection 2021 May 14.

FaNDOM: Fast nested distance-based seeding of optical maps

Siavash Raeisi Dehkordi et al. Patterns (N Y). 2021.

Abstract

Optical mapping (OM) provides single-molecule readouts of fluorescently labeled sequence motifs on long fragments of DNA, resolved to nucleotide-level coordinates. With the advent of microfluidic technologies for analysis of DNA molecules, it is possible to inexpensively generate long OM data (>150 kbp) at high coverage. In addition to scaffolding for de novo assembly, OM data can be aligned to a reference genome for identification of genomic structural variants. We introduce FaNDOM (Fast Nested Distance Seeding of Optical Maps), an optical map alignment tool that greatly reduces the search space of the alignment process. On four benchmark human datasets, FaNDOM was significantly (4-14×) faster than competing tools while maintaining comparable sensitivity and specificity. We used FaNDOM to map variants in three cancer cell lines and identified many biologically interesting structural variants, including deletions, duplications, gene fusions, and gene-disrupting rearrangements. FaNDOM is publicly available at https://github.com/jluebeck/FaNDOM.

Keywords: DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems.

Conflict of interest statement

V.B. is a co-founder, consultant, and SAB member of and has equity interest in Boundless Bio, Inc. (BB) and Digital Proteomics, LLC (DP) and also receives income from DP. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies.

Figures

Figure 1
FaNDOM performance. (Left) Running time; (right) accuracy. The set of true positives (TP) consisted of all mappings identified by at least two of the three methods. Recall = TP/(TP + FN), Precision = TP/(TP + FP).
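The consensus definition of true positives in this caption can be illustrated with a short sketch. It is not taken from the FaNDOM code base: the function names, the tool labels, and the mapping identifiers below are hypothetical, and each mapping is assumed to be reducible to a hashable identifier that can be compared across tools.

```python
from collections import Counter


def consensus_true_positives(calls_by_tool, min_support=2):
    """Return mappings reported by at least `min_support` tools.

    `calls_by_tool` maps a tool name to the set of mapping identifiers it
    reported (hypothetical input format, assumed for this sketch).
    """
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # count each tool at most once per mapping
    return {mapping for mapping, n in support.items() if n >= min_support}


def precision_recall(tool_calls, true_positives):
    """Precision = TP/(TP + FP) and Recall = TP/(TP + FN) for one tool."""
    tool_calls = set(tool_calls)
    tp = len(tool_calls & true_positives)
    fp = len(tool_calls - true_positives)
    fn = len(true_positives - tool_calls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: three tools and four made-up mapping identifiers.
calls = {
    "tool_a": {"m1", "m2", "m3"},
    "tool_b": {"m1", "m2"},
    "tool_c": {"m2", "m4"},
}
truth = consensus_true_positives(calls)
for name, tool_calls in calls.items():
    print(name, precision_recall(tool_calls, truth))
```

In the toy data, only m1 and m2 are supported by two or more tools, so a call unique to a single tool counts as a false positive for that tool, and a consensus mapping that a tool misses counts as a false negative.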
Figure 2
SV calling performance. (A) Comparison of FaNDOM and RefAligner deletion calls on NA12878 against a benchmark dataset from Parikh et al., using the hg19 reference. (B) Comparison of FaNDOM and OMSV deletion calls on NA12878 against a benchmark created using multiple sequencing technologies, published in Dixon et al., using the hg38 reference. (C) Insertions identified by FaNDOM for NA12878. The blue region signifies insertion polymorphisms identified by FaNDOM that are also in the Database of Genomic Variants. (D) Length distribution of FaNDOM insertion calls for NA12878. (E) Length distribution of FaNDOM and benchmark deletion calls (Parikh et al., ref. 29). (F) A FaNDOM deletion not in the Parikh et al. benchmark dataset, likely due to its presence in a low-mappability region. (G) A FaNDOM alignment using assembled OM contigs that chains multiple breakpoints across 400 kbp in the K562 cell line. OM alignment visualizations were generated with MapOptics.
Figure 3
Examples of detected structural variants in cancer cell lines. (A) The chr8-chr12 translocation shows the integration of a Myc-carrying ecDNA molecule onto chr12 in H460. (B) A RACGAP1-AKAP6 fusion in CAKI-2. (C) The BCR-ABL1 fusion in K562. (D) Deletion of the genes ORC6 and MYLK3 with a partial inversion. (E) A translocation that disrupts CDC25A and GRID1, but whose direction is inconsistent with a fusion event. (F) A “fold-back” inversion that duplicates and inverts GPC5 in K562.
Figure 4
The FaNDOM workflow. (A) Search-and-merge filtering step, in which genomic distances are extracted from windows (Wa, Wb) and added to lists LM and LN. The lists LM and LN are merged and seeds are identified. (B) Packing seeds into bands: for each band B, the seeds inside it are formed into a directed acyclic graph, G, and the band is scored by finding the shortest path from s to t. (C) Different score threshold possibilities for the distribution of band scores for a single query. The best score is denoted as “BS.” (D) Dynamic programming for the alignment module in FaNDOM. (E) Seed selection for partial alignment, which scores bands based on the shortest path between each pair of seeds inside the band B. (F) SV detection module, which finds breakpoints based on multiple partial alignments. The upper alignment shows a breakpoint from A to B; the lower alignment shows an inversion, or “fold-back.”
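Panel (B) describes scoring a band by a shortest path from s to t through a directed acyclic graph of seeds. The sketch below illustrates only that general idea and is not FaNDOM's scoring function. Several things are assumptions made for illustration: seeds are represented as (query_pos, ref_pos) pairs in kbp, an edge u -> v exists when v lies after u in both the query and the reference, an edge costs the absolute difference between the query gap and the reference gap, and every visited seed contributes a fixed hypothetical `seed_reward` so that longer consistent chains score lower (better).

```python
def band_score(seeds, seed_reward=5.0):
    """Shortest s -> t path cost over a DAG of seeds (illustrative only).

    seeds: list of (query_pos, ref_pos) tuples in kbp (assumed format).
    Returns 0.0 for an empty band; lower (more negative) scores are better.
    """
    if not seeds:
        return 0.0

    # Sorting by query position gives a valid topological order, because
    # every edge goes to a seed with a strictly larger query position.
    order = sorted(seeds)
    dist = []  # dist[i] = cheapest cost of a path from s that ends at seed i

    for i, (q_v, r_v) in enumerate(order):
        best_prev = 0.0  # direct edge s -> v costs nothing
        for j in range(i):
            q_u, r_u = order[j]
            if q_v > q_u and r_v > r_u:  # collinear in query and reference
                gap_cost = abs((q_v - q_u) - (r_v - r_u))
                best_prev = min(best_prev, dist[j] + gap_cost)
        dist.append(best_prev - seed_reward)

    # Every seed has a zero-cost edge to t, so the best path ends at the
    # seed with the smallest accumulated cost.
    return min(dist)


# Three collinear seeds plus one outlier that maps far away on the reference.
example_band = [(10.0, 110.0), (25.0, 125.5), (40.0, 139.0), (30.0, 300.0)]
print(band_score(example_band))
```

Because the graph is acyclic, a single pass in topological order is enough even though the per-seed reward makes some path costs negative. In the example, the outlier seed is left out of the best chain because connecting to it would add a large gap cost.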

