Bioinformatics. 2019 Jul 15;35(14):i225-i232.
doi: 10.1093/bioinformatics/btz346.

Alignment-free filtering for cfNA fusion fragments

Xiao Yang et al. Bioinformatics.

Abstract

Motivation: Cell-free nucleic acid (cfNA) sequencing data require improvements to existing fusion detection methods along multiple axes: high depth of sequencing, low allele fractions, short fragment lengths and specialized barcodes, such as unique molecular identifiers.

Results: AF4 was developed to address these challenges. It uses a novel alignment-free kmer-based method to detect candidate fusion fragments with high sensitivity and orders of magnitude faster than existing tools. Candidate fragments are then filtered using a max-cover criterion that significantly reduces spurious matches while retaining authentic fusion fragments. This efficient first stage reduces the data sufficiently that commonly used criteria can process the remaining information, or sophisticated filtering policies that may not scale to the raw reads can be used. AF4 provides both targeted and de novo fusion detection modes. We demonstrate both modes in benchmark simulated and real RNA-seq data as well as clinical and cell-line cfNA data.
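
As a rough illustration of the alignment-free, kmer-based candidate detection described in the Results, the sketch below indexes the kmers of each gene's transcript and flags a fragment as a fusion candidate when it carries gene-specific kmers from at least two genes. This is a minimal Python sketch under stated assumptions, not AF4's implementation: the function names, the `min_hits` parameter, and the rule of ignoring kmers shared between genes are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each kmer to the set of genes whose transcript contains it.

    `transcripts` is a dict of gene name -> sequence (hypothetical input format).
    """
    index = defaultdict(set)
    for gene, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene)
    return index


def genes_hit(fragment, index, k):
    """Count, per gene, the fragment kmers that occur in that gene only."""
    hits = defaultdict(int)
    for i in range(len(fragment) - k + 1):
        genes = index.get(fragment[i:i + k])
        # Assumption: kmers shared by several genes are ambiguous and skipped.
        if genes and len(genes) == 1:
            hits[next(iter(genes))] += 1
    return dict(hits)


def is_candidate(fragment, index, k, min_hits=1):
    """A fragment is a candidate if at least two genes reach `min_hits` kmers."""
    hits = genes_hit(fragment, index, k)
    return sum(1 for n in hits.values() if n >= min_hits) >= 2
```

Because the lookup is a hash probe per kmer rather than an alignment, a first pass of this kind is cheap; the candidates it keeps would still need the stricter filtering described in the Results.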

Availability and implementation: AF4 is open sourced, licensed under Apache License 2.0, and is available at: https://github.com/grailbio/bio/tree/master/fusion.

Figures

Fig. 1.
An overview of the AF4 workflow. Detailed descriptions are given in the main text
Fig. 2.
Fusion event involving gene pair (g1, g2) and readpair (r1, r2). The green line denotes the fusion transcript derived from genes g1 and g2, with the fusion junction point denoted by xb. (a) Neither r1 nor r2 spans xb, (b) r1 but not r2 spans xb and (c) both r1 and r2 span xb
Fig. 3.
Fragment generation by stitching and overhang trimming of readpair (r1, r2). (a) r1 and r2 are represented as arrows facing each other, denoting the forward and reverse complement strands. The green bars denote one of the shared kmers between them, which is an anchor for suffix-prefix alignment. The stitched fragment is a concatenation of the prefix of r1, the overlap and the suffix of r2. (b) When r1 and/or r2 extends beyond the 5′ region of the other read, the overhang is trimmed, and f is the overlap. (c) When r1 and r2 cannot be merged, f is a concatenation of r1 and the reverse complement of r2
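
The three cases in Fig. 3 can be sketched in a few lines of Python. This is a simplified illustration, not AF4's code: it searches for the longest exact suffix-prefix overlap instead of anchoring on a shared kmer, tolerates no mismatches, and the names `stitch` and `min_overlap` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch(r1, r2, min_overlap=4):
    """Build one fragment f from a readpair (r1, r2).

    Assumption: overlaps are exact matches of at least `min_overlap` bases.
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))

    # Case (a): a suffix of r1 matches a prefix of rc(r2).
    # f = prefix of r1 + overlap + suffix of rc(r2).
    for n in range(longest, min_overlap - 1, -1):
        if r1[-n:] == r2_rc[:n]:
            return r1 + r2_rc[n:]

    # Case (b): each read runs past the 5' end of the other, so a prefix of
    # r1 matches a suffix of rc(r2). The overhangs are trimmed and f is the
    # overlap itself.
    for n in range(longest, min_overlap - 1, -1):
        if r1[:n] == r2_rc[-n:]:
            return r1[:n]

    # Case (c): the reads cannot be merged; concatenate r1 and rc(r2).
    return r1 + r2_rc
```

For example, with a 16-base fragment `ACGTTGCATTAGGCCA`, reads covering its first 10 bases and the reverse complement of its last 10 bases share a 4-base overlap and stitch back into the full fragment.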
Fig. 4.
Computing maximum coverage of fragment f for a gene pair (g1, g2). g1 and g2 are two genes inferred to cover regions of f. g1 covers regions s1, s2, s3, and g2 covers regions s4, s5, s6. [xi,xi) are start and end positions of f for region si
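
To make the coverage idea in Fig. 4 concrete, the sketch below treats each region as a half-open (start, end) interval on f and computes how much of f is covered by the regions of the two genes together. This is one simple reading of the caption, offered as an assumption: the exact max-cover definition AF4 uses is given in the main text and may differ, and the function names are hypothetical.

```python
def covered_length(intervals):
    """Total number of positions covered by a set of half-open intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if start >= end:
            continue  # skip empty intervals
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close the run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def pair_coverage(g1_regions, g2_regions, fragment_len):
    """Fraction of f covered by regions of g1 and g2 combined (assumed metric)."""
    return covered_length(list(g1_regions) + list(g2_regions)) / fragment_len
```

A filter built on this would keep a fragment only if the fraction exceeds some threshold; the threshold value is a tuning choice and is not specified here.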
