Genome Res. 2021 Mar;31(3):448-460.
doi: 10.1101/gr.257246.119. Epub 2021 Jan 13.

Accurate and efficient detection of gene fusions from RNA sequencing data

Sebastian Uhrig et al. Genome Res. 2021 Mar.

Abstract

The identification of gene fusions from RNA sequencing data is a routine task in cancer research and precision oncology. However, despite the availability of many computational tools, fusion detection remains challenging. Existing methods suffer from poor prediction accuracy and are computationally demanding. We developed Arriba, a novel fusion detection algorithm with high sensitivity and short runtime. When applied to a large collection of published pancreatic cancer samples (n = 803), Arriba identified a variety of driver fusions, many of which affected druggable proteins, including ALK, BRAF, FGFR2, NRG1, NTRK1, NTRK3, RET, and ROS1. The fusions were significantly associated with KRAS wild-type tumors and involved proteins stimulating the MAPK signaling pathway, suggesting that they substitute for activating mutations in KRAS. In addition, we confirmed the transforming potential of two novel fusions, RRBP1-RAF1 and RASGRP1-ATP1A1, in cellular assays. These results show Arriba's utility in both basic cancer research and clinical translation.

Figures

Figure 1.
Benchmark of Arriba versus alternative methods. (A) Accuracy benchmarks. The figure shows samples from three types of benchmark data set: simulated fusions, spike-ins of synthetic fusions, and fusions described in the MCF-7 breast cancer cell line. The sensitivity/specificity trade-off is depicted using receiver operating characteristic (ROC)–like curves. The vertical axis indicates the number of true positives; the horizontal axis indicates the number of false positives (simulated data set) or nonvalidated predictions (spike-in and MCF-7 data sets). (B) Runtimes. (C) Peak memory consumption in gigabytes (GB). The aligner (STAR) and its index accounted for 31 GB of the memory footprint of Arriba's workflow. Approximately 7 GB were consumed by Arriba (Arr.) itself.
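The ROC-like curves described in panel A plot a running count of true positives against a running count of false positives (or nonvalidated predictions) as predictions are walked from most to least confident. A minimal sketch of that bookkeeping follows. The function name, the score field, and the example data are all hypothetical and are not taken from the paper or from any benchmarked tool.

```python
from typing import List, Tuple


def roc_like_curve(calls: List[Tuple[float, bool]]) -> List[Tuple[int, int]]:
    """Return cumulative (false_positives, true_positives) points.

    `calls` holds (score, is_true) pairs, where a higher score means a more
    confident prediction. The scoring scheme is an assumption for illustration.
    """
    # Walk predictions from most to least confident.
    ranked = sorted(calls, key=lambda call: call[0], reverse=True)
    points = [(0, 0)]
    true_pos = 0
    false_pos = 0
    for _score, is_true in ranked:
        if is_true:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos, true_pos))
    return points


# Hypothetical predictions: (confidence score, validated as a true fusion?)
example_calls = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.2, False)]
curve = roc_like_curve(example_calls)
```

Each point gives the horizontal (false positives) and vertical (true positives) coordinate after one more prediction is accepted, so a curve that rises steeply before moving right corresponds to a method whose most confident calls are mostly correct.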
Figure 2.
Recall of hallmark gene fusions in prostate cancer and diffuse large B-cell lymphoma. To measure the performance of Arriba and alternative methods on real patient data, we counted the number of hallmark gene fusions detected by each method in two cohorts. Fractions marked with an asterisk were only detected when a list of known/expected fusions was provided. (A) TMPRSS2-ERG fusions in the ICGC-EOPC cohort. (B) IG-BCL2/BCL6/MYC fusions in the TCGA-DLBC cohort.
Figure 3.
Gene fusions in pancreatic cancer. Overview of proteins in the MAPK signaling pathway found to be fused in pancreatic tumors. Colored proteins were fused to one of the genes listed in the callouts. Proteins shown in gray were not found to be fused. The frequencies of recurrent fusion partners are indicated in parentheses. The detailed structure of all fusions is depicted in Supplemental Figure S6.
Figure 4.
Structural and functional characteristics of RRBP1-RAF1 and RASGRP1-ATP1A1. (A) Structure of the fusion transcripts. (B) Protein domains retained in the fusion proteins and topology. Near full-length RAF1 was found to be fused to the transmembrane protein RRBP1, presumably tethering RAF1 to the endoplasmic reticulum with its kinase domain facing the cytoplasmic space. The oncogene RASGRP1 was predicted to be fused to ATP1A1, a protein embedded in the plasma membrane. Although oncogenes are more often found to constitute the C terminus of a fusion protein, RASGRP1 appeared to be fused to the N terminus of ATP1A1, thereby replacing several C-terminal domains of RASGRP1, which normally regulate recruitment to the plasma membrane, where RASGRP1 activates its target, KRAS (Beaulieu et al. 2007). Presumably, replacement of these regulatory domains by a membrane-bound protein increased the activity of RASGRP1 by means of warranting proximity to KRAS. (C) MCF10A and H6c7 cells were stably transduced with one of the fusion constructs or empty vector. MCF10A cells were cultured for 8 d without EGF, H6c7 cells were cultured for 7 d with EGF, and the area covered by cells was measured. Statistical significance was tested using a two-sided Welch t-test (MCF10A RASGRP1-ATP1A1: P-value = 0.023; MCF10A RRBP1-RAF1: P-value = 0.0094; H6c7 RASGRP1-ATP1A1: P-value = 4.1×10⁻⁵; H6c7 RRBP1-RAF1: P-value = 0.14). (D) Western blot showing increased phosphorylation of MAP2K1/2 (MEK1/2) and MAPK1/3 (ERK2/1) in TP53-deficient MCF10A cells stably transduced with one of the fusions as compared to empty vector.
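Panel C reports a two-sided Welch t-test, which compares two group means without assuming equal variances. As a rough sketch of what that test computes, the snippet below derives the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The measurements are made-up numbers, not data from the paper, and the conversion to a P-value is omitted because it needs the t distribution.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance test."""
    # Squared standard error of each group mean (sample variance / group size).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t_stat, dof


# Hypothetical covered-area measurements for two conditions.
fusion_area = [2.0, 4.0, 6.0]
vector_area = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(fusion_area, vector_area)
```

In practice a statistics package would normally be used for the full test; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test and returns a two-sided P-value by default.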
Figure 5.
Arriba workflow. Arriba is an extension of a standard alignment workflow based on STAR. In legacy mode, STAR writes chimeric alignments to the file Chimeric.out.sam. In newer versions, STAR writes them to the main output file Aligned.out.bam. Arriba can take either file as input to search for gene fusions.
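The caption implies that the location of the chimeric alignments depends on how STAR was run. The following sketch shows one way a wrapper script might decide which file holds them. The helper name and the rule "the presence of Chimeric.out.sam indicates legacy mode" are assumptions made for illustration; only the two file names come from the caption, and nothing here reflects Arriba's own code or command-line interface.

```python
from typing import Iterable

LEGACY_CHIMERIC_FILE = "Chimeric.out.sam"  # written by STAR in legacy mode
MAIN_ALIGNMENT_FILE = "Aligned.out.bam"    # main STAR output file


def choose_chimeric_source(available_files: Iterable[str]) -> str:
    """Return the name of the file expected to contain chimeric alignments.

    Assumption: if the legacy file exists, STAR ran in legacy mode and the
    chimeric alignments are there; otherwise they are in the main output.
    """
    names = set(available_files)
    if LEGACY_CHIMERIC_FILE in names:
        return LEGACY_CHIMERIC_FILE
    if MAIN_ALIGNMENT_FILE in names:
        return MAIN_ALIGNMENT_FILE
    raise ValueError("no STAR output file with chimeric alignments found")


legacy_choice = choose_chimeric_source(["Aligned.out.bam", "Chimeric.out.sam"])
modern_choice = choose_chimeric_source(["Aligned.out.bam"])
```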
Figure 6.
Covariates used to estimate the level of background noise. One of Arriba's artifact filters removes candidates with fewer supporting reads than the estimated level of background noise. For this purpose, Arriba calculates several covariates that correlate with the level of background noise. (A) Arriba assumes a polynomial relationship between the noise level (unfiltered candidates) and their number of supporting reads. The data shown here are based on the highly expressed housekeeping gene GAPDH in the MCF-7 cell line (SRA accession ERR358487). (B) The figure shows the number of unfiltered candidates as a function of the breakpoint distance averaged over all genes in the MCF-7 cell line. Artifacts tend to have breakpoints in close proximity as evidenced by a sharp increase in the number of candidates with decreasing distance. Arriba fits two models depending on whether the breakpoints are closer or further apart than 400 bp (red and blue lines, respectively). (C) The library preparation method can affect the proportions of artifacts. For example, the samples from Heining et al. (2018) are a mixture of stranded and nonstranded libraries. The stranded libraries are enriched for duplications compared with the nonstranded libraries (two-sided Wilcoxon rank-sum test, P-value = 0.0044).
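Panel B describes fitting two separate models on either side of a 400 bp breakpoint distance. The sketch below illustrates only that split-then-fit idea, using an ordinary least-squares straight line for each side. The straight-line model, the function names, the choice to place a distance of exactly 400 bp in the "far" group, and the data points are all assumptions; the caption does not state the functional form Arriba uses.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (breakpoint distance in bp, number of candidates)

THRESHOLD_BP = 400  # split point named in the caption


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_two_models(points: List[Point]) -> Dict[str, Tuple[float, float]]:
    """Fit one line to distances below the threshold and one to the rest."""
    near = [p for p in points if p[0] < THRESHOLD_BP]
    far = [p for p in points if p[0] >= THRESHOLD_BP]
    return {"near": fit_line(near), "far": fit_line(far)}


# Hypothetical counts that drop quickly at short distances and slowly beyond.
observations = [(100, 50), (200, 30), (300, 10), (400, 5), (800, 3), (1200, 1)]
models = fit_two_models(observations)
```

Fitting the two ranges separately lets the steep increase in candidates at short distances be described without distorting the fit for distant breakpoints.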

References

    1. An X, Tiwari AK, Sun Y, Ding PR, Ashby CR Jr, Chen ZS. 2010. BCR-ABL tyrosine kinase inhibitors in the treatment of Philadelphia chromosome positive chronic myeloid leukemia: a review. Leuk Res 34: 1255–1268. doi:10.1016/j.leukres.2010.04.016
    2. Aung KL, Fischer SE, Denroche RE, Jang GH, Dodd A, Creighton S, Southwood B, Liang SB, Chadwick D, Zhang A, et al. 2018. Genomics-driven precision medicine for advanced pancreatic cancer: early results from the COMPASS trial. Clin Cancer Res 24: 1344–1354. doi:10.1158/1078-0432.CCR-17-2994
    3. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, et al. 2012. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483: 603–607. doi:10.1038/nature11003
    4. Beaulieu N, Zahedi B, Goulding RE, Tazmini G, Anthony KV, Omeis SL, de Jong DR, Kay RJ. 2007. Regulation of RasGRP1 by B cell antigen receptor requires cooperativity between three domains controlling translocation to the plasma membrane. Mol Biol Cell 18: 3156–3168. doi:10.1091/mbc.e06-10-0932
    5. Bhattacharyya S, Pradhan K, Campbell N, Mazdo J, Vasantkumar A, Maqbool S, Bhagat TD, Gupta S, Suzuki M, Yu Y, et al. 2017. Altered hydroxymethylation is seen at regulatory regions in pancreatic cancer and regulates oncogenic pathways. Genome Res 27: 1830–1842. doi:10.1101/gr.222794.117
