Comparative Study
PLoS One. 2018 Aug 10;13(8):e0201554.
doi: 10.1371/journal.pone.0201554. eCollection 2018.

iMapSplice: Alleviating reference bias through personalized RNA-seq alignment

Xinan Liu et al. PLoS One. 2018.

Abstract

Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and performs an unbiased alignment towards genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and unravels many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. Software download URL: https://github.com/LiuBioinfo/iMapSplice.
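The idea of aligning against both reference and alternative bases can be illustrated with a toy mismatch counter. This is only a sketch of the principle, not the iMapSplice implementation (which works on genome indices); the function name `count_mismatches`, the sequences, and the SNP position are all hypothetical.

```python
from typing import Dict


def count_mismatches(read: str, ref: str, ref_start: int,
                     snps: Dict[int, str], allele_aware: bool) -> int:
    """Count mismatches for an ungapped read placed at ref_start.

    snps maps a 0-based reference position to a known alternate base.
    With allele_aware=True, a read base equal to the alternate allele
    is accepted as a match instead of being penalized.
    """
    mismatches = 0
    for offset, base in enumerate(read):
        pos = ref_start + offset
        if base == ref[pos]:
            continue  # matches the reference base
        if allele_aware and snps.get(pos) == base:
            continue  # matches the individual's alternate base
        mismatches += 1
    return mismatches


# Hypothetical example data, not taken from the paper.
reference = "ACGTACGTAC"
known_snps = {4: "G"}   # assumed SNP: A>G at position 4
read = "GTGCGT"         # covers positions 2..7 and carries the alternate base

ref_only = count_mismatches(read, reference, 2, known_snps, allele_aware=False)
personal = count_mismatches(read, reference, 2, known_snps, allele_aware=True)
print(ref_only, personal)
```

A read carrying the alternate allele is penalized under reference-only scoring but not under allele-aware scoring, which is the kind of asymmetry that can skew allelic ratios toward the reference.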

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. An overview of the iMapSplice algorithm.
(A) An example illustrating the challenge of mapping an RNA-seq read to the reference genome in the presence of SNPs. (B) An example illustrating how the iMapSplice algorithm may resolve spliced alignment with SNPs, together with the basic steps of the alignment.
Fig 2. Aggregated reference allelic ratio distribution of different methods on RNA-seq datasets from individuals NA12812, NA12749, NA07056, NA06994, and NA12275.
(A) Comparison between iMapSplice-phased and “Ref-based” methods (MapSplice, HISAT2, and STAR); (B) Comparison between iMapSplice-phased and “Mask-based” methods (MapSplice MASK, HISAT2 MASK, and STAR MASK); (C) Comparison between two variants of iMapSplice and “SNP-aware” methods (HISAT2 SNP and HISAT2 POP); (D) Summarized comparison in terms of mean and skewness for reference allelic ratio distributions, and number of SNPs covered by at least ten reads.
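The summary statistics named in panel (D) can be sketched as follows. The ten-read coverage threshold comes from the caption; everything else is an assumption made for illustration: the ratio is taken as ref / (ref + alt) per SNP, skewness is the population moment form m3 / m2^1.5 (the caption does not say which estimator was used), and the read counts are made up.

```python
from statistics import mean
from typing import Iterable, List, Tuple


def reference_allelic_ratios(counts: Iterable[Tuple[int, int]],
                             min_depth: int = 10) -> List[float]:
    """Return ref / (ref + alt) for each SNP covered by >= min_depth reads.

    counts holds one (ref_reads, alt_reads) pair per SNP.
    """
    return [ref / (ref + alt) for ref, alt in counts if ref + alt >= min_depth]


def skewness(values: List[float]) -> float:
    """Population skewness m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


# Hypothetical per-SNP read counts; the last SNP has depth 5 and is dropped.
snp_counts = [(5, 5), (8, 12), (15, 5), (3, 2)]
ratios = reference_allelic_ratios(snp_counts)
ratio_mean = mean(ratios)
ratio_skew = skewness(ratios)
print(len(ratios), ratio_mean, ratio_skew)
```

Under this definition, an unbiased aligner would be expected to give a mean close to 0.5 at heterozygous SNPs, while reference bias pushes the mean above 0.5.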
Fig 3. Examples of splice site polymorphisms.
The solid lines with numbers indicate splice junctions with a polymorphism in the donor or acceptor site, while the dashed lines with numbers indicate other “normal” splice junctions that share either the donor or acceptor site with the polymorphic splice junctions but have no polymorphisms. The vertical solid red lines indicate the splice sites where the polymorphism converts a noncanonical splice site to a canonical site, while the vertical solid blue lines indicate the splice sites where the polymorphism converts a canonical splice site to a noncanonical splice site. The vertical dashed gray lines indicate the unshared splice sites of “normal” splice junctions. Bases in black are the reference nucleotides, while those in red are alternate bases at SNP positions. The numbers along the splice junctions are the supporting read counts in the specific RNA-seq sample. The five RNA-seq samples used to demonstrate the examples are from individuals NA12812, NA12749, NA07056, NA12275, and NA06994. (A) and (B) are examples of splice site polymorphisms that enhance splicing through the creation of a canonical splice site dinucleotide; (C) is an example of a splice site polymorphism that disables splicing through the loss of a canonical splice site dinucleotide.
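The two conversions marked in red and blue can be sketched as a small classifier. This is an illustration only, under stated assumptions: only GT-AG on the forward strand is treated as canonical, introns are given as 0-based half-open intervals, and the helper names (`apply_snps`, `splice_motif`, `classify_junction`) and sequences are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: only the GT-AG donor/acceptor pair counts as canonical here.
CANONICAL = ("GT", "AG")


def apply_snps(seq: str, snps: Dict[int, str]) -> str:
    """Return seq with each 0-based SNP position replaced by its alternate base."""
    bases = list(seq)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


def splice_motif(seq: str, intron_start: int, intron_end: int) -> Tuple[str, str]:
    """Return (donor, acceptor) dinucleotides of intron [intron_start, intron_end)."""
    return seq[intron_start:intron_start + 2], seq[intron_end - 2:intron_end]


def classify_junction(ref: str, snps: Dict[int, str],
                      intron_start: int, intron_end: int) -> str:
    """Compare the splice motif on the reference and on the SNP-carrying sequence."""
    ref_ok = splice_motif(ref, intron_start, intron_end) == CANONICAL
    alt_seq = apply_snps(ref, snps)
    alt_ok = splice_motif(alt_seq, intron_start, intron_end) == CANONICAL
    if ref_ok and not alt_ok:
        return "canonical -> noncanonical"
    if alt_ok and not ref_ok:
        return "noncanonical -> canonical"
    return "unchanged"


# Hypothetical sequences: the intron spans positions 3..10 in both cases.
gained = classify_junction("AAAGACCCCAGTTT", {4: "T"}, 3, 11)  # GA-AG becomes GT-AG
lost = classify_junction("AAAGTCCCCAGTTT", {4: "A"}, 3, 11)    # GT-AG becomes GA-AG
print(gained, lost)
```

An aligner that checks splice motifs only against the reference sequence would see the first junction as noncanonical, which is why junctions of this kind can be missed without personal variant information.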

