Bioinformatics. 2018 Jul 15;34(14):2384-2391.

doi: 10.1093/bioinformatics/bty097.

PennDiff: detecting differential alternative splicing and transcription by RNA sequencing

Yu Hu¹, Jennie Lin², Jian Hu¹, Gang Hu³, Kui Wang³, Hanrui Zhang⁴, Muredach P Reilly⁴, Mingyao Li¹

Affiliations

¹ Department of Biostatistics, Epidemiology and Informatics.
² Renal Electrolyte and Hypertension Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
³ Department of Information Theory and Data Science, School of Mathematical Sciences, Nankai University, Tianjin, China.
⁴ Division of Cardiology, Department of Medicine, Columbia University Medical Center, New York City, NY, USA.

PMID: 29474557
PMCID: PMC6041879
DOI: 10.1093/bioinformatics/bty097

Abstract

Motivation: Alternative splicing and alternative transcription are major mechanisms for generating transcriptome diversity. Differential alternative splicing and transcription (DAST), which describes differing usage of transcript isoforms across conditions, can complement differential expression in characterizing gene regulation. However, the analysis of DAST is challenging because only a small fraction of RNA-seq reads is informative for isoforms. Several methods have been developed to detect exon-based and gene-based DAST, but they suffer from power loss for genes with many isoforms.

Results: We present PennDiff, a novel statistical method that uses information on gene structures and pre-estimated isoform relative abundances to detect DAST from RNA-seq data. PennDiff has several advantages. First, grouping exons avoids multiple testing for 'exons' originating from the same isoform(s). Second, it uses all available reads in exon-inclusion level estimation, unlike methods that use only junction reads. Third, collapsing isoforms that share the same alternative exons reduces the impact of isoform expression estimation uncertainty. PennDiff can detect DAST at both the exon and gene levels, thus offering more flexibility than existing methods. Simulations and analysis of a real RNA-seq dataset indicate that PennDiff has a well-controlled type I error rate and is more powerful than existing methods including DEXSeq, rMATS, Cuffdiff, IUTA and SplicingCompass. As the popularity of RNA-seq continues to grow, we expect PennDiff to be useful for diverse transcriptomics studies.
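The abstract does not state how exon-inclusion levels are derived from isoform abundances, so the following is only a minimal sketch under one common assumption: the inclusion level of an alternative exon is the summed relative abundance of the isoforms that contain it, computed after collapsing isoforms that share the same set of alternative exons. The function name, the input layout and the toy values are hypothetical and are not taken from the PennDiff source code.

```python
from collections import defaultdict


def exon_inclusion_levels(isoform_exons, rel_abundance):
    """Return ({exon: inclusion level}, {exon set: collapsed abundance}).

    isoform_exons: dict mapping isoform name -> set of alternative exon ids
    rel_abundance: dict mapping isoform name -> relative abundance
                   (assumed to sum to 1 within the gene)
    """
    # Collapse isoforms that carry exactly the same alternative exons,
    # so only the total abundance of each class matters.
    collapsed = defaultdict(float)
    for iso, exons in isoform_exons.items():
        collapsed[frozenset(exons)] += rel_abundance[iso]

    # Assumed definition: inclusion level of an exon is the total
    # abundance of the collapsed classes that contain it.
    levels = defaultdict(float)
    for exon_set, abundance in collapsed.items():
        for exon in exon_set:
            levels[exon] += abundance
    return dict(levels), dict(collapsed)


# Toy gene with four isoforms; iso1 and iso2 share the same alternative exons.
isoform_exons = {
    "iso1": {"e2", "e3"},
    "iso2": {"e2", "e3"},
    "iso3": {"e3"},
    "iso4": set(),
}
rel_abundance = {"iso1": 0.25, "iso2": 0.25, "iso3": 0.25, "iso4": 0.25}

levels, collapsed = exon_inclusion_levels(isoform_exons, rel_abundance)
print(levels)
print(len(collapsed), "collapsed isoform classes")
```

In this toy case iso1 and iso2 fall into one class, so any error in splitting abundance between them has no effect on the exon-level quantities, which is the intuition behind the third advantage listed above.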

Availability and implementation: PennDiff source code and user guide are freely available for download at https://github.com/tigerhu15/PennDiff.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Partitioning biological exons into non-overlapping virtual exons in a gene with three isoforms. This gene has 14 virtual exons, of which 9 are alternative spliced or transcribed. These alternative exons can be divided into three exon groups
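The partitioning described in this caption can be illustrated with a short sketch. It assumes that virtual exons are obtained by cutting at every annotated exon boundary, that a virtual exon is "alternative" when it is absent from at least one isoform, and that exon groups are formed from alternative virtual exons used by the same set of isoforms. These rules, the function names and the toy coordinates are assumptions for illustration, not the authors' implementation.

```python
def virtual_exons(isoforms):
    """Split exons into non-overlapping segments.

    isoforms: dict mapping isoform name -> list of (start, end) half-open intervals
    Returns a list of ((start, end), frozenset of isoform names) for exonic segments.
    """
    # Every exon start or end is a potential cut point.
    cuts = sorted({p for exons in isoforms.values() for s, e in exons for p in (s, e)})
    result = []
    for start, end in zip(cuts, cuts[1:]):
        # Isoforms with an exon fully covering this elementary segment.
        members = frozenset(
            name
            for name, exons in isoforms.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if members:  # skip segments that are intronic in every isoform
            result.append(((start, end), members))
    return result


def group_alternative(vexons, n_isoforms):
    """Group alternative virtual exons by the set of isoforms that use them."""
    groups = {}
    for interval, members in vexons:
        if len(members) < n_isoforms:  # not shared by all isoforms
            groups.setdefault(members, []).append(interval)
    return groups


# Toy gene with three isoforms (coordinates are made up).
isoforms = {
    "A": [(0, 100), (200, 300), (400, 500)],
    "B": [(0, 100), (400, 500)],
    "C": [(0, 150), (200, 300), (400, 500)],
}
vexons = virtual_exons(isoforms)
groups = group_alternative(vexons, len(isoforms))
for members, intervals in groups.items():
    print(sorted(members), intervals)
```

Testing one exon group instead of each virtual exon separately is what reduces the multiple-testing burden mentioned in the abstract.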

**Fig. 2.**
Smooth scatter plot of logit transformed estimated exon-inclusion levels versus logit transformed true values. Correlation was calculated on the logit transformed values. (A) Exon-inclusion levels estimated by PennDiff based on RefSeq annotation (8061 alternative splicing or transcription events). (B) Exon-inclusion levels estimated by PennDiff based on Ensembl annotation (49 607 alternative splicing or transcription events)
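As a rough illustration of the comparison in this figure, the sketch below logit-transforms estimated and true inclusion levels and computes their Pearson correlation. Clipping values away from 0 and 1 with `eps` is an assumption made here to keep the logit finite; the caption does not say how boundary values were handled, and the example numbers are invented.

```python
import math


def logit(p, eps=1e-6):
    """Logit transform, with clipping (an assumption) to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented inclusion levels for five exons.
true_levels = [0.10, 0.35, 0.50, 0.80, 0.95]
estimated_levels = [0.12, 0.30, 0.55, 0.78, 0.97]

r = pearson([logit(p) for p in estimated_levels], [logit(p) for p in true_levels])
print(round(r, 3))
```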

**Fig. 3.**
Type I error and power of exon-based methods with different sample sizes and gene annotations. Calculations were based on all DAST and non-DAST exons in the input data. Significance was evaluated at the 5% significance level. An exon with true exon-inclusion level difference > $0.1$ was defined as a true DAST exon. (A) 5 versus 5 based on RefSeq annotation. (B) 20 versus 20 based on RefSeq annotation. (C) 5 versus 5 based on Ensembl annotation. (D) 20 versus 20 based on Ensembl annotation

**Fig. 4.**
Type I error and power of gene-based methods with different sample sizes and gene annotations. Calculations were based on all DAST and non-DAST genes in the input data. Significance was evaluated at the 5% significance level. A gene with true Hellinger distance > $0.1$ was defined as a true DAST gene. (A) 5 versus 5 based on RefSeq annotation. (B) 20 versus 20 based on RefSeq annotation. (C) 5 versus 5 based on Ensembl annotation. (D) 20 versus 20 based on Ensembl annotation
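The Hellinger distance used to define true DAST genes can be computed as below. The formula is the standard one for discrete distributions, $H(P,Q) = \frac{1}{\sqrt{2}}\sqrt{\sum_i (\sqrt{p_i} - \sqrt{q_i})^2}$; applying it to the isoform relative-abundance vectors of the two conditions is an assumption about how the caption's criterion was evaluated, and the example abundances are invented.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


# Invented isoform relative abundances for one gene in two conditions.
condition_1 = [0.7, 0.3]
condition_2 = [0.5, 0.5]

d = hellinger(condition_1, condition_2)
print(d, d > 0.1)  # the caption's threshold for a true DAST gene is 0.1
```

The distance is 0 for identical distributions and 1 for distributions with no shared support, so a 0.1 cut-off sits at the low end of that range.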

**Fig. 5.**
The impact of gene complexity on power of different methods. (A) Power comparison between PennDiff and DEXSeq when results were stratified by the number of exons per group (≥2: 2765 exon groups, ≥3: 1103 exon groups, ≥4: 668 exon groups, ≥5: 460 exon groups, ≥6: 370 exon groups). Significance was evaluated at the 5% level. (B) Power comparison between PennDiff, IUTA and SplicingCompass when results were stratified by the number of isoforms per gene (≥2: 6321 genes, ≥5: 4232 genes, ≥10: 2102 genes, ≥15: 941 genes, ≥20: 426 genes, ≥25: 189 genes). Significance was evaluated at the 5% level

**Fig. 6.**
The impact of mis-annotation of isoforms on power of different methods. (A) Evaluation of the impact of under-annotation of isoforms. Shown are the power estimates of PennDiff, IUTA and SplicingCompass based on 100% (true), 90% (10% less), 75% (25% less) and 50% (50% less) of the Ensembl annotated isoforms. (B) Evaluation of the impact of over-annotation of isoforms. Shown are the power estimates of PennDiff, IUTA and SplicingCompass based on 66% (true), 73% (10% more), 83% (25% more) and 100% (50% more) of the Ensembl annotated isoforms

**Fig. 7.**
(A) DAST genes detected by different methods for human induced pluripotent stem cells (iPSCs) versus iPSC-derived macrophages (iPSDMs). (B) RT-PCR validation of alternatively spliced exon chr11: 5422155–85422275 in *SYTL2* in samples from the two human donors on whom we performed the RNA-seq studies. The exon-inclusion levels shown in the table were estimated from the gel image. (C) IGV sashimi plot of gene *SYTL2*. M4 and M8 are the two study subjects

Grants and funding

T32 DK007006/DK/NIDDK NIH HHS/United States
