Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation
- PMID: 27669167
- PMCID: PMC5143225
- DOI: 10.1038/nbt.3682
Abstract
We find that current computational methods for estimating transcript abundance from RNA-seq data can lead to hundreds of false-positive results. We show that these systematic errors stem largely from a failure to model fragment GC content bias. Sample-specific biases associated with fragment sequence features lead to misidentification of transcript isoforms. We introduce alpine, a method for estimating sample-specific bias-corrected transcript abundance. By incorporating fragment sequence features, alpine greatly increases the accuracy of transcript abundance estimates, enabling a fourfold reduction in the number of false positives for reported changes in expression compared with Cufflinks. Using simulated data, we also show that alpine retains the ability to discover true positives, similar to other approaches. The method is available as an R/Bioconductor package that includes data visualization tools useful for bias discovery.
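As a rough illustration of the fragment sequence feature the abstract refers to, the sketch below computes the GC content of a single fragment taken from a transcript sequence. It is not the alpine implementation (alpine is an R/Bioconductor package); the function names `gc_content` and `fragment_gc`, and the representation of a fragment as a start position plus a length, are assumptions made only for this example.

```python
def gc_content(fragment: str) -> float:
    """Return the fraction of G or C bases in a sequence (0.0 if empty)."""
    seq = fragment.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def fragment_gc(transcript: str, start: int, length: int) -> float:
    """GC content of the fragment spanning transcript[start:start + length].

    Hypothetical helper: a fragment is described here by its 0-based start
    position on the transcript and its length in bases.
    """
    return gc_content(transcript[start:start + length])
```

A bias model of the kind described would relate a per-fragment value like this to the observed fragment counts in each sample; that modeling step is beyond this sketch.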
