Unifying cancer and normal RNA sequencing data from different sources
- PMID: 29664468
- PMCID: PMC5903355
- DOI: 10.1038/sdata.2018.61
Abstract
Driven by recent advances in next-generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies, such as the Genotype-Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA), have produced an unprecedented volume of whole-transcriptome sequencing (RNA-seq) data. While these data offer new opportunities to identify the mechanisms underlying disease, comparing data from different sources remains challenging because of differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies through uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification are not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare.
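To make the idea of batch effect removal concrete, the following is a minimal sketch of a per-gene location-scale adjustment across studies. It is not the authors' pipeline, which the abstract does not specify; the function name `center_scale_by_batch`, the toy data, and the study labels "A" and "B" are all hypothetical. The sketch assumes log-transformed expression values and assumes that the batches contain biologically comparable samples, since this kind of adjustment would also erase real differences that are confounded with the study of origin.

```python
import numpy as np

def center_scale_by_batch(log_expr, batches):
    """Align each batch's per-gene mean and spread to the pooled values.

    log_expr: array of shape (genes, samples), log-transformed expression.
    batches:  sequence of length samples naming the study of each sample.
    Illustrative only: no covariates and no empirical Bayes shrinkage.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    pooled_mean = log_expr.mean(axis=1, keepdims=True)
    pooled_std = log_expr.std(axis=1, keepdims=True)
    adjusted = np.empty_like(log_expr)
    for batch in np.unique(batches):
        cols = batches == batch
        block = log_expr[:, cols]
        mean = block.mean(axis=1, keepdims=True)
        std = block.std(axis=1, keepdims=True)
        std[std == 0] = 1.0  # leave genes that are constant within a batch unscaled
        # Standardize within the batch, then map onto the pooled scale.
        adjusted[:, cols] = (block - mean) / std * pooled_std + pooled_mean
    return adjusted

# Toy data: 3 genes, two hypothetical studies whose values differ by an offset.
rng = np.random.default_rng(0)
study_a = rng.normal(5.0, 1.0, size=(3, 4))
study_b = rng.normal(7.0, 1.0, size=(3, 4))
expr = np.hstack([study_a, study_b])
labels = ["A"] * 4 + ["B"] * 4
corrected = center_scale_by_batch(expr, labels)
```

After the adjustment, each study's per-gene mean matches the pooled per-gene mean, so the study-level offset no longer separates the two groups. Real datasets generally call for a method that models known covariates so that biological signal is preserved.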
Conflict of interest statement
The authors declare no competing financial interests.
References
Data Citations
- Wang Q., Gao J., Nikolaus S. 2017. Figshare. https://doi.org/10.6084/m9.figshare.5330539
- Wang Q., Gao J., Nikolaus S. 2017. Figshare. https://doi.org/10.6084/m9.figshare.5330575
- Wang Q., Gao J., Nikolaus S. 2017. Figshare. https://doi.org/10.6084/m9.figshare.5330593