Consistently processed RNA sequencing data from 50 sources enriched for pediatric data
- PMID: 40603900
- PMCID: PMC12222803
- DOI: 10.1038/s41597-025-05376-z
Abstract
Larger cohorts improve the power of tumor gene expression analysis, but the signal is muddied if datasets are processed using different methods or have inaccurate metadata. Here we present five compendia containing consistently processed gene expression data derived from 16,446 diverse RNA sequencing datasets. To create the compendia, we obtained access to RNA sequence data from repositories containing public data as well as from clinical partners with access to unpublished data. We then assessed data quality, quantified gene expression, harmonized clinical metadata, and released the expression values and metadata without access restrictions. These datasets have been used for diverse projects ranging from identifying similarities between tumor types to assessing how well cell lines recapitulate tumors. They have also been used for n-of-1 analysis to identify genes with unusual expression patterns in a single sample and to infer molecular diagnoses. Our dockerized, freely available pipeline enables new data to be compared with the compendia. The compendia have been cited in at least 20 publications.
© 2025. The Author(s).
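The abstract mentions n-of-1 analysis, in which a single sample's expression is compared against a compendium to find genes with unusual expression. As a minimal sketch, assuming expression is compared on a log2(TPM + 1) scale and that an "unusual" (overexpressed) gene is one above a Tukey-style threshold (third quartile plus 1.5 times the interquartile range) of the reference distribution, such a comparison could look like the following. The function names, the threshold rule, and the dictionary data layout are illustrative assumptions, not the authors' published pipeline.

```python
import math
import statistics


def log2_tpm(tpm: float) -> float:
    """Transform a TPM value to log2(TPM + 1) (assumed comparison scale)."""
    return math.log2(tpm + 1)


def up_outliers(sample, compendium, k=1.5):
    """Hypothetical n-of-1 check: flag genes whose expression in one sample
    exceeds Q3 + k * IQR of the same gene across a reference compendium.

    sample: dict mapping gene -> TPM for the single sample of interest
    compendium: dict mapping gene -> list of TPM values across reference samples
    Returns a dict mapping gene -> (sample value, threshold), both log2(TPM + 1).
    """
    outliers = {}
    for gene, tpm in sample.items():
        ref = [log2_tpm(v) for v in compendium.get(gene, [])]
        if len(ref) < 2:
            continue  # too few reference values to define quartiles
        q1, _, q3 = statistics.quantiles(ref, n=4, method="inclusive")
        threshold = q3 + k * (q3 - q1)  # assumed Tukey-style upper fence
        value = log2_tpm(tpm)
        if value > threshold:
            outliers[gene] = (value, threshold)
    return outliers


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the compendia.
    compendium = {"GENE_A": [0, 1, 3, 7, 15], "GENE_B": [0, 1, 3, 7, 15]}
    sample = {"GENE_A": 255, "GENE_B": 7}
    print(up_outliers(sample, compendium))
```

With this toy reference distribution, the threshold is 6.0 on the log2(TPM + 1) scale, so GENE_A (8.0) is flagged and GENE_B (3.0) is not. Other thresholds or scales (for example, z-scores or percentile ranks) are equally plausible choices for this kind of comparison.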
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Grants and funding
- R01 CA243555/CA/NCI NIH HHS/United States
- NextGen Award for Transformative Cancer Research/American Association for Cancer Research (American Association for Cancer Research, Inc.)
- Emily Beazley Kures for Kids/St. Baldrick's Foundation (St. Baldrick's Foundation, Inc)
- California Alliance for Minority Participation (CAMP)/National Science Foundation (NSF)
- 5RM1HG011543/U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)