Consistently processed RNA sequencing data from 50 sources enriched for pediatric data
- PMID: 40603900
- PMCID: PMC12222803
- DOI: 10.1038/s41597-025-05376-z
Abstract
Larger cohorts improve the power of tumor gene expression analysis, but the signal is muddied if datasets are processed using different methods or have inaccurate metadata. Here we present five compendia containing consistently processed gene expression data derived from 16,446 diverse RNA sequencing datasets. To create the compendia, we obtained RNA sequencing data from repositories of public data and from clinical partners with access to unpublished data. We then assessed data quality, quantified gene expression, harmonized clinical metadata, and released the expression values and metadata without access restrictions. These datasets have been used in diverse projects, ranging from identifying similarities between tumor types to assessing how well cell lines recapitulate tumors. They have also been used for n-of-1 analysis, to identify genes with unusual expression patterns in a single sample and to infer molecular diagnoses. Our dockerized, freely available pipeline enables comparison of new data to the compendia. The compendia have been cited in at least 20 publications.
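The abstract does not state how unusual expression is detected in n-of-1 analysis. As a hedged illustration only, the Python sketch below assumes one common approach: convert expression to log2(TPM + 1) and flag a gene as an up-outlier when the single sample's value exceeds the upper Tukey fence (Q3 + 1.5 x IQR) of that gene across the compendium. The function names, the dictionary input format, the 1.5 multiplier, and the minimum of four reference values are hypothetical choices for illustration, not the authors' pipeline.

```python
import math
import statistics


def log2_tpm(tpm):
    """Transform a TPM value to log2(TPM + 1) (assumed expression scale)."""
    return math.log2(tpm + 1.0)


def tukey_upper_fence(values, k=1.5):
    """Return Q3 + k * IQR, with quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def find_up_outliers(sample, compendium, k=1.5, min_values=4):
    """Hypothetical n-of-1 check: genes whose log2(TPM + 1) in one sample
    exceeds the compendium's upper Tukey fence for that gene.

    sample:     dict mapping gene -> TPM for the single sample of interest
    compendium: dict mapping gene -> list of TPM values across compendium samples
    Returns a dict mapping each outlier gene -> its log2(TPM + 1) value.
    """
    outliers = {}
    for gene, tpm in sample.items():
        background = [log2_tpm(v) for v in compendium.get(gene, [])]
        if len(background) < min_values:
            continue  # too few reference values to estimate quartiles reliably
        value = log2_tpm(tpm)
        if value > tukey_upper_fence(background, k):
            outliers[gene] = value
    return outliers


if __name__ == "__main__":
    # Toy data with hypothetical gene names; not drawn from the compendia.
    compendium = {"GENE_A": [0, 1, 3, 7, 15], "GENE_B": [0, 1, 3, 7, 15]}
    sample = {"GENE_A": 255, "GENE_B": 7}
    print(find_up_outliers(sample, compendium))
```

In the toy run, GENE_A's reference values map to log2(TPM + 1) = 0, 1, 2, 3, 4, giving an upper fence of 6; the sample's value of 8 exceeds it, so GENE_A is reported, while GENE_B (value 3) is not. A real analysis would likely restrict the background to comparable samples and handle low-expression genes separately, which this sketch does not attempt.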
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Grants and funding
- R01 CA243555/CA/NCI NIH HHS/United States
- NextGen Award for Transformative Cancer Research/American Association for Cancer Research (American Association for Cancer Research, Inc.)
- Emily Beazley Kures for Kids/St. Baldrick's Foundation (St. Baldrick's Foundation, Inc)
- California Alliance for Minority Participation (CAMP)/National Science Foundation (NSF)
- 5RM1HG011543/U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
