The Open Pediatric Cancer Project
- PMID: 40891528
- PMCID: PMC12402770
- DOI: 10.1093/gigascience/giaf093
Abstract
Background: In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we present an extension of the OpenPBTA called the Open Pediatric Cancer (OpenPedCan) Project, a harmonized open-source multiomic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA sequencing (RNA-seq) from the Genotype-Tissue Expression and The Cancer Genome Atlas projects, OpenPedCan contains nearly 48,000 total biospecimens (24,002 tumor and 23,893 normal specimens).
Findings: We utilized Gabriella Miller Kids First workflows to harmonize whole-genome sequencing (WGS), whole-exome sequencing (WXS), RNA-seq, and targeted sequencing datasets to include somatic single-nucleotide variants (SNVs), indels, copy number variants, structural variants, RNA expression, fusions, and splice variants. We integrated summarized Clinical Proteomic Tumor Analysis Consortium whole-cell proteomics and phospho-proteomics data and miRNA sequencing data, and we developed a methylation array harmonization workflow to include M-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) to deliver harmonized and processed data from over 60 scalable modules, which can be leveraged both locally and on AWS. The processed data are released in a versioned manner, are accessible through CAVATICA or AWS S3 download (from GitHub), and are queryable through PedcBioPortal and the National Cancer Institute's pediatric Molecular Targets Platform. Notably, we have expanded Pediatric Brain Tumor Atlas molecular subtyping to include methylation information to align with the World Health Organization 2021 Central Nervous System Tumor classifications, allowing us to create research-grade integrated diagnoses for these tumors.
Conclusions: OpenPedCan data and its reproducible analysis module framework are openly available and can be utilized and/or adapted by researchers to accelerate discovery, validation, and clinical translation.
Keywords: Docker; OpenPedCan; multiomics; open science; pediatric cancer; reproducibility.
© The Author(s) 2025. Published by Oxford University Press on behalf of GigaScience.
Conflict of interest statement
The authors declare no competing interests.
Update of
- The Open Pediatric Cancer Project. bioRxiv [Preprint]. 2025 Jun 28:2024.07.09.599086. doi: 10.1101/2024.07.09.599086. Update in: Gigascience. 2025 Jan 6;14:giaf093. doi: 10.1093/gigascience/giaf093. PMID: 39026781.