Cancer Discov. 2021 May;11(5):1082-1099.
doi: 10.1158/2159-8290.CD-20-1230. Epub 2021 Jan 6.

St. Jude Cloud: A Pediatric Cancer Genomic Data-Sharing Ecosystem

Clay McLeod # 1; Alexander M Gout # 1; Xin Zhou 1; Andrew Thrasher 1; Delaram Rahbarinia 1; Samuel W Brady 1; Michael Macias 1; Kirby Birch 1; David Finkelstein 1; Jobin Sunny 1; Rahul Mudunuri 1; Brent A Orr 2; Madison Treadway 1; Bob Davidson 3; Tracy K Ard 3; Arthur Chiao 1; Andrew Swistak 1; Stephanie Wiggins 1; Scott Foy 1; Jian Wang 1; Edgar Sioson 1; Shuoguo Wang 1; J Robert Michael 1; Yu Liu 1; Xiaotu Ma 1; Aman Patel 1; Michael N Edmonson 1; Mark R Wilkinson 1; Andrew M Frantz 1; Ti-Cheng Chang 1; Liqing Tian 1; Shaohua Lei 1; S M Ashiqul Islam 4; Christopher Meyer 5; Naina Thangaraj 5; Pamella Tater 5; Vijay Kandali 5; Singer Ma 5; Tuan Nguyen 5; Omar Serang 5; Irina McGuire 6; Nedra Robison 6; Darrell Gentry 6; Xing Tang 7; Lance E Palmer 7; Gang Wu 1; Ed Suh 6; Leigh Tanner 6; James McMurry 6; Matthew Lear 2; Alberto S Pappo 8; Zhaoming Wang 1, 9; Carmen L Wilson 9; Yong Cheng 7; Soheil Meshinchi 10; Ludmil B Alexandrov 4; Mitchell J Weiss 7; Gregory T Armstrong 9; Leslie L Robison 9; Yutaka Yasui 9; Kim E Nichols 8; David W Ellison 2; Chaitanya Bangur 3; Charles G Mullighan 2; Suzanne J Baker 11; Michael A Dyer 11; Geralyn Miller 3; Scott Newman 1; Michael Rusch 1; Richard Daly 5; Keith Perry 12; James R Downing 13; Jinghui Zhang 14

Clay McLeod et al. Cancer Discov. 2021 May.

Abstract

Effective data sharing is key to accelerating research to improve diagnostic precision, treatment efficacy, and long-term survival in pediatric cancer and other childhood catastrophic diseases. We present St. Jude Cloud (https://www.stjude.cloud), a cloud-based data-sharing ecosystem for accessing, analyzing, and visualizing genomic data from >10,000 pediatric patients with cancer and long-term survivors, and >800 pediatric sickle cell patients. Harmonized genomic data totaling 1.25 petabytes are freely available, including 12,104 whole genomes, 7,697 whole exomes, and 2,202 transcriptomes. The resource is expanding rapidly, with regular data uploads from St. Jude's prospective clinical genomics programs. Three interconnected apps within the ecosystem (Genomics Platform, Pediatric Cancer Knowledgebase, and Visualization Community) make it possible to perform advanced data analysis in the cloud while enhancing the Pediatric Cancer Knowledgebase. We demonstrate the value of the ecosystem through use cases that classify 135 pediatric cancer subtypes by gene expression profiling and map mutational signatures across 35 pediatric cancer subtypes.

SIGNIFICANCE: To advance research and treatment of pediatric cancer, we developed St. Jude Cloud, a data-sharing ecosystem for accessing >1.2 petabytes of raw genomic data from >10,000 pediatric patients and survivors, innovative analysis workflows, integrative multiomics visualizations, and a knowledgebase of published data contributed by the global pediatric cancer community.

This article is highlighted in the In This Issue feature, p. 995.

Conflict of interest statement

C.G.M. has received research funding from Abbvie and Pfizer and served on an advisory board for Illumina.

Christopher Meyer, N.T., P.T., V.K., S.M., T.N., O.S., and R.D. are employees of DNAnexus.

B.D., T.K.A., C.B., and G.M. are employees of Microsoft.

Figures

Figure 1. Overview of St. Jude Cloud.
(A) Comparison of data sharing via the established centralized data repository model versus St. Jude Cloud. The established model requires replication of data and local computing infrastructure while cloud-based data sharing enables a user to perform custom analysis by uploading tools/analysis code onto the shared cloud-computing infrastructure without replication. (B) Overview of ingress, harmonization, and deposition of high-throughput sequencing datasets into the St. Jude Cloud ecosystem. Raw genomic data, collected from both retrospective research and prospective clinical studies, were harmonized and curated for access by the broad research community via the three apps on the St. Jude Cloud: Genomics Platform, PeCan Knowledgebase and Visualization Community.
Figure 2. Pediatric cancer genomics data on St. Jude Cloud.
(A) Summary of high-throughput sequencing data sets on St. Jude Cloud. (B) Frequency of pediatric cancer types in WGS data generated from paired tumor-normal samples (left) or germline-only pediatric cancer survivors (right). (C) Genomic data contributed by RTCG deposition. Cumulative plot of WGS, WES and RNA-Seq released beginning May 2019 through July 2020 is shown at left while rare pediatric blood (n=13, 5 subtypes), solid (n=31, 16 subtypes) and germ cell (n=7, 6 subtypes) tumor samples uniquely represented in clinical genomics samples are shown at right.
Figure 3. Working across the St. Jude Cloud ecosystem.
A virtual cohort can be assembled by querying the data browser on the Genomics Platform (top left) or exploring the Pediatric Cancer Knowledgebase (PeCan) portal (top right). Following approval by the data access committee, the requested data are “vended” onto a private cloud workspace in Genomics Platform (middle center) for analysis using the workflows on St. Jude Cloud (‘Genomics Platform Analysis Tools’), tools available within the DNAnexus Tool Ecosystem, or custom workflows. Alternatively, a user may download the vended genomic data to their local computing infrastructure for further in-depth analysis. Following each of these analyses, a user may share custom visualizations (e.g., landscape maps or cancer subgroup analyses) with the research community via the Visualization Community (bottom right), and published results can be incorporated into the PeCan Knowledgebase.
Figure 4. Classification of pediatric cancers by RNA-Seq expression profiling.
RNA-Seq t-SNE plot of 816 blood cancers (A), 302 solid tumors (B), and 447 brain tumors (C). The circle in B represents 4 metastatic osteosarcoma samples. Analysis of a user-supplied AML RNA-Seq BAM file on the St. Jude Cloud by importing data to Genomics Platform (D), performing fusion detection using the Rapid RNA-Seq workflow, which identified a ZBTB7A-NUTM1 fusion (E), and performing “RNA-Seq Expression Classification” analysis (F), which showed that the sample groups with other AML samples and is distinct from other blood cancers (B-ALL) that also harbor NUTM1 fusions (labeled). In (F), the reference t-SNE map was constructed using all RNA-Seq data and the boundaries of brain, solid, B-cell acute lymphoblastic leukemia (B-ALL), T-cell acute lymphoblastic leukemia (T-ALL), and acute myeloid leukemia (AML) are marked by dotted lines. Abbreviations: B-ALL subtypes include ETV6-RUNX1 (ETV6), KMT2A-rearranged (KMT2A), DUX4-rearranged (DUX4), ZNF384-rearranged (ZNF384), MEF2D-rearranged (MEF2D), BCR-ABL1 (Ph), BCR-ABL1-like (Ph-like), Hyperdiploid, Hypodiploid, intrachromosomal amplification of chromosome 21 (iAMP21), NUTM1-rearranged (NUTM1), PAX5 p.Pro80Arg mutation (PAX5 P80R), and PAX5 alterations (PAX5 alt); acute leukemia of ambiguous lineage (ALAL); T-cell acute lymphoblastic leukemia (T-ALL); acute myeloid leukemia (AML); acute megakaryoblastic leukemia (AMKL); acute promyelocytic leukemia (APML); anaplastic large cell lymphoma (ALCL); hepatocellular carcinoma and hepatoblastoma (Liver); thyroid papillary tumor (thyroid); embryonal/alveolar/botryoid rhabdomyosarcoma (RMS); desmoplastic small round cell tumor (DSRCT); Medulloblastoma (SHH, WNT, Group 3/4 (G3/4) subtypes); choroid plexus carcinoma (CPC); atypical teratoid/rhabdoid tumor (ATRT); and high-grade neuroepithelial tumor (HGNET). For a complete list of subtypes included in this analysis, please see Supplementary Table S3.
Figure 5. Analysis of mutational signatures on St. Jude Cloud.
(A) Somatic mutation rate (left) and COSMIC mutational signatures (right) in pediatric cancer subtypes analyzed by WGS. The number of samples examined is indicated in parentheses. Mutation rate is shown on a log scale, with the median indicated by a red line and samples within two standard deviations (SD), between two and three SD, and greater than three SD within the subtype marked by black, orange, and red dots, respectively. Note that the outlier osteosarcoma samples with low mutation burden (marked orange and red) have <20% and <10% tumor purity, respectively. The orange and red outlier High Grade Glioma samples are hypermutators with bi-allelic loss of either MSH2 or POLE, respectively. Heatmap of COSMIC mutational signatures with therapy-related signatures indicated with an asterisk (*). The scale represents the proportion of somatic mutations contributing to each signature in each sample averaged by subtype. (B) Analysis of mutational signatures of adult AML samples on St. Jude Cloud. The results are compared to those of the pediatric AMLs in the summary tab while the mutational signatures of each adult AML sample are shown below. Cancer subtype abbreviations follow the same style as Fig. 4. For a complete list of subtypes included in this analysis, please see Supplementary Table S3.
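The black/orange/red outlier marking described in Figure 5A can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the SD is the sample standard deviation of log10-transformed mutation rates within one subtype and that distance is measured from the subtype mean; the caption does not specify either choice. The function name `sd_band_colors` and its input are hypothetical.

```python
import math
from statistics import mean, stdev


def sd_band_colors(rates):
    """Return one colour per sample for a single subtype.

    Assumption (not stated in the caption): bands are computed on
    log10(mutation rate), measured from the subtype mean, using the
    sample standard deviation. Requires at least two positive rates.
    """
    logs = [math.log10(r) for r in rates]
    mu = mean(logs)
    sd = stdev(logs)
    colors = []
    for value in logs:
        # Distance from the subtype mean in SD units (0 if no spread).
        z = abs(value - mu) / sd if sd > 0 else 0.0
        if z <= 2:
            colors.append("black")   # within two SD
        elif z <= 3:
            colors.append("orange")  # between two and three SD
        else:
            colors.append("red")     # more than three SD
    return colors
```

For example, in a hypothetical subtype with eight samples at 1.0 mutations per Mb and one at 0.01, the low sample lies about 2.7 SD from the mean and is marked orange, while the other eight are marked black.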
