Review
Cancer Res. 2024 May 2;84(9):1404-1409.
doi: 10.1158/0008-5472.CAN-23-2730.

NCI Cancer Research Data Commons: Lessons Learned and Future State

Erika Kim et al. Cancer Res. 2024.

Abstract

More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.

Figures

Figure 1. The NCI's CRDC, an expandable infrastructure. The CRDC is a cloud-based network of data type–specific data commons (the Genomic Data Commons, GDC; the Proteomic Data Commons, PDC; the Imaging Data Commons, IDC; and the Integrated Canine Data Commons, ICDC), as well as a data type–agnostic commons, the Cancer Data Service (CDS). Through a secure authentication and authorization process, users can access cancer data and hosted tools via APIs and web interfaces. Users can also harness the elastic compute capabilities of the cloud for computational analyses, visualization of results, and data queries, and can bring their own data and tools to their workspaces in the NCI Cloud Resources.
Figure 2. NCI CRDC statistics and impact. The figure summarizes the impact the CRDC has had on cancer research since its launch in 2014. The CRDC provides access to nearly 10 petabytes of cancer data from more than 350 studies and 134K subjects. It also provides more than 2K on-demand computational analysis tools and workflows in secure, collaborative cloud workspaces, and more than 82K users have performed 2.4K years of compute, resulting in 30K data citations.