Front Cell Dev Biol. 2017 Sep 21:5:83.
doi: 10.3389/fcell.2017.00083. eCollection 2017.

A Comprehensive Infrastructure for Big Data in Cancer Research: Accelerating Cancer Research and Precision Medicine

Izumi V Hinkson et al. Front Cell Dev Biol.

Erratum in

Abstract

Advancements in next-generation sequencing and other -omics technologies are accelerating the detailed molecular characterization of individual patient tumors, and driving the evolution of precision medicine. Cancer is no longer considered a single disease, but rather a diverse array of diseases wherein each patient has a unique collection of germline variants and somatic mutations. Molecular profiling of patient-derived samples has led to a data explosion that could help us understand the contributions of environment and germline to risk, therapeutic response, and outcome. To maximize the value of these data, an interdisciplinary approach is paramount. The National Cancer Institute (NCI) has initiated multiple projects to characterize tumor samples using multi-omic approaches. These projects harness the expertise of clinicians, biologists, computer scientists, and software engineers to investigate cancer biology and therapeutic response in multidisciplinary teams. Petabytes of cancer genomic, transcriptomic, epigenomic, proteomic, and imaging data have been generated by these projects. To address the data analysis challenges associated with these large datasets, the NCI has sponsored the development of the Genomic Data Commons (GDC) and three Cloud Resources. The GDC ensures data and metadata quality, ingests and harmonizes genomic data, and securely redistributes the data. During their pilot phase, the Cloud Resources tested multiple cloud-based approaches for enhancing data access, collaboration, computational scalability, resource democratization, and reproducibility. These NCI-led efforts are continuously being refined to better support open data practices and precision oncology, and to serve as building blocks of the NCI Cancer Research Data Commons.

Keywords: big data; cancer; cloud infrastructure; genomics; imaging; precision medicine; proteomics.

Figures

Figure 1
The NCI Cancer Research Data Commons: An Expandable Infrastructure. The NCI Cancer Research Data Commons will be a cloud-based network in which each node is focused on a specific data type. Nodes will include the Genomic Data Commons, Proteomic Data Commons, and Imaging Data Commons. Future plans include the addition of nodes that support other research modalities such as clinical data, epidemiological data, and cancer models. Through a secure authentication and authorization process, biomedical researchers, tool developers, computer scientists, informaticians, clinicians, and patients will be able to bring their own data and tools to nodes, as well as access harmonized data and hosted tools via APIs and a web interface. Users will also be able to harness elastic compute capabilities for computational analyses, visualization of results, and data queries in the cloud (NCI, 2017).
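As a rough illustration of what "access harmonized data ... via APIs" could look like in practice, the sketch below builds a query URL for the Genomic Data Commons. It is not taken from the article: the base URL, the `cases` endpoint, the filter layout, the field names, and the project identifier `TCGA-GBM` are all assumptions based on the publicly documented GDC REST API, and the helper `build_cases_query` is a hypothetical name. The code only constructs the request and does not contact any server.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: public GDC REST API base URL (not stated in the abstract).
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_cases_query(project_id: str, size: int = 5) -> str:
    """Return a GET URL asking the assumed `cases` endpoint for one project."""
    # Assumed GDC filter syntax: an operator plus a field/value content block.
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),         # filters travel as a JSON string
        "fields": "submitter_id,primary_site",  # assumed case-level field names
        "format": "JSON",
        "size": str(size),                      # maximum number of records
    }
    return f"{GDC_API_BASE}/cases?{urlencode(params)}"


# Build the URL and decode it again to show what would be sent.
url = build_cases_query("TCGA-GBM")
decoded = parse_qs(urlparse(url).query)
print(url)
print(json.loads(decoded["filters"][0]))
```

Open-access metadata of this kind would typically need no credentials; controlled-access data would go through the authentication and authorization step that the caption describes, which this sketch does not cover.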
