Healthc Inform Res. 2020 Oct;26(4):303-310.
doi: 10.4258/hir.2020.26.4.303. Epub 2020 Oct 31.

Building a Lung and Ovarian Cancer Data Warehouse

Canan Eren Atay et al. Healthc Inform Res. 2020 Oct.

Abstract

Objectives: Despite the collection of vast amounts of data by the healthcare sector, effective decision-making in medical practice is still challenging. Data warehousing technology can be applied for the collection and management of clinical data from various sources to provide meaningful insights for physicians and administrators. Cancer data are extremely complicated and massive; hence, a clinical data warehouse system can provide insights into prevention, diagnosis and treatment processes through the use of online analytical processing tools for the analysis of multi-dimensional data at different granularity levels.

Methods: In this study, a clinical data warehouse was developed for lung and ovarian cancer data, which were kindly provided by the United States National Cancer Institute. The data were imported in specific formats and cleaned to remove errors and redundancies. SQL Server Integration Services (SSIS) was used for the extract-transform-load (ETL) process.
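
The cleaning step described above can be illustrated with a minimal sketch. The study used SSIS; the example below swaps in plain Python with an in-memory SQLite database purely for illustration. The rows, the column names (`plco_id`, `age`, `cigarettes_per_day`), and the validity rule for age are all assumptions, not details taken from the paper.

```python
import sqlite3

# Hypothetical raw rows (plco_id, age, cigarettes_per_day) as text fields.
# The values are made up for illustration and are not study data.
RAW_ROWS = [
    ("A001", "63", "20"),
    ("A001", "63", "20"),  # repeated patient row (redundancy)
    ("A002", "-5", "10"),  # impossible age (error)
    ("A003", "58", ""),    # missing smoking value, kept as NULL
    ("A004", "71", "15"),
]


def transform(rows):
    """Keep the first valid row per patient ID and convert text to integers."""
    seen = set()
    cleaned = []
    for plco_id, age, cigs in rows:
        if plco_id in seen:
            continue  # drop repeated patient IDs
        if not age.isdigit() or not 0 < int(age) < 120:
            continue  # drop rows whose age is not a plausible integer
        seen.add(plco_id)
        cleaned.append((plco_id, int(age), int(cigs) if cigs.isdigit() else None))
    return cleaned


def load(conn, rows):
    """Create the (hypothetical) target table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient ("
        "plco_id TEXT PRIMARY KEY, age INTEGER, cigarettes_per_day INTEGER)"
    )
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
cleaned_rows = transform(RAW_ROWS)
load(conn, cleaned_rows)
```

In a real pipeline the validity rules would come from the source data dictionary rather than from a hard-coded age range.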

Results: The design of the clinical data warehouse responds efficiently to all types of queries by adopting the fact constellation schema model. Various online analytical processing queries can be expressed using the proposed approach.
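
A fact constellation schema has several fact tables that share dimension tables. The sketch below shows that shape with one fact table per cancer type and two shared dimensions. All table and column names are hypothetical and are not the schema from the paper; SQLite stands in for SQL Server.

```python
import sqlite3

# Hypothetical constellation: two fact tables sharing two dimension tables.
SCHEMA = """
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    plco_id     TEXT UNIQUE,
    age         INTEGER
);
CREATE TABLE dim_treatment (
    treatment_key  INTEGER PRIMARY KEY,
    treatment_type TEXT
);
CREATE TABLE fact_lung (
    patient_key        INTEGER REFERENCES dim_patient(patient_key),
    treatment_key      INTEGER REFERENCES dim_treatment(treatment_key),
    complication_count INTEGER
);
CREATE TABLE fact_ovarian (
    patient_key        INTEGER REFERENCES dim_patient(patient_key),
    treatment_key      INTEGER REFERENCES dim_treatment(treatment_key),
    complication_count INTEGER
);
"""


def shared_dimensions(conn):
    """Return the tables that every fact_* table references by foreign key."""
    facts = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name LIKE 'fact_%'"
        )
    ]
    referenced = []
    for fact in facts:
        # Column 2 of each foreign_key_list row is the referenced table name.
        fks = conn.execute(f"PRAGMA foreign_key_list({fact})").fetchall()
        referenced.append({fk[2] for fk in fks})
    return sorted(set.intersection(*referenced))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shared = shared_dimensions(conn)
```

Because both fact tables join to the same patient dimension, a single query can compare measures across the two cancers without duplicating patient attributes.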

Conclusions: This model succeeded in responding to complex queries, and the analysis of data is facilitated by using online analytical processing cubes and viewing multilevel data details.

Keywords: Data Analytics; Data Warehousing; Lung Cancer; Ovarian Cancer.

Conflict of interest statement

No potential conflict of interest relevant to this article was reported.

Figures

Query 1
What is the distribution of patients with nodules and biopsies by the number of cigarettes smoked per day?
Query 2
What is the age distribution of patients with the complication pneumothorax (a collection of air in the pleural cavity) and of those treated with chemotherapy?
Query 3
Compare the number of complications in ovarian and lung cancer patients.
Query 4
List the PLCO_ID numbers and names of patients who received “non-curative” treatment for both lung and ovarian cancer.
Figure 1
Proposed project architecture. Adapted from Sheta and Eldeen [16].
Figure 2
Star schema for medical records.
Figure 3
Snowflake schema for medical records.
Figure 4
Lung and ovarian cancer clinical data warehouse fact constellation schema model.
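
Queries in the style of Query 3 and Query 4 can be sketched against a toy constellation. The tables, columns, and rows below are invented for illustration and do not reproduce the paper's schema, data, or results; the paper expresses its queries with OLAP tools, whereas this sketch uses plain SQL through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables and made-up rows; names are hypothetical.
conn.executescript("""
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, plco_id TEXT);
CREATE TABLE fact_lung (
    patient_key INTEGER, treatment_type TEXT, complication_count INTEGER);
CREATE TABLE fact_ovarian (
    patient_key INTEGER, treatment_type TEXT, complication_count INTEGER);
INSERT INTO dim_patient VALUES (1, 'X-001'), (2, 'X-002'), (3, 'X-003');
INSERT INTO fact_lung VALUES (1, 'non-curative', 2), (2, 'curative', 1);
INSERT INTO fact_ovarian VALUES (1, 'non-curative', 1), (3, 'non-curative', 4);
""")

# Query 3 style: total complications per cancer type, one row per fact table.
complications = dict(
    conn.execute("""
        SELECT 'lung', SUM(complication_count) FROM fact_lung
        UNION ALL
        SELECT 'ovarian', SUM(complication_count) FROM fact_ovarian
    """).fetchall()
)

# Query 4 style: patients with non-curative treatment in both fact tables,
# found by joining each fact table to the shared patient dimension.
both_non_curative = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT p.plco_id
        FROM dim_patient AS p
        JOIN fact_lung AS l ON l.patient_key = p.patient_key
        JOIN fact_ovarian AS o ON o.patient_key = p.patient_key
        WHERE l.treatment_type = 'non-curative'
          AND o.treatment_type = 'non-curative'
        ORDER BY p.plco_id
    """)
]
```

The second query works only because both fact tables share the patient dimension, which is the property the fact constellation model provides.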

References

    1. Garani G, Atay CE. Encountering incomplete temporal information in clinical data warehouses. Int J Appl Res Public Health Manag. 2020;5(1):32–48.
    2. Kallmeyer V, Venkat K. Beyond e-health: health and information technology converge. Siliconindia. 2002;6(4):42.
    3. The Global Cancer Observatory [Internet]. Lyon, France: International Agency for Research on Cancer; c2020 [cited 2020 Sep 10]. Available from: https://gco.iarc.fr/
    4. Ferlay J, Parkin DM, Steliarova-Foucher E. Estimates of cancer incidence and mortality in Europe in 2008. Eur J Cancer. 2010;46(4):765–81.
    5. Miele S, Shockley R. Analytics: the real-world use of big data. Somers (NY): IBM Global Business Services; 2013.