Big Data. 2014 Sep 1;2(3):155-163.
doi: 10.1089/big.2014.0026.

A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science


James H Faghmous et al. Big Data.

Abstract

Global climate change and its impact on human life have become one of our era's greatest challenges. Despite the urgency and the abundance of climate data, data science has had little impact on furthering our understanding of our planet. This stands in stark contrast to fields such as advertising or electronic commerce, where big data has been a great success story. The discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities of mining large climate datasets, with an emphasis on the nuanced differences between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed for big data to fulfill its promise in climate science applications. More importantly, we highlight research showing that relying solely on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques and the results-interpretation process in order to extract accurate insight from large climate data.


Figures

FIG. 1.
Schematic view of the components of the global climate system (bold), their processes and interactions (thin arrows), and some aspects that might change due to global warming (bold arrows). Some of these components have reliable datasets, while others do not. This figure does not show the various time scales at which these processes interact. The multiple spatiotemporal scales at which components of the climate system interact make a data-driven study of climate extremely challenging. Figure from the Intergovernmental Panel on Climate Change (IPCC).
FIG. 2.
An example of raw and postprocessed satellite data. (Left) Along-track satellite observations of sea surface height from the JASON-II satellite for May 20, 2010. (Middle) A 12-day composite of five satellites centered on May 20, 2010. (Right) The postprocessed data from May 20, 2010. The altimeter products were produced by Ssalto/Duacs and distributed by AVISO, with support from CNES (www.aviso.oceanobs.com/duacs/).

