Big Data. 2014 Dec 1;2(4):205-215.
doi: 10.1089/big.2014.0068.

Data Integration for Heterogenous Datasets

James Hendler. Big Data.

Abstract

More and more, the needs of data analysts are requiring the use of data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging "broad data" area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data.

Figures

FIG. 1.
Users increasingly need to integrate data from multiple sources, and of multiple types, into their strategic data analyses.
FIG. 2.
A simple example integrating U.S. and Chinese data with an intermediary dataset.
FIG. 3.
The U.S. Data.gov and many other government web sites provide faceted browsers for searching through large dataset catalogs based on metadata features.
FIG. 4.
Comparing U.S. foreign aid. (a) Comparing U.S. foreign aid from various agencies. Data is reconciled by country, amounts, and years. (b) Extending the comparison to show U.S. foreign aid from various agencies and the specific types of aid over time. (c) Comparing U.S. foreign aid and British foreign aid by combining datasets and using the currency conversion data shown in Figure 2.
FIG. 5.
Database linking example.
FIG. 6.
A tabular depiction of some of the different ways the state “Alaska” is referred to in different databases.
FIG. 7.
A comparison of burglary data between a U.K. and a U.S. dataset.
FIG. 8.
News titles, tweets with sentiment, and mapping use time and geolocation features to correlate and display information.
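
The database linking and “Alaska” captions above describe a common integration problem: two datasets name the same entity differently, so they must be reconciled to a shared identifier before they can be joined. The following is a minimal sketch of that idea, not the article's method. The alias table, the function names (`canonical_state`, `link_records`), the field names, and the sample values are all hypothetical and are not taken from the article's figures.

```python
# Hypothetical alias table mapping label variants to one canonical identifier.
# The variants are illustrative; they are not copied from the article's figure.
ALIASES = {
    "alaska": "US-AK",
    "state of alaska": "US-AK",
    "ak": "US-AK",      # postal abbreviation
    "us-ak": "US-AK",   # ISO 3166-2 code
    "02": "US-AK",      # FIPS state code
}


def canonical_state(label):
    """Return the canonical identifier for a label, or None if unknown."""
    key = str(label).strip().lower()
    return ALIASES.get(key)


def link_records(left, right, left_key, right_key):
    """Join two lists of dicts whose state labels may use different conventions."""
    # Index the right-hand rows by canonical identifier.
    index = {}
    for row in right:
        cid = canonical_state(row[right_key])
        if cid is not None:
            index.setdefault(cid, []).append(row)

    # Emit one merged row for every pair that shares an identifier.
    linked = []
    for row in left:
        cid = canonical_state(row[left_key])
        for match in index.get(cid, []):
            linked.append({**row, **match, "state_id": cid})
    return linked


# Placeholder rows with made-up values, for illustration only.
dataset_a = [{"state": "Alaska", "metric_a": 1}]
dataset_b = [{"st": " AK ", "metric_b": 2}]

linked = link_records(dataset_a, dataset_b, "state", "st")
print(linked)
```

Rows whose label is missing from the alias table are silently dropped in this sketch; a real pipeline would more likely log them for manual review.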
