Review
Big Data. 2014 Sep 1;2(3):144-154.
doi: 10.1089/big.2014.0020.

Structured Open Urban Data: Understanding the Landscape

Luciano Barbosa et al. Big Data.

Abstract

A growing number of cities are now making urban data freely available to the public. Besides promoting transparency, these data can have a transformative effect on social science research as well as on how citizens participate in governance. These initiatives, however, are fairly recent, and the landscape of open urban data is not well known. In this study, we shed some light on this landscape through a detailed analysis of over 9,000 open data sets from 20 cities in North America. We start by presenting general statistics about the content, size, nature, and popularity of the different data sets, and then examine in more detail structured data sets that contain tabular data. Since a key benefit of having a large number of data sets available is the ability to fuse information, we investigate opportunities for data integration. We also study data quality issues and time-related aspects, namely, recency and change frequency. Our findings are encouraging in that most of the data are structured and published in standard formats that are easy to parse; there is ample opportunity to integrate different data sets; and the volume of data is increasing steadily. However, they also uncover a number of challenges that must be addressed before these data can be fully leveraged. We discuss both our findings and the issues involved in using open urban data.

Figures

FIG. 1. Population size versus number of data sets.
FIG. 2. Proportion of format types.
FIG. 3. Proportion of data in tabular format.
FIG. 4. Tag clouds derived from the keywords associated with the data sets.
FIG. 5. Distribution of data set views and downloads.
FIG. 6. Tag clouds derived from the keywords associated with the most popular data sets, that is, data sets with the largest number of downloads.
FIG. 7. Distribution of tables' age in months. The inclined horizontal line is the trend line for this distribution.
FIG. 8. Change frequency ratio of tables over 30 days.
FIG. 9. Distribution of schema sizes.
FIG. 10. Schema diversity for tables in five cities.
FIG. 11. Similarity among data sets taking into account their schemata and overlap of attribute names.
FIG. 12. Proportion of different types in data sets for 10 cities.
FIG. 13. Distribution of table sparseness: proportion of null values in tables.
FIG. 14. Degree of informativeness: proportion of columns whose names contain words in the English dictionary.
FIG. 15. Heatmaps of the geographical coverage of the data sets for (a) Chicago and (b) NYC.
FIG. 16. Population of zip code regions versus references to the zip codes in Chicago data sets.
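
Two of the measures named in the figure captions, table sparseness (proportion of null values) and overlap of attribute names between schemata, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the treatment of empty strings as nulls, the lowercasing of attribute names, and the use of the Jaccard coefficient as the overlap measure are all assumptions made here for illustration, and the sample rows and schemata are invented.

```python
from typing import Iterable, Optional, Sequence


def sparseness(rows: Sequence[Sequence[Optional[object]]]) -> float:
    """Proportion of null cells in a table given as a list of rows.

    Assumption: None and the empty string both count as null.
    """
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    nulls = sum(1 for row in rows for cell in row if cell is None or cell == "")
    return nulls / total


def attribute_overlap(schema_a: Iterable[str], schema_b: Iterable[str]) -> float:
    """Jaccard overlap of two sets of attribute names.

    Assumption: names are compared after trimming whitespace and lowercasing;
    the source does not state which similarity measure the study used.
    """
    a = {name.strip().lower() for name in schema_a}
    b = {name.strip().lower() for name in schema_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample data, not taken from any real city portal.
rows = [
    ["60601", "Loop", None],
    ["60614", "", "park"],
]
schema_1 = ["Zip Code", "Ward", "Date"]
schema_2 = ["zip code", "date", "Amount"]

print(sparseness(rows))
print(attribute_overlap(schema_1, schema_2))
```

A set-based overlap is only one plausible reading of "overlap of attribute names"; a weighted or token-level comparison would be an equally reasonable choice.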

