PLoS Biol. 2019 Mar 18;17(3):e3000183.
doi: 10.1371/journal.pbio.3000183. eCollection 2019 Mar.

Biodiversity data integration: the significance of data resolution and domain

Christian König et al. PLoS Biol. 2019.

Abstract

Recent years have seen an explosion in the availability of biodiversity data describing the distribution, function, and evolutionary history of life on earth. Integrating these heterogeneous data remains a challenge due to large variations in observational scales, collection purposes, and terminologies. Here, we conceptualize widely used biodiversity data types according to their domain (what aspect of biodiversity is described?) and informational resolution (how specific is the description?). Applying this framework to major data providers in biodiversity research reveals a strong focus on the disaggregated end of the data spectrum, whereas aggregated data types remain largely underutilized. We discuss the implications of this imbalance for the scope and representativeness of current macroecological research and highlight the synergies arising from a tighter integration of biodiversity data across domains and resolutions. We lay out effective strategies for data collection, mobilization, imputation, and sharing and summarize existing frameworks for scalable and integrative biodiversity research. Finally, we use two case studies to demonstrate how the explicit consideration of data domain and resolution helps to identify biases and gaps in global data sets and achieve unprecedented taxonomic and geographical data coverage in macroecological analyses.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Selected biodiversity data types, arranged according to their primary domain (here, species distributions versus functional traits) and informational resolution (disaggregated versus aggregated).
Projects that integrate global plant diversity data are often domain-specific (e.g., Map of Life [24]; TRY [7]) or focus on the disaggregated end of the data spectrum (e.g., GBIF [6], BIEN [26]). Complementing the ecological data landscape with aggregated data (e.g., GIFT [28]) creates strong synergies and facilitates biodiversity data integration across domains and resolutions. BIEN, Botanical Information and Ecology Network; GBIF, Global Biodiversity Information Facility; GIFT, Global Inventory of Floras and Traits.
Fig 2. Comparison of logical and statistical data imputation.
Logical imputation infers a limited quantity of highly certain data (e.g., deducing woodiness status from growth form), whereas statistical imputation yields large quantities of less certain data (e.g., predicting a suite of functional traits or species occurrences from sparse records).
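The logical imputation described for Fig 2 can be illustrated with a minimal sketch. The lookup table, field names, and example records below are hypothetical assumptions made for illustration; they are not the rules or data used by the authors. The point is only that a value is filled in when the growth form determines it and left missing otherwise.

```python
# Hypothetical lookup: which growth forms imply which woodiness status.
# These categories are illustrative assumptions, not the paper's actual rules.
WOODINESS_BY_GROWTH_FORM = {
    "tree": "woody",
    "shrub": "woody",
    "herb": "non-woody",
}


def impute_woodiness(records):
    """Fill a missing 'woodiness' value only where growth form determines it."""
    imputed = []
    for record in records:
        filled = dict(record)  # copy so the input records stay untouched
        if filled.get("woodiness") is None:
            growth_form = filled.get("growth_form")
            # Unknown or missing growth forms stay None rather than being guessed.
            filled["woodiness"] = WOODINESS_BY_GROWTH_FORM.get(growth_form)
        imputed.append(filled)
    return imputed


# Made-up example records (placeholder species names).
records = [
    {"species": "sp_a", "growth_form": "tree", "woodiness": None},
    {"species": "sp_b", "growth_form": "herb", "woodiness": None},
    {"species": "sp_c", "growth_form": None, "woodiness": None},
    {"species": "sp_d", "growth_form": "shrub", "woodiness": "woody"},
]
result = impute_woodiness(records)
```

Because the rule is deterministic, the sketch adds few values but each is as reliable as the lookup itself, which is the trade-off the caption contrasts with statistical imputation.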
Fig 3
The global composition in plant growth form as observed for 818 angiosperm floras (left) and modeled for 6,495 equal-area grid cells (right). Upper plots summarize the growth form spectra across all observed (A) and modeled (B) geographical units, with each line representing a single flora. Lower plots (C–H) show the observed and modeled geographic variation in the proportion of herbs, shrubs, and trees individually. Note that the range of values varies across growth forms. The underlying data and data references for this figure can be found in S1 Data.
Fig 4. Latitudinal gradient in seed mass for 519,812 species-region combinations.
Piecewise regression (dashed black line) was compared against linear models for the entire data set (solid black line) and individual growth forms (colored lines). Upper plot shows the relative proportion of growth forms in each 1-degree latitudinal band. Right-hand plot depicts the frequency distribution of seed mass for individual growth forms. The underlying data and data references for this figure can be found in S2 Data.
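To make the comparison in Fig 4 concrete, the sketch below contrasts a single linear fit with a simple two-segment fit. It is a simplified stand-in, not the authors' procedure: the two segments are fitted independently (no continuity constraint at the breakpoint), the breakpoint is found by trying every split, and the data are synthetic values invented for this example. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_two_segments(xs, ys, min_points=3):
    """Try every split of the x-sorted data; keep the one with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        left = fit_line(sx[:i], sy[:i])
        right = fit_line(sx[i:], sy[i:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                # Report the breakpoint halfway between the neighbouring points.
                "breakpoint": (sx[i - 1] + sx[i]) / 2,
                "left": left,
                "right": right,
                "sse": total,
            }
    return best


# Synthetic data: constant up to 25 degrees latitude, then a linear decline.
latitudes = list(range(0, 65, 5))
log_seed_mass = [2.0 if x <= 25 else 2.0 - 0.1 * (x - 25) for x in latitudes]

single = fit_line(latitudes, log_seed_mass)
segmented = fit_two_segments(latitudes, log_seed_mass)
```

On data with a genuine change in slope, the two-segment fit leaves a much smaller residual sum of squares than the single line. With real data, the extra parameters of the segmented model would need to be accounted for (for example with an information criterion) before preferring it.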

References

    1. Watson JEM, Jones KR, Fuller RA, Di Marco M, Segan DB, Butchart SHM, et al. Persistent Disparities between Recent Rates of Habitat Conversion and Protection and Implications for Future Global Conservation Targets. Conservation Letters. 2016; 9: 413–421. doi: 10.1111/conl.12295
    2. Pachauri RK, Allen MR, Barros VR, Broome J, Cramer W, Christ R, et al. Climate change 2014. Synthesis report. Contribution of Working Groups I, II and III to the fifth assessment report of the Intergovernmental Panel on Climate Change: IPCC; 2014.
    3. Seebens H, Blackburn TM, Dyer EE, Genovesi P, Hulme PE, Jeschke JM, et al. No saturation in the accumulation of alien species worldwide. Nature Communications. 2017; 8: 14435. doi: 10.1038/ncomms14435
    4. Kelling S, Hochachka WM, Fink D, Riedewald M, Caruana R, Ballard G, et al. Data-intensive Science. A New Paradigm for Biodiversity Studies. BioScience. 2009; 59: 613–620. doi: 10.1525/bio.2009.59.7.12
    5. Hampton SE, Strasser CA, Tewksbury JJ, Gram WK, Budden AE, Batcheller AL, et al. Big data and the future of ecology. Frontiers in Ecology and the Environment. 2013; 11: 156–162. doi: 10.1890/120103