Ecol Evol. 2016 Jun 12;6(14):4654-69.
doi: 10.1002/ece3.2225. eCollection 2016 Jul.

Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States

Matthew J Troia et al. Ecol Evol. 2016.

Abstract

Primary biodiversity data constitute observations of particular species at given points in time and space. Open-access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depends on how complete species inventories are at a given survey location and how uniformly survey locations are distributed along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open-access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). We aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well-surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov-Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percentage of well-surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well-surveyed grid cells for six of nine taxonomic groups. Coverage of systematic surveys was less biased across spatial and environmental dimensions but more biased in time than coverage of GBIF data. GBIF coverage also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases. This comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.
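
The aggregation step described in the abstract (binning occurrence records into 0.1° × 0.1° grid cells) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the integer cell indexing, and the example records are all assumptions made here, and the three completeness metrics are not reproduced because the abstract does not specify them.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.1  # grid resolution stated in the abstract


def grid_cell(lat, lon, size=CELL_SIZE_DEG):
    """Return integer (row, col) indices of the grid cell containing a point.

    The indexing scheme is an assumption for illustration only.
    """
    # round() absorbs floating-point error before flooring,
    # so that a value such as 35.3 / 0.1 lands in cell 353 and not 352.
    row = math.floor(round(lat / size, 9))
    col = math.floor(round(lon / size, 9))
    return (row, col)


def aggregate(records):
    """Group (species, lat, lon) records by grid cell.

    Returns two dicts keyed by cell: the set of species seen
    and the number of occurrence records.
    """
    species = defaultdict(set)
    counts = defaultdict(int)
    for name, lat, lon in records:
        cell = grid_cell(lat, lon)
        species[cell].add(name)
        counts[cell] += 1
    return species, counts


# Hypothetical occurrence records: (species, latitude, longitude)
records = [
    ("Etheostoma caeruleum", 35.93, -84.31),
    ("Etheostoma caeruleum", 35.96, -84.38),
    ("Cyprinella spiloptera", 35.91, -84.35),
    ("Cyprinella spiloptera", 36.02, -84.35),
]

species, counts = aggregate(records)
for cell in sorted(counts):
    print(cell, counts[cell], "records,", len(species[cell]), "species")
```

Per-cell record counts and observed species richness of this kind are the usual raw inputs to completeness estimates, which would then decide whether a cell counts as well-surveyed.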

Keywords: Biodiversity; Global Biodiversity Information Facility; National Rivers and Streams Assessment; National Water Quality Assessment; North American Breeding Bird Survey; Regional Environmental Monitoring and Assessment Program; Wallacean shortfall; museum collections; species distribution modeling.

Figures

Figure 1
Distribution of all surveyed grid cells and well‐surveyed grid cells throughout the contiguous United States during the contemporary time period (1990–2013) derived from three open‐access biodiversity databases representing ten taxonomic groups. Note that the square symbols are enlarged (i.e., larger than actual grid cell area) to facilitate visualization of well‐surveyed regions.
Figure 2
Percent of all grid cells in the contiguous United States (n = 83,545) that contain surveys derived from three open‐access biodiversity databases representing the complete time period (1800–2013) and contemporary time period (1990–2013).
Figure 3
Frequency of all surveyed grid cells and well‐surveyed grid cells in each of eleven 20‐year intervals between 1800 and 2013 (most recent interval is 14 years; 2000–2013) for three open‐access biodiversity databases representing ten taxonomic groups. Note different y‐axis scales within and among panels.
Figure 4
Relationship between (A) number of species per grid cell and (B) cumulative coverage of well‐surveyed grid cells versus all surveyed grid cells derived from three open‐access biodiversity databases representing ten taxonomic groups. In (B), low values represent unbiased coverage and high values represent biased coverage relative to the background environment. Note that richness and cumulative coverage could not be plotted along the y‐axis for crayfish because no well‐surveyed grid cells were identified.
Figure 5
Coverage indices for each of twelve taxonomic survey datasets (eleven gradients pooled) averaged across (A) all eleven gradients, (B) two spatial gradients, (C) five contemporary environmental gradients (MAT, MAP, urban, agriculture, total disturbance), and (D) two climate change gradients (∆MAT, ∆MAP). Index values are D‐statistics from Kolmogorov–Smirnov goodness‐of‐fit, indicating strong or weak (low or high D‐statistics, respectively) congruence between survey datasets and the background environment. Vertical gray and red lines represent the mean of all twelve survey datasets for all surveyed grid cells and well‐surveyed grid cells, respectively.
Figure 6
Coverage indices for each of eleven temporal, spatial, or environmental gradients (twelve taxonomic survey datasets pooled) averaged across (A) GBIF, (B) standardized (i.e., BBS and FFS), (C) terrestrial, and (D) aquatic datasets. Index values are D‐statistics from Kolmogorov–Smirnov goodness‐of‐fit, indicating strong or weak (low or high D‐statistics, respectively) congruence between survey datasets and the background environment. Vertical gray and red lines represent the mean of all eleven datasets for all surveyed grid cells and well‐surveyed grid cells, respectively.
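
The D‐statistics used as coverage indices in Figures 5 and 6 can be illustrated with a short sketch. It assumes a two-sample formulation, in which the values of a gradient at surveyed grid cells are compared with the values at all grid cells (the background); the source does not state which form of the test was used, and the function name and temperature values below are hypothetical.

```python
from bisect import bisect_right


def ks_d_statistic(sample, background):
    """Kolmogorov-Smirnov D: the largest absolute gap between two empirical CDFs.

    Both empirical CDFs are step functions that change only at observed
    values, so checking every observed value is enough to find the maximum.
    """
    a = sorted(sample)
    b = sorted(background)
    d = 0.0
    for x in set(a) | set(b):
        # Empirical CDF at x = fraction of values less than or equal to x
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical mean annual temperature (MAT, degrees C) at ten background cells
background_mat = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# Surveys spread evenly along the gradient versus surveys confined to warm cells
even_surveys = [2, 6, 10, 14, 18]
warm_surveys = [12, 14, 16, 18, 20]

d_even = ks_d_statistic(even_surveys, background_mat)
d_warm = ks_d_statistic(warm_surveys, background_mat)
print("evenly spread surveys:", round(d_even, 3))
print("warm-biased surveys:  ", round(d_warm, 3))
```

The evenly spread surveys yield a small D and the warm-biased surveys a large one, matching the captions' reading that low values indicate strong congruence with the background environment and high values indicate biased coverage.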
