PeerJ. 2022 Aug 18;10:e13921.
doi: 10.7717/peerj.13921. eCollection 2022.

Enhancing georeferenced biodiversity inventories: automated information extraction from literature records reveal the gaps

Bjørn Tore Kopperud et al. PeerJ.

Abstract

We use natural language processing (NLP) in an automated procedure to retrieve location data for cheilostome bryozoan species, yielding text-mined occurrences (TMO). We compare these results with data combined from two major public databases (DB): the Ocean Biodiversity Information System (OBIS), and the Global Biodiversity Information Facility (GBIF). Using DB and TMO data separately and in combination, we present latitudinal species richness curves using standard estimators (Chao2 and the Jackknife) and range-through approaches. Our combined DB and TMO species richness curves quantitatively document a bimodal global latitudinal diversity gradient for extant cheilostomes for the first time, with peaks in the temperate zones. A total of 79% of the georeferenced species we retrieved from TMO (N = 1,408) and DB (N = 4,549) are non-overlapping. Despite clear indications that global location data compiled for cheilostomes should be improved with concerted effort, our study supports the view that many marine latitudinal species richness patterns deviate from the canonical latitudinal diversity gradient (LDG). Moreover, combining online biodiversity databases with automated information retrieval from the published literature is a promising avenue for expanding taxon-location datasets.

Keywords: Bimodality; Bryozoa; Geographic distribution; Latitudinal diversity gradient (LDG); Marine invertebrates; Natural language processing (NLP); Public data repositories; Species richness; Text-mining.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Global range-through latitudinal species richness for cheilostome bryozoans.
The black line shows combined database (DB = OBIS and GBIF) and text-mined occurrence (TMO) richness, and orange and green curves show range-through richness for DB and TMO separately. The inset is a Venn diagram showing the global overlap in species between DB and TMO.
Figure 2. Global latitudinal species richness for cheilostome bryozoans, estimated using Chao2 and Jackknife.
The top panels (A & B) show richness for database (DB = OBIS and GBIF) and text-mined occurrences (TMO) data in 5° equal-angle latitudinal bands. The lower panels (C & D) show the equivalent in 5° equal-area latitudinal bands. Black lines show the observed richness, while blue and orange lines show the Chao2 and Jackknife estimates, respectively. The shaded areas are 95% confidence intervals. See Figs. S7 and S8 for alternative band and bin sizes.
Figure 3. Range-through latitudinal species richness for cheilostome bryozoans in the Atlantic and Pacific Oceans.
(A, C, E) Species richness in the Atlantic; (B, D, F) species richness in the Pacific. The panel rows represent the eastern, western or the entire ocean basins. Orange and green lines represent database (DB = OBIS and GBIF) and text-mined occurrences (TMO), respectively, and black lines are the joint data. Note that in this figure, the Atlantic borders Greenland and Iceland in the north, and the Antarctic in the south, but does not include the Gulf of Mexico, the Caribbean, the Baltic Sea or the Mediterranean. The Pacific borders the Bering Strait in the north, and includes the South China Sea, the Java Sea, north and east Australia, Tasmania as well as the Antarctic border.
Figure 4. Heatmaps for cheilostome bryozoan occurrence records per 5° latitude by 5° longitude bins.
The color axes are truncated for visualization purposes, to a maximum of 200, 200 and 2,000 in (A), (B), (C), respectively. (B) and (C) show the same sampling data, but in (C) the upper limit of the color axis is expanded ten-fold. There are about 900 maximum records per bin in the Mediterranean for the text-mined occurrences (TMO), and about 66,000 maximum records in the British Isles for the Ocean Biodiversity Information System (OBIS) and Global Biodiversity Information Facility (GBIF) data combined. The globe is plotted using the Robinson projection. See Fig. S11 for the same figure plotted using the plate carrée projection.
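
The range-through richness plotted in Figs. 1 and 3 can be illustrated with a short sketch. It assumes the usual definition, in which a species is counted as present in every latitudinal band between its southernmost and northernmost records, and it uses equal-angle bands. The function names and the `(species, latitude)` input format are hypothetical and are not taken from the paper.

```python
import math


def band_index(lat, band_width, n_bands):
    """Map a latitude in degrees (-90 to 90) to a zero-based band index."""
    i = int(math.floor((lat + 90.0) / band_width))
    return min(max(i, 0), n_bands - 1)  # clamp so that 90 degrees falls in the last band


def range_through_richness(occurrences, band_width=5.0):
    """Count species per equal-angle latitudinal band, range-through style.

    `occurrences` is an iterable of (species, latitude) pairs. Each species
    is counted in every band from its minimum to its maximum latitude,
    including bands where it has no records.
    """
    lat_range = {}
    for species, lat in occurrences:
        lo, hi = lat_range.get(species, (lat, lat))
        lat_range[species] = (min(lo, lat), max(hi, lat))

    n_bands = int(round(180.0 / band_width))
    richness = [0] * n_bands
    for lo, hi in lat_range.values():
        first = band_index(lo, band_width, n_bands)
        last = band_index(hi, band_width, n_bands)
        for i in range(first, last + 1):
            richness[i] += 1
    return richness
```

Because gaps inside a species' latitudinal range are filled in, this approach is less sensitive to patchy sampling than per-band counts, at the cost of assuming continuous ranges.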
