Biodivers Data J. 2023 Nov 30:11:e109439.
doi: 10.3897/BDJ.11.e109439. eCollection 2023.

Envisaging a global infrastructure to exploit the potential of digitised collections

Quentin Groom et al. Biodivers Data J.

Abstract

Tens of millions of images from biological collections have become available online over the last two decades. In parallel, there has been a dramatic increase in the capabilities of image analysis technologies, especially those involving machine learning and computer vision. While image analysis has become mainstream in consumer applications, it is still used only on an artisanal basis in the biological collections community, largely because the image corpora are dispersed. Yet, there is massive untapped potential for novel applications and research if images of collection objects could be made accessible in a single corpus. In this paper, we make the case for infrastructure that could support image analysis of collection objects. We show that such infrastructure is entirely feasible and well worth investing in.

Keywords: biodiversity; computer vision; functional traits; machine learning; species identification; specimens.

Conflict of interest statement

No conflict of interest to declare.
Disclaimer: This article is (co-)authored by any of the Editors-in-Chief, Managing Editors or their deputies in this journal.

Figures

Figure 1.
Progress in digitising natural history collections. A growing number of images are accessible from the Global Biodiversity Information Facility, iDigBio or BioCASe. To examine the rate and volume of digitisation, we used six snapshots of these databases taken since 2019, using Preston, a biodiversity dataset tracker (Poelen 2022, Poelen and Groom 2022, Elliott et al. 2022). Although the counts are likely to underestimate the number of specimen images, because not all images are linked to the snapshot datasets, the trends give an indication of digitisation progress. The number of available images is increasing approximately exponentially. There are seven times more plant specimens than insect specimens in our most recent snapshot, even though insects are far more numerous in nature, with an estimated 5.5 million insect species (Stork 2018) vs. 350,000 plant species (Cheek et al. 2020). Nevertheless, the number of insect images is increasing faster and, if one extrapolates the curves, insect images could surpass plant specimen images within a few years. Imaging of Mammalia (~6,400 species; Burgin et al. 2018), while increasing, is not growing as rapidly as imaging of insects.
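The extrapolation mentioned in the Figure 1 caption can be made concrete with a small sketch. Assuming both image counts grow exponentially at constant rates, the time until insect images overtake plant images is ln(ratio) / (insect rate - plant rate), where the ratio is the current seven-fold plant lead. The growth rates below are hypothetical placeholders, not values from the paper, and the function name `years_to_crossover` is introduced here for illustration only.

```python
import math


def years_to_crossover(ratio, rate_fast, rate_slow):
    """Years until the smaller, faster-growing series catches up.

    Assumes both series follow N(t) = N0 * exp(rate * t).
    `ratio` is the current size of the larger series divided by the
    smaller one (7 for plants vs. insects in the Figure 1 caption).
    """
    if rate_fast <= rate_slow:
        raise ValueError("the smaller series must grow faster to catch up")
    return math.log(ratio) / (rate_fast - rate_slow)


# Hypothetical annual growth rates, NOT taken from the paper.
plant_rate = 0.10
insect_rate = 0.40

print(round(years_to_crossover(7, insect_rate, plant_rate), 1))
```

The result depends strongly on the difference between the two rates, so a real estimate would need rates fitted to the snapshot counts themselves.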
Figure 2a.
Paratype of Heraclides rumiko, showing information encoded on multiple labels. Catalogue number NHMUK012824346 by The Trustees of the Natural History Museum, London (CC-BY).
Figure 2b.
Specimen of a chewing louse (Philopteridae), Strongylocotes lipogonus, a parasitic species, with host information included on the label. Catalogue number NHMUK010694309 by The Trustees of the Natural History Museum, London (CC-BY).
Figure 3a.
Label of Potentilla recta with distinctive label decorations (BR0000009398214; CC-BY-SA).
Figure 3b.
Label of Eriophorum angustifolium on which the collector’s signature can be recognised (BR0000005134137; CC-BY-SA).
Figure 3c.
Distinct cup-shaped label of Agathosma villosum (BR0000015671271; CC-BY-SA).
Figure 3d.
Label of Alyssum calycinum collected by François Crépin, notorious for his illegible but recognisable handwriting (BR0000010426135; CC-BY-SA).
Figure 4.
Embossed crests and stamps on herbarium specimens. A Lion and crown signifying ownership by the Botanical Garden of Brussels, on specimen BR0000013433048 of the BR Herbarium (CC-BY-SA 4.0). B Stamp of the A. C. Moore Herbarium at the University of South Carolina, on specimen USCH0030719 (public domain). C Stamp of the Watson Botanical Exchange Club, on specimen E00809288 of the Royal Botanic Garden Edinburgh Herbarium (public domain). D Stamp of the A. C. Moore Herbarium at the University of South Carolina, on specimen USCH0030719 (public domain). E Stamp of the Botanical Exchange Club of the British Isles, on specimen E00919066 of the Royal Botanic Garden Edinburgh Herbarium (public domain). F Stamp with handwriting, evidence of a loan from the BR Herbarium to the Herbarium Musei Parisiensis, P, on specimen BR0000017682725 of Meise Botanic Garden (CC-BY-SA 4.0). G Printed crest, on specimen P00605317 held by the Muséum National d’Histoire Naturelle (CC-BY 4.0). H A stamp on specimen LISC036829 held by the LISC Herbarium of the Instituto de Investigação Científica Tropical. I A crest used by the Muséum National d’Histoire Naturelle (MNHN - Paris), on specimen PC0702930 (CC-BY 4.0). J A stamped star of unknown meaning, on the same specimen as (B). K A stamp belonging to the Herbarium I. Thériot, on specimen PC0702930 at the Herbarium of the Muséum National d’Histoire Naturelle (CC-BY 4.0). L A stamp belonging to the Universidad Estatal Amazónica, on a specimen now housed in the Missouri Botanical Garden Herbarium under catalogue number 101178648 (CC-BY-SA 4.0).
Figure 5.
Framework of an infrastructure for analysis of specimen images showing the services, storage and relationships between them.

References

1. Allan E Louise, Livermore Laurence, Price Benjamin, Shchedrina Olha, Smith Vincent. A novel automated mass digitisation workflow for natural history microscope slides. Biodiversity Data Journal. 2019;7. doi: 10.3897/bdj.7.e32342.
2. Antonelli Alexandre, Hiscock Simon, Lennon Sarah, Simmonds Monique, Smith Rhian J., Young Bennett. Protecting and sustainably using the world’s plants and fungi. Plants, People, Planet. 2020;2(5):368–370. doi: 10.1002/ppp3.10150.
3. Bauters Marijn, Meeus Sofie, Barthel Matti, Stoffelen Piet, De Deurwaerder Hannes P. T., Meunier Félicien, Drake Travis W., Ponette Quentin, Ebuy Jerôme, Vermeir Pieter, Beeckman Hans, wyffels Francis, Bodé Samuel, Verbeeck Hans, Vandelook Filip, Boeckx Pascal. Century‐long apparent decrease in intrinsic water‐use efficiency with no evidence of progressive nutrient limitation in African tropical forests. Global Change Biology. 2020;26(8):4449–4461. doi: 10.1111/gcb.15145.
4. Bhalerao Abhir, Reynolds Gregory. Ruler detection for autoscaling forensic images. International Journal of Digital Crime and Forensics. 2014;6(1):9–27. doi: 10.4018/ijdcf.2014010102.
5. Boakes Elizabeth H., McGowan Philip J. K., Fuller Richard A., Chang-qing Ding, Clark Natalie E., O'Connor Kim, Mace Georgina M. Distorted views of biodiversity: Spatial and temporal bias in species occurrence data. PLOS Biology. 2010;8(6). doi: 10.1371/journal.pbio.1000385.