Bioscience. 2020 Jul 1;70(6):610-620.
doi: 10.1093/biosci/biaa044. Epub 2020 May 13.

Machine Learning Using Digitized Herbarium Specimens to Advance Phenological Research

Katelin D Pearson et al. Bioscience.

Abstract

Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens (preserved plant material curated in natural history collections), but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, a driver of ecological processes, and a critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth.

Keywords: biodiversity; climate change; deep learning; machine learning; phenology.
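To make the idea of a "modular" workflow concrete, here is a minimal Python sketch, assuming each stage (selecting a training subset, training a model, predicting annotations) is a plain function that can be swapped out independently. The names `Specimen`, `random_subset`, `majority_trainer`, and `run_workflow` are hypothetical, and the majority-class "model" is only a placeholder standing in for the image-based deep learning models the article discusses; this is not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    image_path: str  # hypothetical location of the digitized specimen image


Model = Callable[[Specimen], str]


def random_subset(specimens: Sequence[Specimen], k: int, seed: int = 0) -> List[Specimen]:
    """Selection stage: draw k specimens to be annotated manually for training."""
    return random.Random(seed).sample(list(specimens), k)


def majority_trainer(labels: Dict[str, str], training_set: Sequence[Specimen]) -> Model:
    """Training stage (placeholder): always predict the most common training label."""
    counts: Dict[str, int] = {}
    for s in training_set:
        label = labels[s.specimen_id]
        counts[label] = counts.get(label, 0) + 1
    majority = max(counts, key=counts.get)
    return lambda specimen: majority


def run_workflow(
    specimens: Sequence[Specimen],
    expert_labels: Dict[str, str],
    select: Callable[[Sequence[Specimen], int], List[Specimen]],
    train: Callable[[Dict[str, str], Sequence[Specimen]], Model],
    k: int,
) -> Dict[str, str]:
    """Selection -> training -> prediction; returns labels for specimens not used in training."""
    training_set = select(specimens, k)
    model = train(expert_labels, training_set)
    trained_ids = {s.specimen_id for s in training_set}
    return {s.specimen_id: model(s) for s in specimens if s.specimen_id not in trained_ids}


# Demo data: for simplicity every specimen has an expert label here,
# but only the labels of the selected training subset are actually read.
specimens = [Specimen(f"S{i}", f"images/S{i}.jpg") for i in range(10)]
expert = {s.specimen_id: ("present" if i < 8 else "absent") for i, s in enumerate(specimens)}
predictions = run_workflow(specimens, expert, random_subset, majority_trainer, k=5)
```

Because `run_workflow` receives the selection and training stages as arguments, a real image classifier or a stratified sampler could replace the placeholders without touching the rest of the pipeline.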


Figures

Figure 1.
Key components of a generalized, modular machine learning (ML) workflow applied to the annotation of herbarium specimen images for phenological traits. Specimen images are retrieved from storage, and a representative subset of the focal images is used to generate a set of training data. The training data, which have been manually annotated according to the desired phenological scoring protocol (e.g., flowers present or absent), are used as input for ML. The resulting statistical model is then deployed to predict phenological annotations for previously unannotated specimens. The accuracy and precision of the ML model(s) can be tested on a subset of manually annotated data by comparing predicted annotations with those recorded by expert observers. Newly annotated specimens, combined with specimen label data, georeferenced localities, and other data sets (e.g., historical climate data), can then be used in a wide array of phenological research.
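The testing step in Figure 1 compares model predictions with expert annotations. A minimal sketch for a binary protocol such as flowers present or absent might look like the following, assuming annotations are stored as dictionaries keyed by a specimen identifier and that "precision" is meant in the classification sense (the fraction of predicted "present" calls that experts also scored as "present"). The function name `evaluate` is hypothetical.

```python
from typing import Dict


def evaluate(predicted: Dict[str, bool], expert: Dict[str, bool]) -> Dict[str, float]:
    """Compare predicted presence/absence calls with expert annotations
    on the held-out specimens that both dictionaries share."""
    shared = predicted.keys() & expert.keys()
    if not shared:
        raise ValueError("no overlapping specimens to evaluate")
    true_pos = sum(1 for k in shared if predicted[k] and expert[k])
    false_pos = sum(1 for k in shared if predicted[k] and not expert[k])
    correct = sum(1 for k in shared if predicted[k] == expert[k])
    accuracy = correct / len(shared)
    # Precision is undefined when the model never predicts "present".
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")
    return {"n": len(shared), "accuracy": accuracy, "precision": precision}


# Example: True means "flowers present".
predicted = {"A": True, "B": True, "C": False, "D": False}
expert = {"A": True, "B": False, "C": False, "D": True}
scores = evaluate(predicted, expert)
```

Here two of four calls agree with the experts and one of the two "present" predictions is correct, so both accuracy and precision are 0.5. Reporting precision alongside accuracy matters when one phenological state is much rarer than the other in a collection.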
Figure 2.
Examples of herbarium specimens displaying visual heterogeneity (e.g., in morphology, labels, and color standards) and challenges related to the morphology and position of reproductive structures. The specimen on the left shows large, isolated reproductive structures that would likely be annotated successfully by ML algorithms. The specimen in the center with small flowers and numerous overlapping fruits would be much more difficult for ML algorithms to parse. The specimen on the right would be very difficult for ML algorithms to delineate or count because of the unclear distinction between flowers and buds. Examples of segmentation masks (see the glossary in box 1) created to delineate reproductive structures are shown by the brightly colored areas in the left and center specimen images.
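One way to see why touching or overlapping structures are hard to count is to count connected regions in a binary segmentation mask. The sketch below is an illustrative technique chosen for this note, not the method used in the article: it assumes the mask is a list of rows of 0/1 values (1 marking a reproductive structure) and counts 4-connected regions with a breadth-first search. The function name `count_structures` is hypothetical.

```python
from collections import deque
from typing import List


def count_structures(mask: List[List[int]]) -> int:
    """Count 4-connected regions of 1s in a binary segmentation mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Found an unvisited structure pixel: flood-fill its whole region.
                regions += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


# Two well-separated structures are counted correctly...
isolated = [[1, 1, 0, 0, 1],
            [1, 1, 0, 0, 1],
            [0, 0, 0, 0, 0]]
# ...but two structures whose masks touch merge into a single region.
overlapping = [[1, 1, 1, 1, 0],
               [0, 1, 1, 1, 0],
               [0, 0, 0, 0, 0]]
```

With these inputs `count_structures(isolated)` returns 2, while `count_structures(overlapping)` returns 1 even if the mask covers two fruits, mirroring the undercounting risk for specimens with numerous overlapping reproductive structures.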
