Machine Learning Using Digitized Herbarium Specimens to Advance Phenological Research

Katelin D Pearson¹, Gil Nelson², Myla F J Aronson³, Pierre Bonnet⁴, Laura Brenskelle⁵, Charles C Davis⁶, Ellen G Denny⁷, Elizabeth R Ellwood⁸, Hervé Goëau⁴, J Mason Heberling⁹, Alexis Joly¹⁰, Titouan Lorieul¹⁰, Susan J Mazer¹¹, Emily K Meineke¹², Brian J Stucky⁵, Patrick Sweeney¹³, Alexander E White¹⁴, Pamela S Soltis¹⁵

Affiliations

¹ California Polytechnic State University, San Luis Obispo, California.
² Florida Museum of Natural History, Gainesville, Florida.
³ Department of Ecology, Evolution, and Natural Resources, Rutgers, the State University of New Jersey, New Brunswick, New Jersey.
⁴ AMAP, the University of Montpellier and with The French Agricultural Research Centre for International Development, Centre National de la Recherche Scientifique, Institut National de la Recherche Agronomique, Institut de Recherche pour le Développement, Botanique et Modélisation de l'Architecture des Plantes et des végétations in Montpellier, France.
⁵ Florida Museum of Natural History, the University of Florida, Gainesville, Florida.
⁶ Harvard University Herbaria, Cambridge, Massachusetts.
⁷ US National Phenology Network and with the University of Arizona, Tucson, Arizona.
⁸ Natural History Museum of Los Angeles County, La Brea Tar Pits and Museum, Los Angeles, California.
⁹ Carnegie Museum of Natural History, Pittsburgh, Pennsylvania.
¹⁰ Inria Sophia-Antipolis, Zenith team, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Montpellier, France.
¹¹ Department of Ecology, Evolution, and Marine Biology, the University of California, Santa Barbara, Santa Barbara, California.
¹² Department of Entomology and Nematology, the University of California, Davis, Davis, California.
¹³ Yale Peabody Museum of Natural History, New Haven, Connecticut.
¹⁴ Department of Botany and the Data Science Lab, the Smithsonian Institution, Washington, DC.
¹⁵ Florida Museum of Natural History and with the University of Florida Biodiversity Institute, the University of Florida, Gainesville, Florida.

PMID: 32665738
PMCID: PMC7340542
DOI: 10.1093/biosci/biaa044

Katelin D Pearson et al. Bioscience. 2020 Jul 1;70(6):610-620. doi: 10.1093/biosci/biaa044. Epub 2020 May 13.

Abstract

Machine learning (ML) has great potential to drive scientific discovery by harvesting data from images of herbarium specimens (preserved plant material curated in natural history collections), but ML techniques have only recently been applied to this rich resource. ML has particularly strong prospects for the study of plant phenological events such as growth and reproduction. As a major indicator of climate change, driver of ecological processes, and critical determinant of plant fitness, plant phenology is an important frontier for the application of ML techniques for science and society. In the present article, we describe a generalized, modular ML workflow for extracting phenological data from images of herbarium specimens, and we discuss the advantages, limitations, and potential future improvements of this workflow. Strategic research and investment in specimen-based ML methods, along with the aggregation of herbarium specimen data, may give rise to a better understanding of life on Earth.

Keywords: biodiversity; climate change; deep learning; machine learning; phenology.

Figures

**Figure 1.**
Key components of a generalized, modular machine learning (ML) workflow applied to the annotation of herbarium specimen images for phenological traits. Specimen images are retrieved from storage, and a representative subset of the focal images is used to generate a set of training data. The training data, which have been manually annotated according to the desired phenological scoring protocol (e.g., flowers present or absent), are used as input data for ML. The resulting statistical model is then deployed to predict phenological annotations for previously unannotated specimens. The accuracy and precision of the ML model(s) can be tested using a subset of manually annotated data to compare predicted annotations to those recorded by expert observers. Newly annotated specimens, combined with specimen label data, georeferenced localities, and other data sets (e.g., historical climate data), can then be used in an array of phenological research.
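The evaluation step described in the Figure 1 caption, comparing predicted annotations with those recorded by expert observers, can be illustrated with a minimal sketch. It assumes a binary scoring protocol (flowers present or absent) and interprets "accuracy" and "precision" in the usual classification sense, which may differ from the authors' usage. The function name `score_annotations` and the example labels are hypothetical and do not come from the article.

```python
from typing import Dict, Optional, Sequence


def score_annotations(
    expert: Sequence[bool], predicted: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare predicted flower-presence labels with expert labels.

    True means "flowers present", False means "flowers absent".
    Precision and recall are None when their denominators are zero.
    """
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if len(expert) == 0:
        raise ValueError("need at least one annotated specimen")

    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e and p)          # true positives
    fp = sum(1 for e, p in pairs if not e and p)      # false positives
    fn = sum(1 for e, p in pairs if e and not p)      # false negatives
    tn = sum(1 for e, p in pairs if not e and not p)  # true negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Hypothetical held-out specimens: expert labels versus model predictions.
expert_labels = [True, True, False, False, True]
model_labels = [True, False, False, True, True]
scores = score_annotations(expert_labels, model_labels)
print(scores)
```

In practice the held-out specimens would be a manually annotated subset kept out of training, as the caption describes, so that the scores reflect performance on specimens the model has not seen.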

**Figure 2.**
Examples of herbarium specimens displaying visual heterogeneity (e.g., in morphology, labels, and color standards) and challenges related to the morphology and position of reproductive structures. The specimen on the left shows large, isolated reproductive structures that would likely be annotated successfully by ML algorithms. The specimen in the center with small flowers and numerous overlapping fruits would be much more difficult for ML algorithms to parse. The specimen on the right would be very difficult for ML algorithms to delineate or count because of the unclear distinction between flowers and buds. Examples of segmentation masks (see the glossary in box 1) created to delineate reproductive structures are shown by the brightly colored areas in the left and center specimen images.
