Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data

Paul Fergus et al. Sensors (Basel). 2024 Dec 19;24(24):8122. doi: 10.3390/s24248122.
Abstract

Camera traps offer enormous new opportunities for ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X localises and classifies species (mammals and birds) within images, and a Phi-3.5-vision-instruct model reads the YOLOv10-X bounding box labels to identify species, overcoming the vision-language model's difficulty with hard-to-classify objects. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, adding rich ecological and environmental context to YOLO's species detection output. This combined output is then processed by the model's natural language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, such as species weight and IUCN status (information that cannot be obtained through direct visual analysis). Together, this information is used to automatically generate structured reports, giving biodiversity stakeholders deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. Our approach delivers contextually rich narratives that aid wildlife management decisions; it not only reduces manual effort but also supports timely decision making in conservation, potentially shifting efforts from reactive to proactive.
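To make the two-stage design concrete, the following is a minimal sketch of such a pipeline, not the authors' implementation: it assumes the ultralytics package for YOLOv10-X inference, while query_vlm and lookup_species_facts are hypothetical stubs standing in for Phi-3.5-vision-instruct and the RAG reference store.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# Assumptions (not the paper's code): the `ultralytics` package serves
# YOLOv10-X inference; `query_vlm` and `lookup_species_facts` are
# hypothetical stubs for Phi-3.5-vision-instruct and the RAG store.
from ultralytics import YOLO
from PIL import Image, ImageDraw


def detect_and_label(image_path: str, weights: str = "yolov10x.pt") -> Image.Image:
    """Stage 1: localise and classify species, then burn labelled
    bounding boxes into the image for the vision-language model to read."""
    result = YOLO(weights)(image_path)[0]
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        label = result.names[int(box.cls)]
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, max(0.0, y1 - 12)), label, fill="red")
    return image


def query_vlm(image: Image.Image, question: str) -> str:
    """Stage 2 (stub): ask Phi-3.5-vision-instruct about the labelled image,
    e.g., species name, vegetation type, or time of day."""
    raise NotImplementedError("wire up Phi-3.5-vision-instruct here")


def lookup_species_facts(species: str) -> dict:
    """RAG step (stub): retrieve facts the image cannot provide, such as
    average weight and IUCN status, from an external reference store."""
    raise NotImplementedError("wire up the retrieval store here")


annotated = detect_and_label("camera_trap_frame.jpg")
species = query_vlm(
    annotated,
    "Read the label on the bounding box to identify the animal.",
)
report_context = lookup_species_facts(species)
```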

Keywords: biodiversity monitoring; deep learning; large language models; object detection; vision transformers; wildlife conservation.


Conflict of interest statement

There are no conflicts of interest.

Figures

Figure A1
Q1. Read the label on the bounding box to identify the animal. What is the species identified in the image, and what is its IUCN conservation status?
Figure A2
Q2. Read the label on the bounding box to identify the animal. What is the average weight of the species identified, and does this species have any notable characteristics or behaviours?
Figure A3
Q3. Was the image taken during the day or night, and what environmental factors can be observed (e.g., forest, bush, water sources)?
Figure A4
Q4. Read the label on the bounding box to identify the animal. How does the species identified in the image compare to other species in the same habitat in terms of size, behaviour, and diet?
Figure A5
Q5. Read the labels on the bounding boxes to identify animals. Can you identify other animals or objects in the image, such as nearby trees, water bodies, or structures?
Figure A6
Q6. Read the labels on the bounding boxes to identify animals. What animals are in the image, and how many are there of each animal species identified?
Figure A7
Q7. Based on the species and its habits, what predictions can be made about its activity at the time the camera trap image was taken (e.g., hunting, foraging, resting)?
Figure A8
Q8. Read the label on the bounding box around the animal to determine the species. What potential threats, either natural or human-induced, are most relevant to the species in the image, given its current IUCN status and environment?
Figure A9
Q9. Read the label on the bounding box around the animal to determine the species. What is the species' role in the ecosystem, and how does its presence affect other species or the environment in the area where the image was captured?
Figure A10
Q10. Read the label on the bounding box around the animal to determine the species. What are the known predators or threats to the species in the image, and are there any visible indicators in the environment that suggest the presence of these threats?
Figure 1
Flow chart illustrating an overview of the workflow for the YOLOv10-X and Phi-3.5-vision-instruct model integration for context-rich camera trap data processing.
Figure 2
Class distribution for the Sub-Saharan Africa dataset used to train the YOLOv10-X model to localise and detect mammals, birds, people, and cars.
Figure 3
Overview of the YOLOv10 architecture.
Figure 4
Image from Limpopo Province in South Africa showing the detection of a zebra at night using a camera trap.
Figure 5
Image from Limpopo Province in South Africa showing the detection of multiple blue wildebeest and zebras using a camera trap.
Figure 6
Precision–recall (PR) curve for the YOLOv10-X model trained on 29 Sub-Saharan African species, vehicles, and human subjects.
Figure 7
Precision–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 8
Recall–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 9
F1–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 10
Confusion matrix providing a detailed analysis of the model's classification performance across all Sub-Saharan African species, vehicles, and human subjects.
Figure 11
Confusion matrix providing a detailed breakdown of the classifications made by the Phi-3.5-vision model when applied to raw images without YOLOv10-X object detection support.
Figure 12
Confusion matrix for the Phi-3.5 model using the bounding boxes from the test case images.
Figure 13
Alpaca JSON format showing the question–answer pairs (an illustrative record is sketched after this figure list).
Figure 14
Sample report using Alpaca Q&A.
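For reference, the Alpaca format shown in Figure 13 stores instruction-tuning data as a JSON list of instruction/input/output objects. A minimal illustrative record follows; the field names are the standard Alpaca schema, but the values are hypothetical and not taken from the paper's dataset.

```python
import json

# Illustrative Alpaca-style question-answer pair. The field names
# (instruction/input/output) follow the standard Alpaca schema; the
# values below are hypothetical, not taken from the paper's dataset.
record = {
    "instruction": (
        "Read the label on the bounding box to identify the animal. "
        "What is its IUCN conservation status?"
    ),
    "input": "",  # optional extra context; empty when the instruction stands alone
    "output": (
        "The bounding box label reads 'plains zebra'. The plains zebra "
        "(Equus quagga) is listed as Near Threatened on the IUCN Red List."
    ),
}
print(json.dumps([record], indent=2))
```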

Similar articles

  • Large-scale and long-term wildlife research and monitoring using camera traps: a continental synthesis. Bruce T, et al. Biol Rev Camb Philos Soc. 2025 Apr;100(2):530-555. doi: 10.1111/brv.13152. PMID: 39822039. Free PMC article. Review.
  • Temporal insights into ecological community: Advancing waterbird monitoring with dome camera and deep learning. Zhang Z, et al. J Environ Manage. 2025 Jul;387:125769. doi: 10.1016/j.jenvman.2025.125769. PMID: 40403671.
  • Camera trap surveys of Atlantic Forest mammals: A data set for analyses considering imperfect detection (2004-2020). Franceschi IC, et al. Ecology. 2024 May;105(5):e4298. doi: 10.1002/ecy.4298. PMID: 38610092.
  • Estimating species richness and modelling habitat preferences of tropical forest mammals from camera trap data. Rovero F, et al. PLoS One. 2014 Jul 23;9(7):e103300. doi: 10.1371/journal.pone.0103300. PMID: 25054806. Free PMC article.
  • An overview of remote monitoring methods in biodiversity conservation. Kerry RG, et al. Environ Sci Pollut Res Int. 2022 Nov;29(53):80179-80221. doi: 10.1007/s11356-022-23242-y. PMID: 36197618. Free PMC article. Review.
