Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data

Paul Fergus et al. Sensors (Basel). 2024 Dec 19;24(24):8122. doi: 10.3390/s24248122.
Abstract

Camera traps offer enormous new opportunities for ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X localises and classifies species (mammals and birds) within images, and a Phi-3.5-vision-instruct model reads the YOLOv10-X bounding box labels to identify species, overcoming the vision-language model's difficulty with hard-to-classify objects. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, adding rich ecological and environmental context to YOLO's species detection output. This combined output is then processed by the model's natural language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, such as species weight and IUCN status (information that cannot be obtained through direct visual analysis). Together, this information is used to automatically generate structured reports, giving biodiversity stakeholders deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. Our approach delivers contextually rich narratives that aid wildlife management decisions; it not only reduces manual effort but also supports timely decision making in conservation, potentially shifting efforts from reactive to proactive.
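To make the two-stage design concrete, the following is a minimal sketch of such a pipeline, not the authors' implementation: it assumes the ultralytics package for YOLOv10-X inference, while query_vlm and lookup_species_facts are hypothetical stubs standing in for Phi-3.5-vision-instruct and the RAG reference store.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# Assumptions (not the paper's code): the `ultralytics` package serves
# YOLOv10-X inference; `query_vlm` and `lookup_species_facts` are
# hypothetical stubs for Phi-3.5-vision-instruct and the RAG store.
from ultralytics import YOLO
from PIL import Image, ImageDraw


def detect_and_label(image_path: str, weights: str = "yolov10x.pt") -> Image.Image:
    """Stage 1: localise and classify species, then burn labelled
    bounding boxes into the image for the vision-language model to read."""
    result = YOLO(weights)(image_path)[0]
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        label = result.names[int(box.cls)]
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        draw.text((x1, max(0.0, y1 - 12)), label, fill="red")
    return image


def query_vlm(image: Image.Image, question: str) -> str:
    """Stage 2 (stub): ask Phi-3.5-vision-instruct about the labelled image,
    e.g., species name, vegetation type, or time of day."""
    raise NotImplementedError("wire up Phi-3.5-vision-instruct here")


def lookup_species_facts(species: str) -> dict:
    """RAG step (stub): retrieve facts the image cannot provide, such as
    average weight and IUCN status, from an external reference store."""
    raise NotImplementedError("wire up the retrieval store here")


annotated = detect_and_label("camera_trap_frame.jpg")
species = query_vlm(
    annotated,
    "Read the label on the bounding box to identify the animal.",
)
report_context = lookup_species_facts(species)
```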

Keywords: biodiversity monitoring; deep learning; large language models; object detection; vision transformers; wildlife conservation.


Conflict of interest statement

There are no conflicts of interest.

Figures

Figure A1
Q1. Read the label on the bounding box to identify the animal. What is the species identified in the image, and what is its IUCN conservation status?
Figure A2
Q2. Read the label on the bounding box to identify the animal. What is the average weight of the species identified, and does this species have any notable characteristics or behaviours?
Figure A3
Q3. Was the image taken during the day or night, and what environmental factors can be observed (e.g., forest, bush, water sources)?
Figure A4
Q4. Read the label on the bounding box to identify the animal. How does the species identified in the image compare to other species in the same habitat in terms of size, behaviour, and diet?
Figure A5
Q5. Read the labels on the bounding boxes to identify animals. Can you identify other animals or objects in the image, such as nearby trees, water bodies, or structures?
Figure A6
Q6. Read the labels on the bounding boxes to identify animals. What animals are in the image, and how many are there of each animal species identified?
Figure A7
Q7. Based on the species and its habits, what predictions can be made about its activity at the time the camera trap image was taken (e.g., hunting, foraging, resting)?
Figure A8
Q8. Read the label on the bounding box around the animal to determine the species. What potential threats, either natural or human-induced, are most relevant to the species in the image, given its current IUCN status and environment?
Figure A9
Q9. Read the label on the bounding box around the animal to determine the species. What is the species' role in the ecosystem, and how does its presence affect other species or the environment in the area where the image was captured?
Figure A10
Q10. Read the label on the bounding box around the animal to determine the species. What are the known predators or threats to the species in the image, and are there any visible indicators in the environment that suggest the presence of these threats?
Figure 1
Flow chart illustrating an overview of the workflow for the YOLOv10-X and Phi-3.5-vision-instruct model integration for context-rich camera trap data processing.
Figure 2
Class distribution for the Sub-Saharan Africa dataset used to train the YOLOv10-X model to localise and detect mammals, birds, people, and cars.
Figure 3
Overview of the YOLOv10 architecture.
Figure 4
Image from Limpopo Province in South Africa showing the detection of a zebra at night using a camera trap.
Figure 5
Image from Limpopo Province in South Africa showing the detection of multiple blue wildebeest and zebras using a camera trap.
Figure 6
Precision–recall (PR) curve for the YOLOv10-X model trained on 29 Sub-Saharan African species, vehicles, and human subjects.
Figure 7
Precision–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 8
Recall–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 9
F1–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 10
Confusion matrix providing a detailed analysis of the model's classification performance across all Sub-Saharan African species, vehicles, and human subjects.
Figure 11
Confusion matrix providing a detailed breakdown of the classifications made by the Phi-3.5-vision model when applied to raw images without YOLOv10-X object detection support.
Figure 12
Confusion matrix for the Phi-3.5 model using the bounding boxes from the test case images.
Figure 13
Alpaca JSON format showing the question–answer pairs (an illustrative record is sketched after this figure list).
Figure 14
Sample report using Alpaca Q&A.
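For reference, the Alpaca format shown in Figure 13 stores instruction-tuning data as a JSON list of instruction/input/output objects. A minimal illustrative record follows; the field names are the standard Alpaca schema, but the values are hypothetical and not taken from the paper's dataset.

```python
import json

# Illustrative Alpaca-style question-answer pair. The field names
# (instruction/input/output) follow the standard Alpaca schema; the
# values below are hypothetical, not taken from the paper's dataset.
record = {
    "instruction": (
        "Read the label on the bounding box to identify the animal. "
        "What is its IUCN conservation status?"
    ),
    "input": "",  # optional extra context; empty when the instruction stands alone
    "output": (
        "The bounding box label reads 'plains zebra'. The plains zebra "
        "(Equus quagga) is listed as Near Threatened on the IUCN Red List."
    ),
}
print(json.dumps([record], indent=2))
```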

Similar articles

  • Large-scale and long-term wildlife research and monitoring using camera traps: a continental synthesis. Bruce T, et al. Biol Rev Camb Philos Soc. 2025 Apr;100(2):530-555. doi: 10.1111/brv.13152. PMID: 39822039. Free PMC article. Review.
  • Temporal insights into ecological community: Advancing waterbird monitoring with dome camera and deep learning. Zhang Z, et al. J Environ Manage. 2025 Jul;387:125769. doi: 10.1016/j.jenvman.2025.125769. PMID: 40403671.
  • Camera trap surveys of Atlantic Forest mammals: A data set for analyses considering imperfect detection (2004-2020). Franceschi IC, et al. Ecology. 2024 May;105(5):e4298. doi: 10.1002/ecy.4298. PMID: 38610092.
  • Estimating species richness and modelling habitat preferences of tropical forest mammals from camera trap data. Rovero F, et al. PLoS One. 2014 Jul 23;9(7):e103300. doi: 10.1371/journal.pone.0103300. PMID: 25054806. Free PMC article.
  • An overview of remote monitoring methods in biodiversity conservation. Kerry RG, et al. Environ Sci Pollut Res Int. 2022 Nov;29(53):80179-80221. doi: 10.1007/s11356-022-23242-y. PMID: 36197618. Free PMC article. Review.
