PLoS One. 2024 Aug 12;19(8):e0305708.
doi: 10.1371/journal.pone.0305708. eCollection 2024.

Deep transfer learning-based bird species classification using mel spectrogram images

Mrinal Kanti Baowaly et al. PLoS One.

Abstract

The classification of bird species is important in ornithology because it supports the assessment and monitoring of environmental dynamics, including habitat modifications, migratory behaviors, pollution levels, and disease occurrences. Traditional methods of bird classification, such as visual identification, are time-intensive and require a high level of expertise. Audio-based classification is a promising alternative that can automate bird species identification. This study aims to establish an audio-based classification system for 264 Eastern African bird species using modified deep transfer learning. Specifically, a pre-trained EfficientNet model is fine-tuned to learn the patterns in mel spectrogram images that are relevant to this classification task. The fine-tuned EfficientNet model is then combined with two types of recurrent neural networks (RNNs), namely the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). The RNNs are employed to capture the temporal dependencies in audio signals, thereby improving classification accuracy. The dataset used in this work contains nearly 17,000 bird sound recordings across a diverse range of species. Experiments were conducted with several combinations of EfficientNet and RNNs, and EfficientNet-B7 with GRU outperformed the other models, with an accuracy of 84.03% and a macro-average precision of 0.8342.
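The two reported metrics measure different things: accuracy is the fraction of recordings labeled correctly, whereas macro-average precision averages the per-class precision so that every species counts equally regardless of how many recordings it has. The following is a minimal sketch of both computations, not the authors' evaluation code. The species names and labels are hypothetical toy data, and treating a class that is never predicted as having precision 0 is an assumed convention.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over all classes seen.

    Assumption: a class that is never predicted contributes 0.0,
    since its precision (TP / predicted count) is otherwise undefined.
    """
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter()   # correct predictions per class
    predicted = Counter()  # total predictions per class
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_pos[p] += 1
    per_class = [
        true_pos[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)


# Hypothetical toy labels (not taken from the paper's dataset).
y_true = ["sunbird", "sunbird", "sunbird", "weaver", "weaver", "hornbill"]
y_pred = ["sunbird", "sunbird", "weaver", "weaver", "weaver", "sunbird"]

print(f"accuracy        = {accuracy(y_true, y_pred):.4f}")
print(f"macro precision = {macro_precision(y_true, y_pred):.4f}")
```

In this toy case the rare class (a single "hornbill" recording) is never predicted, so macro-average precision (4/9) falls well below accuracy (4/6). With 264 species and an uneven number of recordings per species, reporting both figures gives a fuller picture than accuracy alone.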

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The workflow of the proposed method.
Fig 2. Distribution of the bird species data.
Fig 3. Mel spectrogram images of some bird species audio.
Fig 4. Accuracy and loss of the fine-tuned EfficientNet-B7 transfer learning model.
Fig 5. Accuracy and loss of the fine-tuned EfficientNet-B7 combined with LSTM.
Fig 6. Accuracy and loss of the fine-tuned EfficientNet-B7 combined with GRU.
