2023 Oct 4;19(10):e1011541.
doi: 10.1371/journal.pcbi.1011541. eCollection 2023 Oct.

Adaptive representations of sound for automatic insect recognition

Marius Faiß et al. PLoS Comput Biol.

Abstract

Insect population numbers and biodiversity have been declining rapidly, and monitoring these trends has become increasingly important for conservation measures to be implemented effectively. But established monitoring methods are often invasive, time- and resource-intensive, and prone to various biases. Many insect species produce characteristic sounds that can be detected and recorded without large cost or effort. Using deep learning methods, insect sounds from field recordings could be automatically detected and classified to monitor biodiversity and species distribution ranges. We implement this using recently published datasets of insect sounds (up to 66 species of Orthoptera and Cicadidae) and machine learning methods, and evaluate their potential for acoustic insect monitoring. We compare the performance of the conventional spectrogram-based audio representation against LEAF, a new adaptive and waveform-based frontend. LEAF achieved better classification performance than the mel-spectrogram frontend by adapting its feature-extraction parameters during training. This result is encouraging for future implementations of deep learning technology for automatic insect sound recognition, especially as larger datasets become available.


Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: DS is an Academic Editor for PLOS Computational Biology.

Figures

Fig 1
Fig 1. Two spectrograms of the same recording of Gryllus campestris.
Spectrogram A displays the frequency axis linearly in Hz. Spectrogram B uses the mel frequency scale, which compresses the frequency axis to show higher resolution in lower frequency bands than in higher bands, mimicking the human perception of frequency. Both spectrograms display the same spectrum of frequencies. Due to the mostly high-frequency information and empty low frequencies in this recording, the mel spectrogram B obscures a large amount of information compared to the linear spectrogram A.
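The compression in spectrogram B comes from the standard mel mapping. A minimal numpy sketch of why high frequencies lose resolution (the band count and 0–22.05 kHz range here are illustrative, not taken from the paper):

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy/HTK formula used by most mel filterbank implementations
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 64 band edges spanning 0-22.05 kHz, spaced linearly on the mel scale
edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(22050.0), 64))

# Successive bands grow monotonically wider in Hz: the mel scale trades
# high-frequency resolution for low-frequency resolution, which hurts
# recordings whose energy sits almost entirely in the high bands.
widths = np.diff(edges_hz)
```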
Fig 2
Fig 2. Example of the data augmentation workflow used on the training set (InsectSet47 and InsectSet66). Noise is added at a randomized signal-to-noise ratio and frequency distribution; then an impulse response from an outdoor location is applied at a randomized mix ratio.
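The two augmentation steps in the caption can be sketched in numpy. This is a generic stand-in, not the authors' pipeline: the helper names, SNR range, and synthetic impulse response are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_at_snr(signal, noise, snr_db):
    # Scale the noise so the mix has the requested signal-to-noise ratio
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(sig_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise

def apply_ir(signal, impulse_response, mix):
    # Convolve with the impulse response, level-match, and blend dry/wet
    wet = np.convolve(signal, impulse_response)[: len(signal)]
    wet *= np.max(np.abs(signal)) / (np.max(np.abs(wet)) + 1e-12)
    return (1.0 - mix) * signal + mix * wet

# Toy example: a sine tone, white noise at a random SNR, a decaying IR
x = np.sin(2 * np.pi * 5000 * np.arange(16000) / 16000)
noisy = mix_at_snr(x, rng.standard_normal(16000), snr_db=rng.uniform(0.0, 20.0))
ir = np.exp(-np.arange(400) / 50.0)
augmented = apply_ir(noisy, ir, mix=rng.uniform(0.2, 0.8))
```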
Fig 3
Fig 3. Classification outcome for all 32 species in the test set, using the best run of the mel frontend, which achieved 67% classification accuracy.
The vertical axis displays the true labels of the files, the horizontal axis shows the predicted labels, sorted alphabetically. Classifications within the two biggest genera Platypleura (green) and Myopsalta (red) are highlighted for comparison to the LEAF confusion matrix.
Fig 4
Fig 4. Classification outcome for all 32 species in the test set, using the best run of the LEAF frontend, which achieved 78% classification accuracy.
The vertical axis displays the true labels of the files, the horizontal axis shows the predicted labels, sorted alphabetically. Classifications within the two biggest genera Platypleura (green) and Myopsalta (red) are highlighted for comparison to the mel confusion matrix.
Fig 5
Fig 5. Center frequencies of all 64 filters used in the best performing LEAF run on InsectSet32.
Plots A and D show the initialization curve before training, which is based on the mel scale. Plots B and E show the deviation of each filter from its initialized position after training. Plots C and F show the filters sorted by center frequency and demonstrate the overall coverage of the frequency range, but do not represent the real ordering in the LEAF representations. Violin plots show the density of filters over the frequency spectrum; the orange line shows the initialization curve for comparison.
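The initialization curve in panels A and D can be sketched as 64 center frequencies spaced uniformly on the mel scale; during training LEAF updates each center independently, which is why panels C and F must re-sort the learned filters before plotting coverage. The frequency range and the random perturbation standing in for training are assumptions for illustration, not the paper's values.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def init_centers(n_filters=64, f_min=0.0, f_max=22050.0):
    # Uniform spacing on the mel scale: the pre-training curve in panels A/D
    return mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters))

centers = init_centers()
# Stand-in for training: each center drifts independently, so the learned
# filters are no longer monotonic in frequency ...
learned = centers + np.random.default_rng(0).normal(0.0, 200.0, centers.size)
# ... hence panels C/F sort by center frequency to show overall coverage.
coverage = np.sort(learned)
```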

