Sci Rep. 2024 Aug 30;14(1):20192.
doi: 10.1038/s41598-024-71026-w.

A blended framework for audio spoof detection with sequential models and bags of auditory bites

Misaj Sharafudeen et al. Sci Rep. 2024.

Abstract

An automated speaker verification system uses speech recognition to verify the identity of a user and block illicit access. Logical access attacks are attempts to gain access to a system by tampering with its algorithms or data, or by circumventing security mechanisms. DeepFake attacks are a form of logical access threat that uses artificial intelligence to produce highly realistic audio clips of the human voice, which may be used to circumvent voice authentication systems. This paper presents a framework for detecting Logical Access and DeepFake audio spoofing by integrating audio file components and time-frequency spectrogram representations into a lower-dimensional space using sequential prediction models. A bidirectional LSTM (Bi-LSTM) trained on the bonafide class generates significant one-dimensional features for both classes. The feature set is then standardized to a fixed set using a novel Bags of Auditory Bites (BoAB) feature standardizing algorithm. An Extreme Learning Machine (ELM) then maps the feature space to predictions that differentiate between genuine and spoofed speech. The framework is evaluated on the ASVspoof 2021 dataset, a comprehensive collection of audio recordings designed for evaluating the robustness of speaker verification systems against spoofing attacks. It achieves favorable results on synthesized DeepFake attacks, with an Equal Error Rate (EER) of 1.18% in the optimal setting. Logical Access attacks were more challenging to detect, with an EER of 12.22%. Compared with state-of-the-art methods on the ASVspoof 2021 dataset, the proposed method improves the EER for DeepFake attacks by 95.16%.
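For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bonafide audio rejected). The following is a minimal sketch of how an EER can be estimated from detection scores. It assumes that higher scores indicate bonafide speech; the function name and the toy scores are hypothetical and are not taken from the paper, and official ASVspoof scoring tools may interpolate between thresholds differently.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER from two score lists (assumed: higher = more bonafide).

    Returns (eer, threshold), where the threshold is the candidate at which
    the false acceptance and false rejection rates are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bonafide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores for illustration only (not data from the paper).
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold:.2f}")
```

In the toy example, one bonafide score falls below the selected threshold and one spoofed score lies at it, so both error rates are one in four. A lower EER means the two score distributions overlap less.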

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Framework of an automated speaker verification system.
Figure 2. Overall architecture of the proposed Automated Spoof Detection System.
Figure 3. Flow of extracting Chromagrams, MFCCs, and CQCCs.
Algorithm 1. Bag of Auditory Bites.
Figure 4. Waveforms and the several descriptors extracted from the bonafide and spoofed audio of speaker LA_0046.
Figure 5. Accuracy and Loss curves corresponding to the training of Bi-LSTM on Chromagram, MFCC and CQCC timestamps.
Figure 6. Predicted waveforms by the Sequential Bi-LSTM model.
Figure 7. Equal error rate plots of extreme learning machine in the optimal settings for LA and DF tasks.
Figure 8. Confusion matrices plotted for the ELM trained and tested to detect logical and deepfake attacks.
Figure 9. t-SNE and mean-variance visualizations of converged BoAB features in LA and DF datasets.
