A dataset for voice-based human identity recognition

Baha' A Alsaify¹, Hadeel S Abu Arja¹, Baskal Y Maayah¹, Masa M Al-Taweel¹

PMID: 35356317
PMCID: PMC8958529
DOI: 10.1016/j.dib.2022.108070

Baha' A Alsaify et al. Data Brief. 2022 Mar 18;42:108070.

doi: 10.1016/j.dib.2022.108070. eCollection 2022 Jun.

Affiliation

¹ Department of Network Engineering and Security, Jordan University of Science and Technology, P.O. Box 3030, Irbid 22110, Jordan.

Abstract

This paper introduces a new English speech dataset suitable for training and evaluating speaker recognition systems. Samples were obtained from non-native English speakers from the Arab region over the course of two months. The dataset was divided into two sub-datasets. Ten samples were collected from each speaker for each sub-dataset. The first sub-dataset contains samples of speakers repeating the phrase "Machine learning 1, 2, 3, 4, 5, 6, 7, 8, 9, 10". The second sub-dataset contains samples for the same speakers speaking randomly for five to ten seconds for each sample. The dataset consists of 150 speakers with a total of 3,000 data samples and about six hours of speech.

Keywords: Applied machine learning; Audio dataset; Different phrase; FLAC; Same phrase; Voice recognition.
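
The counts quoted in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not code from the dataset's authors: the function names are hypothetical, the sub-dataset names are taken from the figure captions, and the six-hour total is the approximate figure given in the abstract, so the resulting mean duration is only indicative.

```python
# Figures taken from the abstract; "about six hours" is approximate.
NUM_SPEAKERS = 150
SAMPLES_PER_SPEAKER = 10  # collected per speaker for each sub-dataset
SUB_DATASETS = ("samePhrase", "differentPhrase")  # names as used in the figure captions
TOTAL_HOURS = 6


def expected_sample_count(speakers, per_speaker, sub_datasets):
    """Total samples = speakers x samples per speaker x number of sub-datasets."""
    return speakers * per_speaker * len(sub_datasets)


def mean_sample_seconds(total_hours, n_samples):
    """Average sample length in seconds implied by the total duration."""
    return total_hours * 3600 / n_samples


total = expected_sample_count(NUM_SPEAKERS, SAMPLES_PER_SPEAKER, SUB_DATASETS)
mean_seconds = mean_sample_seconds(TOTAL_HOURS, total)

print(f"expected samples: {total}")
print(f"implied mean duration: {mean_seconds:.1f} s")
```

The product matches the 3,000 samples stated in the abstract, and the implied mean duration falls inside the five-to-ten-second range given for the second sub-dataset.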

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Figures

Fig. 1. Architecture of the dataset.

Fig. 2. Different speakers saying the same phrase in the samePhrase sub-dataset.

Fig. 3. Different speakers saying different phrases in the differentPhrase sub-dataset.

Fig. 4. The same speaker saying the same phrase in the samePhrase sub-dataset.

Fig. 5. The same speaker saying different phrases in the differentPhrase sub-dataset.
