Data Brief. 2024 Dec 9:58:111201.
doi: 10.1016/j.dib.2024.111201. eCollection 2025 Feb.

Ghadeer-speech-crowd-corpus: Speech dataset

Ghadeer Qasim Ali et al. Data Brief.

Abstract

The availability of raw data is a considerable challenge across most branches of science: without data, experiments cannot be conducted and development cannot be undertaken. Despite their importance, raw data are still lacking in many scientific fields. A literature survey conducted at the beginning of our study indicated a significant lack of Arabic speech datasets. This study therefore addresses the problem by proposing a new Arabic and English dataset called Ghadeer-Speech-Crowd-Corpus. The dataset was designed to serve several speech-processing applications, such as crowd speaker identification, speech synthesis (text-to-speech), and speech recognition (speech-to-text). Speech samples were recorded over three months from 210 Iraqi Arab citizens living in different parts of Iraq, and they cover more than one accent. The dataset is fully balanced with respect to speaker sex and recording language (the same number of Arabic and English recordings). It is a mono dataset containing 15,626 audio samples recorded at a sampling rate of 44,100 Hz, a bit depth of 16 bits, and a bit rate of 705.6 kb/s. The recordings were conducted at the Academy for Media Training of the College of Media, University of Baghdad.
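
The stated bit rate is consistent with the other audio parameters if the files are uncompressed PCM, which is an assumption here since the abstract does not name the encoding. A minimal sketch of that arithmetic follows; the function name `pcm_bit_rate_kbps` is hypothetical and not part of the dataset or its tooling.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kb/s (1 kb = 1000 bits).

    Assumes plain PCM: every sample of every channel is stored at the
    full bit depth, with no compression.
    """
    return sample_rate_hz * bit_depth * channels / 1000


# Values taken from the abstract: 44,100 Hz, 16-bit, mono (1 channel).
rate = pcm_bit_rate_kbps(sample_rate_hz=44_100, bit_depth=16, channels=1)
print(f"{rate} kb/s")  # 44,100 * 16 * 1 = 705,600 bits/s
```

Under the same assumption, a stereo version of these recordings would have twice the bit rate, since the channel count enters the product directly.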

Keywords: Arabic phrase; English phrase; Low-resource languages; Speech recognition.

Figures

Fig. 1
The recording devices used: mixer and microphone.
Fig. 2
Architecture of the dataset.
Fig. 3
Coding system design: (a) solo speaker, (b) 2 crowd speakers, (c) 3 crowd speakers, (d) 4 crowd speakers, (e) 5 crowd speakers.
