Review
2019 Jun 25:2019:4368036.
doi: 10.1155/2019/4368036. eCollection 2019.

Speech Technology Progress Based on New Machine Learning Paradigm

Vlado Delić et al. Comput Intell Neurosci. 2019.

Abstract

Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation to the cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review article on speech signal analysis and processing, the corresponding machine learning algorithms, and applied computational intelligence aims to give insight into several fields: speech production and auditory perception, cognitive aspects of speech communication and language understanding, speech recognition and text-to-speech synthesis in more detail, and, consequently, the main directions in the development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, have had an immense impact on the field of speech signal processing.

Figures

Figure 1. Interdisciplinary nature of speech technologies, i.e., spoken language processing (adopted from [2]).
Figure 2. Unified framework that encompasses speech signal processing fields in the scope of the article.
Figure 3. Block diagram of speech production and speech perception and corresponding processes performed by machines carrying out text-to-speech synthesis (TTS) and automatic speech recognition (ASR).
Figure 4. Components of a human-machine speech dialogue system.
Figure 5. Speech signal quality according to MOS versus bit rate for various speech signal coding techniques.
Figure 6. Forward adaptive PCM: (a) encoder; (b) decoder.
Figure 7. One of the realizations of backward adaptive PCM with one codeword memory: (a) encoder; (b) decoder.
Figure 8. Dual mode quantization scheme: (a) encoder; (b) decoder.
Figure 9. DPCM: (a) encoder; (b) decoder.

References

    1. Kuhn T. S. The Structure of Scientific Revolutions: 50th Anniversary Edition. 4th ed. Vol. 3. Chicago, IL, USA: The University of Chicago Press; 2012.
    2. Moore R. K. Cognitive informatics: the future of spoken language processing? Proceedings of the 10th International Conference on Speech and Computer (SPECOM); October 2005; Patras, Greece.
    3. Paul J. D. Re-creating the SIGSALY quantizer: this 1943 analog-to-digital converter gave the Allies an unbreakable scrambler [Resources]. IEEE Spectrum. 2019;56(2):16–17. doi: 10.1109/mspec.2019.8635806.
    4. Jayant N. S., Noll P. Digital coding of waveforms. Principles and applications to speech and video. Signal Processing. 1985;9(2):139–140. doi: 10.1016/0165-1684(85)90053-2.
    5. Chu W. C. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Hoboken, NJ, USA: John Wiley & Sons; 2003.
