Review

Sensors (Basel). 2022 Jan 21;22(3):819.

doi: 10.3390/s22030819.

Machine Learning for Multimedia Communications

Nikolaos Thomos¹, Thomas Maugey², Laura Toni³

Affiliations

¹ School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK.
² Inria, 35042 Rennes, France.
³ Department of Electronic & Electrical Engineering, University College London (UCL), London WC1E 6AE, UK.

PMID: 35161566
PMCID: PMC8840624
DOI: 10.3390/s22030819

Nikolaos Thomos et al. Sensors (Basel). 2022.

Abstract

Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training of powerful models, impressive efficiency and accuracy improvements have been achieved all along the transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video signals accurately, yielding tremendous compression gains. Similarly, error concealment, streaming strategies, and even user-perception modeling have benefited widely from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the major recent advances proposed across the transmission chain, and we discuss their potential impact and the research challenges they raise.

Keywords: QoE assessment; caching; channel coding; content consumption; error concealment; image coding; machine learning; multimedia communications; video coding; video streaming.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Multimedia transmission pipeline.

**Figure 3**
Usual architecture of end-to-end learning-based compression algorithms. The encoding and decoding functions f and g project the multimedia signal into, and back from, a latent space of reduced dimension. The quantization Q and the entropy coding describe the latent vector as a compact binary stream.
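The f → Q → entropy coding → g chain of Figure 3 can be illustrated with a toy, non-learned sketch. Here f and g are hypothetical fixed maps (pairwise averaging and repetition) standing in for trained neural networks, Q is uniform rounding, and the rate is the ideal entropy-coded length computed from the empirical symbol distribution. The weight `lam` is an assumed value; the point is only to show how rate and distortion combine into one objective.

```python
import math
from collections import Counter

# Hypothetical stand-ins: in a learned codec, f and g are trained neural networks.

def f(x):
    """Toy 'encoder': average adjacent samples, halving the dimension."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def quantize(y, step):
    """Uniform scalar quantization Q: map each latent value to an integer index."""
    return [round(v / step) for v in y]

def rate_bits(symbols):
    """Ideal entropy-coded length (bits) under the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

def g(q, step):
    """Toy 'decoder': dequantize, then upsample by repeating each value twice."""
    out = []
    for s in q:
        out.extend([s * step, s * step])
    return out

def mse(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [1.0, 1.2, 3.0, 3.1, 5.0, 4.8, 1.0, 1.1]
step = 1.0
q = quantize(f(x), step)   # [1, 3, 5, 1]
bits = rate_bits(q)        # 6.0
x_hat = g(q, step)
lam = 0.01                 # hypothetical rate-distortion trade-off weight
loss = mse(x, x_hat) + lam * bits
```

During actual training, rounding has zero gradient almost everywhere, so learned codecs typically replace Q with a differentiable proxy (for example, additive uniform noise or a straight-through estimator) and estimate the rate with a learned probability model rather than empirical counts.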

**Figure 4**
Adaptive video streaming over HTTP. After the video is encoded into multiple representations (e.g., resolutions, qualities, etc.), it is stored on a server from which it can be delivered to users. Before watching the video, a user first obtains the MPD (Media Presentation Description) file, which describes where the video is stored (typically, the video is split into chunks of 2–5 s), and then acquires the video. The representation displayed to the user is decided at the user side and depends on the encountered channel conditions and other quality factors. The adaptation logic can be based either on control-theoretic approaches or on machine learning; the latter permits the consideration of multiple quality factors and the forecasting of future changes in network conditions.
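For contrast with the learning-based adaptation logic mentioned above, a minimal rule-based client might pick the next chunk's representation from recent throughput alone. This sketch assumes a harmonic-mean throughput estimate over a sliding window and a safety margin; the bitrate ladder, `window`, and `safety` values are hypothetical. A learned controller would replace `select_bitrate` with a trained policy that can also weigh buffer level, quality factors, and predicted network changes.

```python
def harmonic_mean(values):
    """Harmonic mean: dominated by low samples, so conservative against short spikes."""
    return len(values) / sum(1.0 / v for v in values)

def select_bitrate(bitrates_kbps, throughput_history_kbps, window=5, safety=0.8):
    """Pick the highest representation that fits under a discounted throughput estimate."""
    estimate = harmonic_mean(throughput_history_kbps[-window:])
    budget = safety * estimate
    feasible = [b for b in sorted(bitrates_kbps) if b <= budget]
    # If even the lowest representation does not fit, fall back to it anyway.
    return feasible[-1] if feasible else min(bitrates_kbps)

ladder = [300, 750, 1200, 2400, 4800]  # hypothetical bitrate ladder (kbit/s)
print(select_bitrate(ladder, [1000, 4000]))  # prints 1200
```

With past throughputs of 1000 and 4000 kbit/s, the harmonic mean is 1600 kbit/s and the budget is 1280 kbit/s, so the 1200 kbit/s representation is chosen; an arithmetic mean (2500 kbit/s) would have risked picking 1200 as well but with far less margin.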

**Figure 5**
In many applications, including AR/VR/XR and 360-degree video, users watch only a part of the scene (non-shaded area), known as the viewport, and can navigate freely in the scene, enjoying up to a six-degree-of-freedom (6DoF) experience.

**Figure 6**
Visualization of different adaptive streaming strategies for interactive systems. In the viewport-independent case, the entire panorama is encoded at multiple quality levels and resolutions and sent in full to end users. The other two approaches are viewport-dependent: either the areas of interest in the panorama are encoded at high quality (viewport-based projection), or the panorama is encoded into multiple tiles and the tiles covering the area to be visualized are downloaded at higher quality (viewport-based tiling).
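The viewport-based tiling strategy can be sketched as follows. The sketch assumes an equirectangular panorama split into a regular `cols × rows` grid and, for simplicity, treats the viewport as a rectangle in yaw/pitch degrees, which ignores the projection's distortion near the poles. All function names are hypothetical; tiles overlapping the viewport get the high-quality representation and the rest get the low one.

```python
def _yaw_overlap(t_lo, t_hi, v_lo, v_hi):
    """True if tile yaw range [t_lo, t_hi) overlaps the viewport range, with 360° wrap-around."""
    return any(v_lo + s < t_hi and v_hi + s > t_lo for s in (-360.0, 0.0, 360.0))

def visible_tiles(yaw, pitch, hfov, vfov, cols, rows):
    """Return the set of (row, col) tiles that overlap the viewport."""
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    p_lo, p_hi = max(-90.0, pitch - vfov / 2), min(90.0, pitch + vfov / 2)
    v_lo, v_hi = yaw - hfov / 2, yaw + hfov / 2
    tiles = set()
    for r in range(rows):                 # row 0 is the top of the panorama (pitch +90°)
        t_hi = 90.0 - r * tile_h
        t_lo = t_hi - tile_h
        if t_hi <= p_lo or t_lo >= p_hi:  # no vertical overlap with the viewport
            continue
        for c in range(cols):             # column 0 starts at yaw -180°
            c_lo = -180.0 + c * tile_w
            if _yaw_overlap(c_lo, c_lo + tile_w, v_lo, v_hi):
                tiles.add((r, c))
    return tiles

def tile_qualities(visible, cols, rows, high="high", low="low"):
    """Assign the high-quality representation to visible tiles, the low one elsewhere."""
    return {(r, c): (high if (r, c) in visible else low)
            for r in range(rows) for c in range(cols)}

print(sorted(visible_tiles(0, 0, 90, 90, 4, 2)))  # prints [(0, 1), (0, 2), (1, 1), (1, 2)]
```

The wrap-around check matters for viewports centered near yaw ±180°, where the visible area spans both the leftmost and rightmost columns of the panorama.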

**Figure 7**
Intelligent caching network. Machine learning is used to predict content demand, to decide which content to cache in each small base station (SBS), and to choose from where to deliver it to users. Decisions can be centralized at the macro base station (MBS), distributed across the SBSs, or follow federated learning concepts.
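A minimal sketch of the federated option, assuming content popularity is estimated by plain request frequencies (a stand-in for the learned predictors the caption refers to). Each SBS shares only its local popularity vector and request count, never raw requests, and the MBS aggregates them with weights proportional to request counts, as in FedAvg-style aggregation. Function names, request logs, and the cache size `k` are hypothetical.

```python
from collections import Counter

def local_popularity(requests):
    """Per-SBS popularity estimate: relative request frequency of each item."""
    counts = Counter(requests)
    n = len(requests)
    return {item: c / n for item, c in counts.items()}, n

def aggregate(local_models):
    """MBS-side aggregation: average local estimates, weighted by local request counts."""
    total = sum(n for _, n in local_models)
    global_pop = Counter()
    for pop, n in local_models:
        for item, p in pop.items():
            global_pop[item] += p * n / total
    return global_pop

def cache_top_k(popularity, k):
    """Cache the k most popular items (ties broken by item name)."""
    return sorted(popularity, key=lambda item: (-popularity[item], item))[:k]

sbs1 = local_popularity(["a", "a", "b", "c"])
sbs2 = local_popularity(["b", "b", "b", "a"])
global_pop = aggregate([sbs1, sbs2])
print(cache_top_k(global_pop, 2))  # prints ['b', 'a']
```

A distributed variant would instead call `cache_top_k` on each SBS's own estimate (here SBS 1 would cache `a` first), trading global accuracy for locality.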

**Figure 8**
The aim of compression changes, since decoded images or videos can be used to perform automated tasks (e.g., identifying dangerous situations in vehicular networks or recognizing persons) in addition to being watched by humans. This necessitates considering different metrics when defining the loss functions of neural network architectures.
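One way to express such a changed objective is a loss that adds a machine-task term next to the usual human-oriented distortion. This sketch assumes the rate is already known in bits, uses MSE as the human-viewing metric and a classifier's cross-entropy on the decoded content as the machine-task metric; the weights `lam_human` and `lam_machine` are hypothetical and would be tuned per application.

```python
import math

def mse(x, x_hat):
    """Human-oriented distortion: mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Machine-task loss: softmax cross-entropy, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def coding_loss(rate_bits, x, x_hat, task_logits, label, lam_human=1.0, lam_machine=1.0):
    """Rate plus weighted human- and machine-oriented distortion terms."""
    return (rate_bits
            + lam_human * mse(x, x_hat)
            + lam_machine * cross_entropy(task_logits, label))
```

Setting `lam_machine=0` recovers a conventional rate-distortion objective for human viewing, while `lam_human=0` optimizes purely for the downstream task.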


