Review
Sensors (Basel). 2022 Jan 21;22(3):819.
doi: 10.3390/s22030819.

Machine Learning for Multimedia Communications

Nikolaos Thomos et al. Sensors (Basel). 2022.

Abstract

Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training of powerful models, impressive improvements in efficiency and accuracy have been achieved throughout the transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video content accurately enough to achieve large compression gains. Similarly, error concealment, streaming strategies, and even user perception modeling have benefited widely from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the major recent advances proposed across the transmission chain, and we discuss their potential impact and the research challenges they raise.

Keywords: QoE assessment; caching; channel coding; content consumption; error concealment; image coding; machine learning; multimedia communications; video coding; video streaming.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Multimedia transmission pipeline.
Figure 2
Communication ecosystem.
Figure 3
Usual architecture of end-to-end learning-based compression algorithms. The encoding function f projects the multimedia signal into a latent space of reduced dimension, and the decoding function g maps it back to the signal domain. The quantization Q and the entropy coding aim to describe the latent vector as a compact binary stream.
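To make the data flow of this architecture concrete, the following minimal sketch replaces the learned networks f and g with fixed, hypothetical transforms (block averaging and value repetition), uses uniform scalar quantization for Q, and estimates the entropy-coding cost as the ideal empirical entropy of the quantized symbols. The weighted objective R + λD is a common training target for learned codecs, but the weight `lam` and every function name here are assumptions for illustration, not the method of any specific paper.

```python
import math
from collections import Counter

def encode_f(x, factor=2):
    """Hypothetical stand-in for the learned analysis transform f: average blocks."""
    return [sum(x[i:i + factor]) / len(x[i:i + factor]) for i in range(0, len(x), factor)]

def decode_g(y_hat, length, factor=2):
    """Hypothetical stand-in for the learned synthesis transform g: repeat values."""
    return [v for v in y_hat for _ in range(factor)][:length]

def quantize(y, step):
    """Q: uniform scalar quantization of the latent vector to integer symbols."""
    return [round(v / step) for v in y]

def empirical_entropy_bits(symbols):
    """Ideal entropy-coding cost (in bits) under the empirical symbol distribution."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

def compress(x, step=1.0, lam=0.01):
    """Run f -> Q -> (ideal) entropy coding -> g and report rate, distortion, loss."""
    q = quantize(encode_f(x), step)
    x_hat = decode_g([s * step for s in q], len(x))
    rate = empirical_entropy_bits(q)
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)  # MSE
    loss = rate + lam * distortion  # assumed rate-distortion objective R + lam * D
    return q, x_hat, rate, distortion, loss
```

For example, `compress([10, 12, 11, 9, 40, 42, 41, 39], step=5.0)` yields the symbols `[2, 2, 8, 8]`, a rate of 4 bits, and an MSE of 1.5. In a real learned codec, f and g would be neural networks and the non-differentiable rounding would be approximated during training.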
Figure 4
Adaptive video streaming over HTTP. After being encoded into multiple representations (e.g., different resolutions and qualities), the video is stored on a server from which it can be delivered to users. Before watching the video, a user first obtains the MPD (Media Presentation Description) file, which indicates where the video is stored (typically, the video is split into chunks of 2–5 s), and then acquires the video. The representation of the video displayed to the user is decided on the user side and depends on the channel conditions encountered and other quality factors. The adaptation logic can be based on either control-theoretic approaches or machine learning; the latter allows multiple quality factors to be considered and future changes in network conditions to be forecast.
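As a point of reference for the adaptation logic, the sketch below shows a simple client-side, throughput-based rule: forecast the next chunk's throughput as the harmonic mean of recent chunk downloads and pick the highest representation that fits under a safety margin. This is a classic heuristic, neither a control-theoretic nor a learning-based method; a learned predictor or policy would replace the forecast or the selection step. The function names, the 0.8 safety factor, and the five-chunk window are assumptions.

```python
from typing import List, Sequence

def harmonic_mean(values: Sequence[float]) -> float:
    """Harmonic mean; less sensitive to short throughput spikes than the arithmetic mean."""
    return len(values) / sum(1.0 / v for v in values)

def select_representation(bitrates_kbps: Sequence[float],
                          past_throughputs_kbps: List[float],
                          safety: float = 0.8,
                          window: int = 5) -> float:
    """Choose the highest bitrate that fits within a discounted throughput forecast."""
    ladder = sorted(bitrates_kbps)
    if not past_throughputs_kbps:
        # No measurements yet (start-up): play it safe with the lowest representation.
        return ladder[0]
    # Forecast the next chunk's throughput from the last `window` chunk downloads.
    budget = safety * harmonic_mean(past_throughputs_kbps[-window:])
    chosen = ladder[0]
    for bitrate in ladder:
        if bitrate <= budget:
            chosen = bitrate
    return chosen
```

With the ladder `[300, 750, 1200, 2400, 4800]` kbit/s and recent throughputs `[3000, 2000, 3000]` kbit/s, the forecast is about 2571 kbit/s, the budget about 2057 kbit/s, and the function returns 1200.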
Figure 5
In many applications, including AR/VR/XR and 360-degree video, users watch only part of the scene (the non-shaded area), known as the viewport, and can navigate freely in the scene, enjoying up to a six-degrees-of-freedom (6DoF) experience.
Figure 6
Visualization of different adaptive streaming strategies for interactive systems. In the viewport-independent case, the entire panorama is encoded at multiple quality levels and resolutions and sent in full to the users. The other two approaches are viewport-dependent: either the areas of interest in the panorama are encoded at high quality (viewport-based projection), or the panorama is divided into multiple tiles and the tiles covering the area to be visualized are downloaded at higher quality (viewport-based tiling).
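The tile-selection step of viewport-based tiling can be sketched as follows. The sketch assumes an equirectangular panorama split into a hypothetical 8 × 4 grid, with yaw in [-180°, 180°) and pitch in [-90°, 90°], and approximates the viewport as a yaw/pitch rectangle (this ignores spherical distortion near the poles). The function names and quality labels are illustrative assumptions; a real client would also predict the future viewport, which is where learning-based methods are typically applied.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def visible_tiles(yaw, pitch, hfov, vfov, cols=8, rows=4):
    """Return the (row, col) tiles overlapping an approximate rectangular viewport."""
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    vp_top = min(90.0, pitch + vfov / 2)
    vp_bottom = max(-90.0, pitch - vfov / 2)
    tiles = set()
    for r in range(rows):
        top = 90.0 - r * tile_h          # row 0 is the top of the panorama
        bottom = top - tile_h
        if not (top > vp_bottom and bottom < vp_top):
            continue                      # no vertical overlap with the viewport
        for c in range(cols):
            center = -180.0 + (c + 0.5) * tile_w
            # Circular overlap test in yaw, so the 180/-180 seam is handled correctly.
            if abs(wrap_deg(center - yaw)) < (tile_w + hfov) / 2:
                tiles.add((r, c))
    return tiles

def assign_quality(visible, cols=8, rows=4, high="HQ", low="LQ"):
    """Request tiles inside the viewport at high quality and the rest at low quality."""
    return {(r, c): high if (r, c) in visible else low
            for r in range(rows) for c in range(cols)}
```

For a 90° × 90° viewport centered on the panorama seam (`yaw=180, pitch=0`), only the four tiles `(1, 0)`, `(1, 7)`, `(2, 0)`, and `(2, 7)` are requested at high quality.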
Figure 7
Intelligent caching network. Machine learning is used to predict content demand, to decide which content to cache at each small base station (SBS), and to select from where to deliver it to the users. Decisions can be centralized at the macro base station (MBS), distributed across the SBSs, or follow federated learning concepts.
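The predict-then-cache loop at a single SBS can be illustrated with a deliberately simple popularity forecaster. In the sketch below, an exponential moving average of per-slot request counts stands in for the machine learning predictor, the SBS keeps the top-`capacity` items, and requests for uncached content are assumed to fall back to the MBS. The class name, the smoothing factor `alpha`, and the fallback rule are assumptions for illustration only.

```python
from collections import defaultdict

class PopularityCache:
    """Hypothetical per-SBS cache driven by a simple popularity forecast."""

    def __init__(self, capacity: int, alpha: float = 0.5):
        self.capacity = capacity
        self.alpha = alpha                 # weight of the newest observation
        self.score = defaultdict(float)    # forecast demand per content item
        self.cached = set()

    def end_of_slot(self, request_counts: dict) -> set:
        """Update forecasts with this slot's request counts, then refresh the cache."""
        for item in set(self.score) | set(request_counts):
            observed = request_counts.get(item, 0)
            self.score[item] = self.alpha * observed + (1 - self.alpha) * self.score[item]
        # Rank by forecast demand (ties broken by item id) and keep the top items.
        ranked = sorted(self.score, key=lambda k: (-self.score[k], k))
        self.cached = set(ranked[:self.capacity])
        return self.cached

    def serve(self, item) -> str:
        """Deliver from the SBS on a cache hit, otherwise (assumed) from the MBS."""
        return "SBS" if item in self.cached else "MBS"
```

With `capacity=2`, a first slot of requests `{"a": 10, "b": 4, "c": 1}` caches `a` and `b`; after a second slot `{"c": 20, "b": 2}`, the forecast shifts toward `c`, and the cache holds `c` and `a`.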
Figure 8
The aim of compression changes, since decoded images or videos can be used to perform automated tasks (e.g., identifying dangerous situations in vehicular networks or recognizing persons) in addition to being watched by humans. This requires different metrics to be considered when defining the loss functions of the neural network architectures.
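One way to combine such metrics is a weighted sum of the bit rate, a pixel-domain distortion for human viewing, and a task loss for the machine consumer. The sketch below assumes mean squared error for the human-oriented term and classification cross-entropy for the machine-oriented term; the weights `lam_human` and `lam_machine` and all function names are hypothetical, and other perceptual or task metrics could be substituted.

```python
import math

def mse(x, x_hat):
    """Pixel-domain distortion, a proxy for quality as seen by human viewers."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Task loss of a classifier run on the decoded content (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def human_machine_loss(rate_bits, x, x_hat, logits, label,
                       lam_human=0.01, lam_machine=1.0):
    """Assumed objective: rate + lam_human * MSE + lam_machine * task loss."""
    return (rate_bits
            + lam_human * mse(x, x_hat)
            + lam_machine * cross_entropy(logits, label))
```

Setting `lam_machine` to zero recovers a purely human-oriented rate-distortion objective, while raising it shifts the bit budget toward features that matter for the automated task.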
