EURASIP J Wirel Commun Netw. 2025;2025(1):38.
doi: 10.1186/s13638-025-02469-6. Epub 2025 May 30.

Key metrics for monitoring performance variability in edge computing applications


Panagiotis Giannakopoulos et al. EURASIP J Wirel Commun Netw. 2025.

Abstract

Edge computing is an emerging approach that enables applications to run closer to users, accommodating their specific execution time requirements. Edge computing systems typically consist of heterogeneous processing and networking components, resulting in inconsistent task performance. To improve the consistency of edge computing applications, this study presents a method to identify the factors that affect variability in task execution time. We deploy a set of single-particle analysis algorithms, designed for an electron microscopy use case, running on a Kubernetes cluster monitored by Prometheus. This use case was chosen because it encompasses a diverse set of time-sensitive and privacy-sensitive applications with a wide range of resource requirements. Our experiments revealed a significant increase in the variability of round-trip time when tasks share resources. The proposed approach identifies the most relevant monitoring metrics from the larger set collected by Prometheus, with correlations up to 87%. This process reduces the number of metrics to 90, a reduction of 80%. As a result, the overhead of the monitoring system is decreased, and the use of these metrics for further processing, such as predictive modeling and scheduling, is simplified. The selected metrics not only help to explain the causes of performance variability, but also possess predictive value, enabling more efficient scheduling. The predictive power of these metrics is demonstrated using SHapley Additive exPlanations (SHAP) analysis.
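The metric-selection step described above (ranking collected Prometheus metrics by their correlation with task round-trip time and keeping only the top ones) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, column names, and the synthetic data are all assumptions, and the paper's pipeline additionally involves per-application and per-worker analysis.

```python
import numpy as np
import pandas as pd

def rank_metrics_by_correlation(metrics: pd.DataFrame,
                                rtt: pd.Series,
                                top_k: int = 90) -> pd.Series:
    """Rank monitoring metrics by absolute Pearson correlation with
    round-trip time (RTT) and keep the top_k most correlated ones."""
    corr = metrics.corrwith(rtt).abs().sort_values(ascending=False)
    return corr.head(top_k)

# Toy example with synthetic, time-aligned samples.
rng = np.random.default_rng(0)
rtt = pd.Series(rng.normal(100.0, 10.0, 200), name="rtt")
metrics = pd.DataFrame({
    "cpu_usage": rtt * 0.8 + rng.normal(0.0, 5.0, 200),  # RTT-correlated
    "disk_io":   rng.normal(50.0, 5.0, 200),             # unrelated noise
})
print(rank_metrics_by_correlation(metrics, rtt, top_k=1))
```

In this toy setting the CPU-usage series dominates the ranking, mirroring the paper's goal of discarding weakly correlated metrics to cut monitoring overhead.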

Keywords: Edge computing; Kubernetes; Monitoring metrics; Performance variability; Prometheus.


Conflict of interest statement

Competing interests: The authors declare no conflict of interest.

Figures

Fig. 1. Stages of the proposed methodology to calculate the correlation between monitoring metrics and application performance
Fig. 2. Structure of the infrastructure used. The blue lines indicate the physical connectivity of the servers through a 1 Gbps Ethernet link
Fig. 3. Correlation of the top extracted metrics for each application when co-located on Worker-1
Fig. 4. Correlation change between different application instances running on Worker-1. The y-axis of each subplot shows the metric-IDs of the corresponding application instance from Fig. 3
Fig. 5. Correlation change of metrics extracted for a particular application on Worker-1 compared to Worker-2 and Worker-3. The y-axis of each subplot shows the metric-IDs of the corresponding application instance from Fig. 3
Fig. 6. Accuracy of XGBoost for each application on Worker-1 as the number of monitoring metrics used is gradually increased
Fig. 7. Monitoring metric importance based on SHAP values for an XGBoost model trained on 20 metrics for the Upload application running on Worker-1. The top 10 metrics identified by our methodology are highlighted in bold
Fig. 8. RMSE of XGBoost when additional tsfresh features are included for the top 10 most correlated metrics of each application running on Worker-1
Fig. 9. Percentage of common metrics among all applications as the number of monitoring metrics per application increases
Fig. 10. Number of occurrences of the tsfresh features that achieve the highest correlation across all monitoring metrics of Worker-1
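Fig. 7 attributes an XGBoost model's predictions to individual monitoring metrics via SHAP values. As a self-contained illustration of the underlying idea only: the paper uses the SHAP toolchain on a trained XGBoost model, whereas the brute-force helper below is a hypothetical sketch that computes exact Shapley attributions by enumerating feature coalitions, which is tractable only for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attribution of f's output at point x, relative to a
    baseline input. Features outside a coalition S are replaced by their
    baseline values; each feature's value is its weighted average marginal
    contribution over all coalitions of the other features."""
    n = len(x)

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                S = set(S)
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                total += weight * (v(S | {i}) - v(S))
        phi.append(total)
    return phi

# For a linear model f(z) = 2*z0 + 3*z1 with a zero baseline, the
# attributions reduce to weight * feature value.
print(shapley_values(lambda z: 2 * z[0] + 3 * z[1], [1.0, 2.0], [0.0, 0.0]))
```

By construction the attributions sum to f(x) minus f(baseline), which is what makes a SHAP-style importance ranking of monitoring metrics additive and directly comparable across metrics.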

