Discovering anomalies in big data: a review focused on the application of metaheuristics and machine learning techniques

Claudia Cavallaro¹, Vincenzo Cutello¹, Mario Pavone¹, Francesco Zito¹

PMID: 37663272
PMCID: PMC10470118
DOI: 10.3389/fdata.2023.1179625

Review

Claudia Cavallaro et al. Front Big Data. 2023 Aug 17;6:1179625. doi: 10.3389/fdata.2023.1179625. eCollection 2023.

Affiliation

¹ Department of Mathematics and Computer Science, University of Catania, Catania, Italy.


Abstract

As the volume of data produced by computer systems has grown, along with the security threats those systems face, interest in anomaly detection has grown as well in recent years. The need to diagnose faults and cyberattacks has focused scientific research on the automated classification of outliers in big data, since manual labeling is impractical at such volumes. The results of data analysis can be used to generate alarms that anticipate anomalies and thus help prevent system failures and attacks. Anomaly detection therefore aims to reduce maintenance costs and to support report-based decision-making. Over the last decade, the approaches proposed in the literature to classify unknown anomalies in log analysis, process analysis, and time series have been based mainly on machine learning and deep learning techniques. In this study, we provide an overview of current state-of-the-art methodologies, highlighting their advantages, their disadvantages, and the new challenges they face. In particular, we will see that there is no absolute best method: which method achieves the best result depends on the dataset. Finally, we describe how the use of metaheuristics within machine learning algorithms makes it possible to obtain more robust and efficient tools.

Keywords: anomaly detection; classification; deep learning; fault detection; machine learning; metaheuristics; recurrent neural network; security threats.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Structure of a recurrent layer.

**Figure 2**
Architecture of the recurrent neural network considered.

**Figure 3**
**(A)** Confusion matrix of the non-optimized NN. **(B)** Confusion matrix of the optimized NN.
