Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
- PMID: 33286421
- PMCID: PMC7517183
- DOI: 10.3390/e22060649
Abstract
Anomaly detection refers to the problem of identifying abnormal behaviour within a set of measurements. In many cases, one has a statistical model for normal data and wishes to determine whether new data fit the model. In other cases, however, while there are normal data to learn from, there is no statistical model for these data and no structured parameter set to estimate. One is then forced to assume an individual-sequences setup, in which there is no given model and no guarantee that such a model exists. In this work, we propose a universal anomaly detection algorithm for one-dimensional time series that is able to learn the normal behaviour of systems and alert on abnormalities, without assuming anything about the normal data or about the anomalies. The suggested method utilizes new information measures derived from the Lempel-Ziv (LZ) compression algorithm in order to optimally and efficiently learn the normal behaviour (during learning), and then estimate the likelihood of new data (during operation) and classify it accordingly. We apply the algorithm to key problems in computer security, as well as to a benchmark anomaly detection data set, all using simple, single-feature, time-indexed data. The first problem is detecting botnet Command and Control (C&C) channels without deep inspection. We then apply the algorithm to malicious tool detection via system call monitoring and to data leakage identification. We conclude with the New York City (NYC) taxi data. Finally, using information-theoretic tools, we show that an attacker's attempt to fool the detection system by generating normal-looking data is bound to fail, either because of a high probability of error or because of the need for huge amounts of resources.
Keywords: NYC taxi data; anomaly detection; botnets; command and control channels; computer security; individual sequences; learning; one-dimensional time series; probability assignment; statistical model; universal compression.
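To make the learn-then-score idea in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the time series has already been quantized to a finite alphabet, uses a plain LZ78 phrase tree with visit counts as the learned model, and applies additive smoothing (the `alpha` parameter) to assign a probability to each new symbol. The function names `train_lz78_tree` and `bits_per_symbol` are hypothetical and chosen only for illustration.

```python
from math import log2


def train_lz78_tree(sequence):
    """Build an LZ78 phrase tree with visit counts from a training sequence."""
    root = {"count": 0, "children": {}}
    node = root
    for symbol in sequence:
        child = node["children"].get(symbol)
        if child is None:
            # A new phrase ends here: add a leaf and restart from the root.
            node["children"][symbol] = {"count": 1, "children": {}}
            node = root
        else:
            # Known continuation: count the visit and keep descending.
            child["count"] += 1
            node = child
    return root


def bits_per_symbol(root, sequence, alphabet_size, alpha=0.5):
    """Average code length (bits/symbol) of `sequence` under the trained tree.

    Assumption: each symbol's probability is its smoothed relative count
    among the children of the current node; higher scores mean the data
    look less like the training data.
    """
    total_bits = 0.0
    n = 0
    node = root
    for symbol in sequence:
        children = node["children"]
        seen = sum(c["count"] for c in children.values())
        child = children.get(symbol)
        hits = child["count"] if child is not None else 0
        p = (hits + alpha) / (seen + alpha * alphabet_size)
        total_bits -= log2(p)
        n += 1
        # Follow the tree while the context is known, otherwise restart.
        node = child if child is not None else root
    return total_bits / n if n else 0.0


# Toy usage: learn a periodic "normal" pattern, then score two probes.
tree = train_lz78_tree("abc" * 200)
score_normal = bits_per_symbol(tree, "abc" * 20, alphabet_size=3)
score_anomalous = bits_per_symbol(tree, "cba" * 20, alphabet_size=3)
```

In this toy setup the probe that repeats the learned pattern receives a lower score than the reversed one, so a threshold on the score (to be chosen from held-out normal data, an assumption of this sketch) would separate the two.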
Conflict of interest statement
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.