BinBench: a benchmark for x64 portable operating system interface binary function representations

Francesca Console¹, Giuseppe D'Aquanno¹, Giuseppe Antonio Di Luna¹, Leonardo Querzoni¹

PMID: 37346713
PMCID: PMC10280411
DOI: 10.7717/peerj-cs.1286

Francesca Console et al. PeerJ Comput Sci. 2023.

PeerJ Comput Sci. 2023 Jun 1;9:e1286.

doi: 10.7717/peerj-cs.1286. eCollection 2023.

Affiliation

¹ Department of Computer, Control and Management Engineering, University of Roma "La Sapienza", Rome, Italy.

Abstract

In this article we propose the first multi-task benchmark for evaluating the performance of machine learning models that work on low-level assembly functions. While multi-task benchmarks are standard in the natural language processing (NLP) field, the practice is unknown in the field of assembly language processing. In recent years, however, there has been a strong push to use deep neural network architectures borrowed from NLP to solve problems on assembly code. A standard benchmark has two advantages. First, it makes different works comparable without the effort of reproducing third-party solutions. Second, it makes it possible to test the generality of a machine learning model across several tasks. For these reasons, we propose BinBench, a benchmark for binary function models. The benchmark includes various binary analysis tasks, as well as a dataset of binary functions on which the tasks should be solved. The dataset is publicly available and has been evaluated using baseline models.

Keywords: Assembly language; Benchmark; Binary functions; Binary functions representation; Binary similarity; Compiler provenance; Dataset; Neural networks.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Examples of binary similarity.**
(A) Similar binary functions are generated by compiling the same source code. (B) Dissimilar binary functions are obtained from the compilation of different source code.

**Figure 2. Folder structure of a package in the labeled dataset.**
(A) Folder in the JSON component. (B) Folder in the binary component.

Associated data

figshare/10.6084/m9.figshare.21546111.v2
