A Machine Learning Framework for Code Optimisation

Traditional compilers take programs written in a high-level programming language and transform them into semantically equivalent machine code. Along this pipeline, the optimisation stage of the compiler carries out transformations on the code, which try to improve on the speed and size of the resultant target program. These optimisation passes are carried out iteratively, with each pass searching for potential improvements of the target program. Programmers can specify which of these optimisations to carry out during compilation of their source code by passing a sequence of flags to the compiler.

This project investigates whether machine learning techniques, specifically those based on an evolutionary approach, are able to improve on the standard optimisation sequences (e.g. -O3) available with popular compilers such as GCC and LLVM. This is done through the design and implementation of a Genetic Algorithm Framework (GAF) which searches for specific flag sequences that optimise for compilation time, executable file size, execution time, or a combination of these, as shown in Figure 1. The GAF is controlled through multiple parameters such as; population size, selection function and adjustment of

sequence length, which affect the navigation through the search space. Searching for these sequences is time- consuming since this requires compiling and executing the given program using a large number of sequences, as mentioned by Cooper et al. [1], Ballal et al. [2] and Kukunas et al. [3]. Optimal flag sequences are then tested on other programs in order to verify whether the identified sequence actually improves the target programs produced.

GAF was tested on real-life applications while comparing the results from varying individual parameters, a combination of parameters based on the individual parameter performance, and applying identified optimisation sequences on unseen programs, as shown in Figure 2. The framework achieved 13.98% better performance on average than the -O3 flag sequence offered by the LLVM compiler when applying one of the identified optimisation sequences. Three sequences were identified which optimise compilation time, executable file size and execution time individually, resulting in a 10.18%, 2.58% and a 28.97% average improvement from the -O3 sequence provided by the LLVM compiler.

Figure 1 – Framework training
Figure 2 – Results obtained when optimising for build time, file size and execution time

References

[1]         P. A. Ballal, H. Sarojadevi and P. S. Harsha, “Compiler optimization: A genetic algorithm approach,” International Journal of Computer Applications, vol. 112, 2015.

[2]         K. D. Cooper, P. J. Schielke and D. Subramanian, “Optimizing for Reduced Code Space Using Genetic Algorithms,” in Proceedings of the ACM SIGPLAN 1999 Workshop on Languages, Compilers, and Tools for Embedded Systems, New York, NY, USA, 1999.

[3]         K. Hoste and L. Eeckhout, “Cole: Compiler Optimization Level Exploration,” in Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization, New York, NY, USA, 2008.

Student: Brandon Abela
Supervisor: Dr Sandro Spina
Co-supervisor: Dr Joshua Ellul
Course: B.Sc. (Hons.) Computing Science