New in this (large) release:
- We are now using Polly for something! N-dimensional tiling (where N > 1) can be performed on loops, where the tiling sizes are chosen by the tuner. This requires `POLLY_KNOBS` to be `ON`, and the use of Kruse's out-of-tree version of Polly. Instructions are in the `README`.
- ATDriver now uses an experimentation-rate limiter to better exploit the results of the tuner. [1]
- The Bayes tuner is no longer (badly) over-fitting to its training examples, making it a lot smarter. [2]
- Reduced the number of individual loop settings that are picked when randomly producing a new config. [3]
- The Annealing tuner was slightly improved by choosing a better escape velocity factor.
- Compilation jobs that appear to be non-terminating (running for more than 90 seconds) are now detected, and we crash the program when this happens.
- The tuner can now play with enabling Interprocedural Register Allocation, which is not on by default in LLVM.
- The benchmark suite has been greatly improved. [4]
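
To illustrate what the tiling item above means, here is a minimal sketch of 2-dimensional loop tiling in plain Python. It is not Polly's transformation (which operates on LLVM IR); the function names `tiled_indices` and `transpose_tiled` are hypothetical, and the `tile_i` / `tile_j` parameters merely stand in for the sizes the tuner would choose.

```python
def tiled_indices(n_rows, n_cols, tile_i, tile_j):
    """Yield every (i, j) of an n_rows x n_cols iteration space, tile by tile.

    tile_i and tile_j are the tile sizes (assumed here to be picked
    by some external tuner; this sketch just takes them as arguments).
    """
    for ii in range(0, n_rows, tile_i):            # outer loop over row tiles
        for jj in range(0, n_cols, tile_j):        # outer loop over column tiles
            # Inner loops walk one tile; min() clips partial tiles at the edges.
            for i in range(ii, min(ii + tile_i, n_rows)):
                for j in range(jj, min(jj + tile_j, n_cols)):
                    yield i, j


def transpose_tiled(a, tile_i, tile_j):
    """Transpose a list-of-lists matrix, visiting elements in tiled order."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[0] * n_rows for _ in range(n_cols)]
    for i, j in tiled_indices(n_rows, n_cols, tile_i, tile_j):
        out[j][i] = a[i][j]
    return out
```

Tiling changes only the order in which iterations are visited, not the set of iterations, so the result is the same for any positive tile sizes. What differs on real hardware is cache behaviour, which is why the sizes are worth tuning.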
Footnotes:
- Every Nth `reoptimize` request will actually obtain a new version of the function. Other times, we return the best-known version.
- We hold out a small number of examples to use for validation during training, and we stop training when the prediction error is no longer decreasing.
- We use a biased coin-flip to filter these out.
- We build multiple variants with different AOT optimization settings. We can also now view the performance with and without JIT overheads so that we can more directly compare the tuning algorithms themselves.
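
The experimentation-rate limiter described in the first footnote can be sketched roughly as follows. This is an illustrative Python model, not ATDriver's actual code: the class name `RateLimitedDriver`, the `every_n` parameter, and the assumption that the tuner hands back a `(version, cost)` pair are all hypothetical.

```python
class RateLimitedDriver:
    """Only every Nth reoptimize request asks the tuner for a new version."""

    def __init__(self, tuner, every_n):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.tuner = tuner        # hypothetical callable returning (version, cost)
        self.every_n = every_n
        self.requests = 0
        self.best = None          # (cost, version) of the best version seen so far

    def reoptimize(self):
        self.requests += 1
        # Experiment on the very first request (nothing to fall back on yet)
        # and on every Nth request after that.
        if self.best is None or self.requests % self.every_n == 0:
            version, cost = self.tuner()
            if self.best is None or cost < self.best[0]:
                self.best = (cost, version)
            return version
        # Otherwise exploit: hand back the best-known version.
        return self.best[1]
```

A larger `every_n` spends more requests exploiting the best-known version and fewer exploring new configurations. In this sketch the cost is assumed to be known as soon as the tuner returns, whereas a real driver would have to measure it by running the new version.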