Releases · kavon/atJIT
Preview Release 6
New in this (large) release:
- We are now using Polly for something! N-dimensional tiling (where N > 1) can be performed on loops, with the tiling sizes chosen by the tuner. This requires `POLLY_KNOBS` to be `ON`, and the use of Kruse's out-of-tree version of Polly. Instructions are in the README.
- ATDriver now uses an experimentation-rate limiter to better exploit the results of the tuner. [1]
- The Bayes tuner is no longer (badly) over-fitting to its training examples, making it a lot smarter. [2]
- Reduced the number of individual loop settings that are picked when randomly producing a new config. [3]
- The Annealing tuner was slightly improved by choosing a better escape velocity factor.
- Compilation jobs that appear to be non-terminating (>90 sec) are now detected; in that case, we crash the program.
- The tuner can now play with enabling Interprocedural Register Allocation, which is not on by default in LLVM.
- The benchmark suite has been greatly improved. [4]
Footnotes:
1. Every Nth `reoptimize` request will actually obtain a new version of the function; otherwise, we return the best-known version (first sketch below).
2. We hold out a small number of examples to use for validation during training, and we stop training when the prediction error is no longer decreasing (second sketch below).
3. We use a biased coin-flip to filter these out.
4. We build multiple variants with different AOT optimization settings. We can also now view performance with and without JIT overheads, so that we can more directly compare the tuning algorithms themselves.
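
To make footnote 1 concrete, here is a minimal sketch of what such an experimentation-rate limiter could look like; the class and member names are hypothetical, not atJIT's actual internals:

```cpp
#include <cstdint>

// Hypothetical rate limiter: experiment on every Nth reoptimize request;
// otherwise tell the caller to return the best-known version.
class ExperimentRateLimiter {
  uint64_t Requests = 0;
  uint64_t N; // experiment only on every Nth request
public:
  explicit ExperimentRateLimiter(uint64_t EveryNth) : N(EveryNth) {}

  // True when this request should compile a fresh experimental version.
  bool shouldExperiment() { return ++Requests % N == 0; }
};
```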
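And a minimal sketch of the early stopping in footnote 2, with stand-in callbacks for the Bayes tuner's model fitting (neither callback is an actual atJIT function):

```cpp
#include <functional>
#include <limits>

// Train while prediction error on the held-out validation examples keeps
// decreasing; stop as soon as it plateaus or worsens.
void fitWithEarlyStopping(int MaxSteps,
                          const std::function<void()> &TrainOneStep,
                          const std::function<double()> &ValidationError) {
  double Best = std::numeric_limits<double>::infinity();
  for (int Step = 0; Step < MaxSteps; ++Step) {
    TrainOneStep();                 // fit the model a bit more
    double Err = ValidationError(); // error on the hold-out set
    if (Err >= Best)
      break;                        // no longer decreasing: stop training
    Best = Err;
  }
}
```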
Preview Release 5
New in this release:
Large performance improvements for calls to `reoptimize`!
- Previously, we would immediately try to obtain a newly compiled version of the code once the previous version's measurements were stable, blocking if the compiled code wasn't ready. Now, we return the best-seen version if a concurrent compile job is active but has not yet completed (see the first sketch after this list).
- The tuner is now less likely to suggest fully unrolling every loop in the module, by biasing the generation of random configs towards smaller unrolling factors (second sketch below). High values often used to overload LLVM, leading to non-terminating compilation jobs.
- Fast Instruction Selection and `-O1` (for both optimization and codegen) are now used when compiling the default/initial configuration. This allows the tuner to respond very quickly to first-time `reoptimize` requests.
- The IR optimization pipeline is no longer run twice on each compile, and it is now tuned for both code size and aggressiveness.
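
A minimal sketch of the non-blocking behavior from the first item, using a `std::future` to stand in for atJIT's compile-job machinery (the names here are illustrative, not the library's API):

```cpp
#include <chrono>
#include <future>

using CompiledFn = int (*)(int); // stand-in for a JIT-compiled function

// If the in-flight compile job has finished, adopt its result; otherwise
// return the best-seen version instead of blocking the caller.
CompiledFn pickVersion(std::future<CompiledFn> &InFlight, CompiledFn BestSeen) {
  using namespace std::chrono_literals;
  if (InFlight.valid() && InFlight.wait_for(0s) == std::future_status::ready)
    return InFlight.get();
  return BestSeen;
}
```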
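And one way (an assumption, not necessarily atJIT's exact distribution) to bias randomly generated configs toward small unrolling factors, as in the second item:

```cpp
#include <random>

// Repeatedly flip a fair coin to decide whether the unroll factor doubles
// again, so large factors become exponentially unlikely.
int sampleUnrollFactor(std::mt19937 &RNG) {
  std::bernoulli_distribution GrowAgain(0.5);
  int Factor = 1;
  while (Factor < 64 && GrowAgain(RNG))
    Factor *= 2;
  return Factor; // 1, 2, 4, ..., 64, skewed toward small values
}
```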
Preview Release 4
New in this release:
- Compilation jobs can now occur in parallel with the client's code. `reoptimize` requests can spark these optimistic additional jobs (currently at most 10) so that future `reoptimize` calls hopefully spend less time waiting on the compiler.
- The queue of compilation jobs also makes use of pipeline parallelism, i.e., optimization of one module can occur in parallel with the codegen of another module (see the first sketch after this list).
- The codegen optimization level is now a tuned knob, defaulting to `Aggressive`.
- FastISel is now a tuned knob, defaulting to off. (Both knobs appear in the second sketch below.)
- The test suite now includes a basic memory leak test.
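
Here is a minimal sketch of the pipeline parallelism described above: while one module is in codegen on a worker thread, the next module can be optimized concurrently. The `Module` type and stage functions are stand-ins, not atJIT's internals:

```cpp
#include <future>
#include <utility>
#include <vector>

struct Module {};                       // stand-in for an LLVM module
Module optimize(Module M) { return M; } // stage 1: run IR passes (stub)
void codegen(Module) {}                 // stage 2: emit machine code (stub)

// Two-stage pipeline: optimize module i while module i-1 is in codegen.
void compileAll(std::vector<Module> Work) {
  std::future<void> CodegenJob;
  for (Module &M : Work) {
    Module Opt = optimize(std::move(M)); // runs concurrently with the
                                         // previous module's codegen
    if (CodegenJob.valid())
      CodegenJob.wait();
    CodegenJob = std::async(std::launch::async, codegen, std::move(Opt));
  }
  if (CodegenJob.valid())
    CodegenJob.wait();
}
```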
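And a sketch of the two codegen knobs: the enum and field come from LLVM itself (`llvm/Support/CodeGen.h` and `llvm/Target/TargetOptions.h` in LLVM versions of this era), while the wrapper struct is hypothetical:

```cpp
#include "llvm/Support/CodeGen.h"
#include "llvm/Target/TargetOptions.h"

// Hypothetical bundle of the two tuned knobs, with the defaults from the
// notes above: Aggressive codegen, FastISel off.
struct CodegenKnobs {
  llvm::CodeGenOpt::Level OptLevel = llvm::CodeGenOpt::Aggressive;
  bool UseFastISel = false;

  void apply(llvm::TargetOptions &Opts) const {
    Opts.EnableFastISel = UseFastISel;
    // OptLevel would be handed to the TargetMachine when building the
    // JIT backend for this configuration.
  }
};
```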
Preview Release 3
New in this release:
- A new tuner, `AT_Anneal`, that uses Simulated Annealing (SA) to optimize the configuration. The particular SA algorithm used is relatively simple: https://www.jstor.org/stable/2246034
- Now that we have SA, we also have the concept of perturbing a configuration, i.e., randomly picking a "nearby" configuration. This has been used to improve `AT_Bayes` so that there is actually an exploration-exploitation trade-off. Currently, we try to exploit only the best-seen configuration. (A sketch of the SA loop and perturbation follows this list.)
- Better debugging output while tuning: we now properly show time spent optimizing & compiling.
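
To make the SA items concrete, here is a minimal, self-contained sketch of an annealing loop with a perturb step. The `Config`, `cost`, and `perturb` definitions are toy stand-ins, and the schedule below is generic rather than the one from the cited paper:

```cpp
#include <cmath>
#include <random>
#include <utility>

struct Config { double X = 0; }; // stand-in for a full knob configuration

// Toy cost: pretend we measured runtime; minimized at X == 3.
double cost(const Config &C) { double D = C.X - 3; return D * D; }

// Perturb: randomly pick a "nearby" configuration by nudging one knob.
Config perturb(const Config &C, std::mt19937 &RNG) {
  std::normal_distribution<double> Step(0.0, 0.5);
  return Config{C.X + Step(RNG)};
}

Config anneal(Config Cur, int Steps, std::mt19937 &RNG) {
  std::uniform_real_distribution<double> Coin(0.0, 1.0);
  double CurCost = cost(Cur);
  for (int I = 0; I < Steps; ++I) {
    double Temp = 1.0 - double(I) / Steps; // cool from 1 toward 0
    Config Next = perturb(Cur, RNG);
    double NextCost = cost(Next);
    // Always accept improvements; accept regressions with a probability
    // that shrinks as the temperature drops.
    if (NextCost < CurCost ||
        Coin(RNG) < std::exp((CurCost - NextCost) / Temp)) {
      Cur = std::move(Next);
      CurCost = NextCost;
    }
  }
  return Cur;
}
```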
Preview Release 2
New features:
- A tuner that is somewhat intelligent. It uses techniques inspired by Bayesian Optimization, and is called `AT_Bayes`.
- The ability to specify tunable parameters via the `tuned_parameter::IntRange` class. This allows the user to tell the tuner to optimize the input value according to an inclusive range constraint; thus, algorithmic selection is now possible via this feature.
- The `pct_err` option to `reoptimize`, which specifies how tolerant of noise you would like the tuner to be. This indirectly controls the rate at which the tuner navigates the search space, e.g., less tolerance for error on a noisy system will reduce the pace of experimentation. (A usage sketch follows this list.)
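
A usage sketch under loose assumptions about the API's exact spelling (the README and headers are authoritative; in particular, how `IntRange` binds an argument and which namespaces `tuner_kind` and `pct_err` live in are assumptions here):

```cpp
#include <tuner/driver.h> // atJIT driver header (path per the README)

// Toy kernel whose chunk size we want the tuner to choose.
int work(int chunkSize, int n) {
  int acc = 0;
  for (int i = 0; i < n; i += chunkSize)
    acc += i;
  return acc;
}

int main() {
  tuner::ATDriver AT;
  for (int i = 0; i < 100; ++i) {
    // Assumed spelling: let AT_Bayes pick chunkSize in the inclusive
    // range [1, 512], tolerating ~2% measurement noise.
    auto const &tuned = AT.reoptimize(
        work, tuned_parameter::IntRange(1, 512), 1000,
        easy::options::tuner_kind(tuner::AT_Bayes),
        easy::options::pct_err(2.0));
    tuned(); // run the currently best-known version
  }
}
```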
Preview Release 1
This release includes:
- a simplistic random tuner that always generates a randomly optimized function once the previously optimized version has received enough performance measurements.
- an up-to-date README