Implement the AVX-512 versions of the following functions #454
I tried to use AVX2 to optimize union_vector16, but there was no improvement. In fact, I ran a test comparing union_vector16 to union_uint16, and the results were the same. Have you noticed this?
@huihan365 In our production environment, there is an obvious performance gap between them. Are you sure you've tested it properly?
I need to write a better benchmark harness for this project. Give me a bit of time.
@CharlesChen888 I used the latest version and ran the benchmark (Ice Lake, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz).
@huihan365 OK... I just ran a benchmark on my Mac (i7-9750H), and the results are similar. We need to test it with more varied datasets.
[Benchmark output: AVX2 disabled vs. AVX2 enabled]
The benchmarks that come with the library were always so-so. I wrote most of them very quickly, and we did not rely on them to optimize the library. We should scrap them and do better. I will prepare something soon, as a first step forward.
Note that my PR will come 'soon' (not in weeks or months).
@huihan365 @CharlesChen888 Have a look at the PR; I will merge it as soon as the tests are green. It gives you a much better way to benchmark the code. If you have privileged access to the system (via sudo), you will also get performance counters.
OK. I merged the PR, and we now have sensible benchmarks built with Google Benchmark. The instructions are:

Running microbenchmarks

We have microbenchmarks constructed with Google Benchmark.
By default, the benchmark tool picks one data set (e.g.,
You may disable some functionality for the purpose of benchmarking. For example, you could
@huihan365 You can use the new microbenchmarks for profiling, to identify the functions worth optimizing.
Note that you can disable AVX (and AVX-512) entirely if you want...
I have submitted a PR that vectorizes array_container_to_uint32_array.
My commit is in the main branch. The difference between a vectorized and a non-vectorized approach is rather obvious:
Roughly a 2x gain. The AVX-512 routine is not faster in this instance, but SIMD definitely helps.
Starting with version 1.0.0 of the library, we have AVX-512 routines, but we are still missing many optimizations.
The following functions related to array containers may be upgraded to versions using AVX-512 instructions:
This is an open issue, and we are inviting pull requests.
cc @huihan365