-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new architecture mode: 'avx512_spr'. #4025
Conversation
Hmm weird the CIs are not running on the PR. Do you mind pushing a new commit and see if the CIs start? |
@mengdilin Pushed a new commit but CI not starting. Is this possibly because I updated .github/workflows/build.yml? |
Yea there is a syntax error in the build file: see https://github.com/facebookresearch/faiss/actions/runs/11804384709 |
.github/workflows/build.yml
Outdated
@@ -67,6 +67,17 @@ jobs: | |||
uses: ./.github/actions/build_cmake | |||
with: | |||
opt_level: avx512 | |||
linux-x86_64-AVX512-cmake: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linux-x86_64-AVX512-advanced-cmake
@mengdilin CI is picking g++ version 11. But AVX512-FP16 (-mavx512fp16) requires version 12+. |
Ah yea the conda publication CI is using the older compiler version. We should investigate on our side; however, I don't think we want to publish this architecture mode to conda right now, can you omit it? Looks like CI failure is coming from a unit test failure from Line 552 in 0fb56d9
Try commenting out that test and see if anything else fails? |
conda/faiss/build-lib.sh
Outdated
-DFAISS_ENABLE_GPU=OFF \ | ||
-DFAISS_ENABLE_PYTHON=OFF \ | ||
-DBLA_VENDOR=Intel10_64lp \ | ||
-DCMAKE_INSTALL_LIBDIR=lib \ | ||
-DCMAKE_BUILD_TYPE=Release . | ||
|
||
make -C _build -j$(nproc) faiss faiss_avx2 faiss_avx512 | ||
make -C _build -j$(nproc) faiss faiss_avx2 faiss_avx512 faiss_avx512_sr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for faiss's conda packaging upload. I don't think we want to expose this build mode yet in conda officially. Can you omit this for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Thanks @mengdilin. @kuarora Could you please review? |
@mulugetam I would use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back from PTO. LGTM, can you resolve the conflict so I can import the PR. For the unit test, can you relax the threshold instead of commenting it out? If somehow you cannot relax this threshold, you can skip this unittest when in SR mode similar to
faiss/faiss/gpu/test/test_cagra.py
Line 13 in 697b6dd
@unittest.skipIf( |
endif() | ||
if(NOT WIN32) | ||
# Architecture mode to support AVX512 extensions available since Intel(R) Sapphire Rapids. | ||
# Ref: https://networkbuilders.intel.com/solutionslibrary/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the ref!
tests/test_contrib.py
Outdated
@@ -568,7 +569,7 @@ def test_ivf_train_2level(self): | |||
# normally 47 / 200 differences | |||
ndiff = (Iref != Inew).sum() | |||
self.assertLess(ndiff, 51) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there away to relax the threshold such that this test passes in SR mode as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengdilin The test case passes in my machine with a ndiff
value of 50
(not sure why it fails here). I will do another push that resolves the conflict. I have also updated the name from avx512-sr
to avx512_spr
to be consistent with what numpy returns.
@mengdilin Only timeout and anaconda_telemetry errors. |
@mulugetam there were some CI issues yesterday that we resolved. Can you rebase to latest commit and retry? The anaconda issues should definitely be resolved on the latest. |
avx512_spr is a mode that supports avx512 features available since Intel(R) Sapphire Rapids. Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
…ain_2level. Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
…om 51 to 53. Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mulugetam |
Signed-off-by: Mulugeta Mammo <[email protected]>
@mengdilin untracked. |
@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mengdilin merged this pull request in 3beb07b. |
Summary: The `_mm512_popcnt_epi64` intrinsic is used to accelerate Hamming distance calculations in `HammingComputerDefault` and `HammingComputer64`. Benchmarking with [bench_hamming_computer](https://github.com/facebookresearch/faiss/blob/main/benchs/bench_hamming_computer.cpp) on AWS [r7i](https://aws.amazon.com/ec2/instance-types/r7i/) instance shows a performance improvement of up to 30% compared to AVX-2. This PR depends on [PR#4025](#4025) Pull Request resolved: #4020 Reviewed By: junjieqi Differential Revision: D67650183 Pulled By: mengdilin fbshipit-source-id: 17e5b68570dced1fea0b885dd4e67c17dfc7bece
This PR adds a new architecture mode to support the new extensions to AVX512, namely AVX512-FP16, which have been available since Intel® Sapphire Rapids.
This PR is a prerequisite for PR#4020 that speeds up hamming distance evaluations.