Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new architecture mode: 'avx512_spr'. #4025

Closed
wants to merge 11 commits into from

Conversation

mulugetam
Copy link
Contributor

This PR adds a new architecture mode to support the new extensions to AVX512, namely AVX512-FP16, which have been available since Intel® Sapphire Rapids.

This PR is a prerequisite for PR#4020 that speeds up hamming distance evaluations.

@mengdilin
Copy link
Contributor

Hmm weird the CIs are not running on the PR. Do you mind pushing a new commit and see if the CIs start?

@mulugetam
Copy link
Contributor Author

@mengdilin Pushed a new commit but CI not starting. Is this possibly because I updated .github/workflows/build.yml?

@mengdilin
Copy link
Contributor

Yea there is a syntax error in the build file: see https://github.com/facebookresearch/faiss/actions/runs/11804384709

@@ -67,6 +67,17 @@ jobs:
uses: ./.github/actions/build_cmake
with:
opt_level: avx512
linux-x86_64-AVX512-cmake:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linux-x86_64-AVX512-advanced-cmake

@mulugetam
Copy link
Contributor Author

@mengdilin CI is picking g++ version 11. But AVX512-FP16 (-mavx512fp16) requires version 12+.

@mengdilin
Copy link
Contributor

Ah yea the conda publication CI is using the older compiler version. We should investigate on our side; however, I don't think we want to publish this architecture mode to conda right now, can you omit it?

Looks like CI failure is coming from a unit test failure from

def test_ivf_train_2level(self):

Try commenting out that test and see if anything else fails?
I'm going on PTO, handing this over to the next performance oncall @kuarora

-DFAISS_ENABLE_GPU=OFF \
-DFAISS_ENABLE_PYTHON=OFF \
-DBLA_VENDOR=Intel10_64lp \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_BUILD_TYPE=Release .

make -C _build -j$(nproc) faiss faiss_avx2 faiss_avx512
make -C _build -j$(nproc) faiss faiss_avx2 faiss_avx512 faiss_avx512_sr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for faiss's conda packaging upload. I don't think we want to expose this build mode yet in conda officially. Can you omit this for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@mulugetam
Copy link
Contributor Author

Thanks @mengdilin. @kuarora Could you please review?

@alexanderguzhva
Copy link
Contributor

@mulugetam I would use -march=sapphirerapids -mtune=sapphirerapids for the compiler flags, because SR supports many other AVX512 instruction extensions that are not currently listed among compiler flags

Copy link
Contributor

@mengdilin mengdilin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back from PTO. LGTM, can you resolve the conflict so I can import the PR. For the unit test, can you relax the threshold instead of commenting it out? If somehow you cannot relax this threshold, you can skip this unittest when in SR mode similar to

@unittest.skipIf(
(ideally we don't have to do this)

endif()
if(NOT WIN32)
# Architecture mode to support AVX512 extensions available since Intel(R) Sapphire Rapids.
# Ref: https://networkbuilders.intel.com/solutionslibrary/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the ref!

@@ -568,7 +569,7 @@ def test_ivf_train_2level(self):
# normally 47 / 200 differences
ndiff = (Iref != Inew).sum()
self.assertLess(ndiff, 51)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there away to relax the threshold such that this test passes in SR mode as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mengdilin The test case passes in my machine with a ndiff value of 50 (not sure why it fails here). I will do another push that resolves the conflict. I have also updated the name from avx512-sr to avx512_spr to be consistent with what numpy returns.

@mulugetam mulugetam changed the title Add a new architecture mode: 'avx512-sr'. Add a new architecture mode: 'avx512_spr'. Dec 17, 2024
@mulugetam
Copy link
Contributor Author

@mengdilin Only timeout and anaconda_telemetry errors.

@mengdilin
Copy link
Contributor

@mulugetam there were some CI issues yesterday that we resolved. Can you rebase to latest commit and retry? The anaconda issues should definitely be resolved on the latest.

avx512_spr is a mode that supports avx512 features available since Intel(R) Sapphire Rapids.

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Mulugeta Mammo <[email protected]>
@facebook-github-bot
Copy link
Contributor

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mengdilin
Copy link
Contributor

@mulugetam faiss/python/swigfaiss_avx512_spr.swig shouldn't be checked in. Looks like it's part of .gitignore but it got accidentally tracked. Can you revert this?

@mulugetam
Copy link
Contributor Author

@mengdilin untracked.

@facebook-github-bot
Copy link
Contributor

@mengdilin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mengdilin merged this pull request in 3beb07b.

facebook-github-bot pushed a commit that referenced this pull request Dec 27, 2024
Summary:
The `_mm512_popcnt_epi64` intrinsic is used to accelerate Hamming distance calculations in `HammingComputerDefault` and `HammingComputer64`.

Benchmarking with [bench_hamming_computer](https://github.com/facebookresearch/faiss/blob/main/benchs/bench_hamming_computer.cpp) on AWS [r7i](https://aws.amazon.com/ec2/instance-types/r7i/) instance shows a performance improvement of up to 30% compared to AVX-2.

This PR depends on [PR#4025](#4025)

Pull Request resolved: #4020

Reviewed By: junjieqi

Differential Revision: D67650183

Pulled By: mengdilin

fbshipit-source-id: 17e5b68570dced1fea0b885dd4e67c17dfc7bece
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants