GPU build fails on CUDA 12.5 #574

Open
ludvigak opened this issue Oct 14, 2024 · 3 comments

@ludvigak
Contributor

Strangely, the GPU build fails on CUDA 12.5, with the failure starting in thrust/extrema.h.

See cufinufft_jll build log:
https://buildkite.com/julialang/yggdrasil/builds/13901/canvas

I had the same on my local machine before updating to 12.6. Error message was this:

In file included from /usr/local/cuda/include/cub/util_device.cuh:52,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:48,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/internal/copy_cross_system.h:49,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/copy.h:111,
                 from /usr/local/cuda/include/thrust/system/detail/adl/copy.h:50,
                 from /usr/local/cuda/include/thrust/detail/copy.inl:31,
                 from /usr/local/cuda/include/thrust/detail/copy.h:98,
                 from /usr/local/cuda/include/thrust/system/detail/sequential/merge.inl:29,
                 from /usr/local/cuda/include/thrust/system/detail/sequential/merge.h:86,
                 from /usr/local/cuda/include/thrust/system/cpp/detail/merge.h:30,
                 from /usr/local/cuda/include/thrust/system/cpp/execution_policy.h:59,
                 from /usr/local/cuda/include/thrust/execution_policy.h:40,
                 from /usr/local/cuda/include/thrust/detail/get_iterator_value.h:29,
                 from /usr/local/cuda/include/thrust/system/detail/generic/extrema.inl:33,
                 from /usr/local/cuda/include/thrust/system/detail/generic/extrema.h:95,
                 from /usr/local/cuda/include/thrust/detail/extrema.inl:31,
                 from /usr/local/cuda/include/thrust/extrema.h:808,
                 from /home/lag01/workspace/finufft/include/cufinufft/utils.h:17,
                 from /home/lag01/workspace/finufft/src/cuda/utils.cpp:1:
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘void cub::CUB_200400___CUDA_ARCH_LIST___NS::CTA_SYNC()’:
/usr/local/cuda/include/cub/util_ptx.cuh:271:5: error: ‘__syncthreads’ was not declared in this scope
  271 |     __syncthreads();
      |     ^~~~~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::CTA_SYNC_AND(int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:280:12: error: ‘__syncthreads_and’ was not declared in this scope
  280 |     return __syncthreads_and(p);
      |            ^~~~~~~~~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::CTA_SYNC_OR(int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:289:12: error: ‘__syncthreads_or’ was not declared in this scope
  289 |     return __syncthreads_or(p);
      |            ^~~~~~~~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘void cub::CUB_200400___CUDA_ARCH_LIST___NS::WARP_SYNC(unsigned int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:298:5: error: ‘__syncwarp’ was not declared in this scope
  298 |     __syncwarp(member_mask);
      |     ^~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::WARP_ANY(int, unsigned int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:307:12: error: ‘__any_sync’ was not declared in this scope
  307 |     return __any_sync(member_mask, predicate);
      |            ^~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::WARP_ALL(int, unsigned int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:316:12: error: ‘__all_sync’ was not declared in this scope
  316 |     return __all_sync(member_mask, predicate);
      |            ^~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::WARP_BALLOT(int, unsigned int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:325:12: error: ‘__ballot_sync’ was not declared in this scope
  325 |     return __ballot_sync(member_mask, predicate);
      |            ^~~~~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘unsigned int cub::CUB_200400___CUDA_ARCH_LIST___NS::SHFL_IDX_SYNC(unsigned int, int, unsigned int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:368:12: error: ‘__shfl_sync’ was not declared in this scope
  368 |     return __shfl_sync(member_mask, word, src_lane);
      |            ^~~~~~~~~~~
/usr/local/cuda/include/cub/util_ptx.cuh: In function ‘int cub::CUB_200400___CUDA_ARCH_LIST___NS::RowMajorTid(int, int, int)’:
/usr/local/cuda/include/cub/util_ptx.cuh:415:39: error: ‘threadIdx’ was not declared in this scope
  415 |     return ((block_dim_z == 1) ? 0 : (threadIdx.z * block_dim_x * block_dim_y)) +
      |                                       ^~~~~~~~~
@ludvigak
Contributor Author

Looks related to NVIDIA/cccl#1373

@aryabhatt

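I ran into the same issue. Apparently, thrust doesn't like plain C++ compilers: the log above shows src/cuda/utils.cpp pulling in thrust/extrema.h, and the CUDA built-ins it needs (__syncthreads, threadIdx, ...) are not declared there. You have two options before the upstream fix becomes available: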

  1. Rename the *.cpp files in src/cuda/ to *.cu, and make sure to update the CMakeLists.txt in the same directory. PREFERRED
  2. Edit include/cufinufft/utils.h: put #ifdef __CUDACC__ ... #endif around "#include <thrust/extrema.h>" (line 17) and around the arrayrange function definition that uses thrust (lines 144-154). RISKY
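
For reference, option 2 could look roughly like the sketch below. This is not the actual contents of include/cufinufft/utils.h: the names host_range_sketch and device_range_sketch are hypothetical, and the real arrayrange function may have a different signature. It only shows the guard pattern, where the thrust include and the function that uses it are hidden from any compiler that does not define __CUDACC__.

```cpp
// Hypothetical stand-in for a header such as include/cufinufft/utils.h.
// All function names here are illustrative, not the project's real API.
#include <algorithm>
#include <cstddef>
#include <utility>

#ifdef __CUDACC__
// nvcc defines __CUDACC__ while compiling CUDA sources, so a plain host
// compiler building a .cpp file skips the thrust include entirely.
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>
#endif

// Host-only helper: safe to include from any translation unit.
// Requires n > 0.
inline std::pair<float, float> host_range_sketch(const float *a, std::size_t n) {
  auto mm = std::minmax_element(a, a + n);
  return std::make_pair(*mm.first, *mm.second);
}

#ifdef __CUDACC__
// Device-side counterpart: only visible to CUDA translation units.
// d_a is assumed to point to n > 0 floats in device memory.
inline std::pair<float, float> device_range_sketch(const float *d_a, std::size_t n) {
  thrust::device_ptr<const float> p = thrust::device_pointer_cast(d_a);
  auto mm = thrust::minmax_element(p, p + n);
  const float lo = *mm.first;  // copies the value back from the device
  const float hi = *mm.second;
  return std::make_pair(lo, hi);
}
#endif
```

With this layout a .cpp file can still include the header, but any .cpp file that calls the guarded function would stop compiling, which may be the risk with this option.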

@DiamonDinoia
Collaborator

I had issues with option 1 in the past: nvcc failed to compile perfectly valid C++ because it did not support or understand some of the templating. I think it is worth checking option 2 if it only affects one function.
