fix: Metal backend #150

Draft: wants to merge 2 commits into main

PABannier
Owner

This PR lets users run on the Metal (macOS) and cuBLAS backends by:

  • Exposing the n_gpu_layers parameter in the CLI
  • Using the Metal backend in the forward pass
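
As a rough illustration of the first bullet, here is a minimal sketch of how a CLI might read the layer count. The `-ngl` flag appears in the command line quoted later in this thread; the `cli_params` struct, the `parse_gpu_layers` helper, and the long form `--n-gpu-layers` are hypothetical names used only for this example and are not taken from the PR's diff.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical parameter struct; only the n_gpu_layers name comes from the PR.
struct cli_params {
    int n_gpu_layers = 0;  // 0 means no layers are offloaded to the GPU backend
};

// Scans args for "-ngl N" (or the assumed long form "--n-gpu-layers N").
// Returns false if the value is missing, non-numeric, or negative.
bool parse_gpu_layers(const std::vector<std::string> & args, cli_params & params) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-ngl" || args[i] == "--n-gpu-layers") {
            if (i + 1 >= args.size()) {
                return false;  // flag given without a value
            }
            const std::string & value = args[++i];
            char * end = nullptr;
            const long n = std::strtol(value.c_str(), &end, 10);
            if (end == value.c_str() || *end != '\0' || n < 0) {
                return false;  // not a clean non-negative integer
            }
            params.n_gpu_layers = static_cast<int>(n);
        }
    }
    return true;
}
```

Other flags are ignored by this helper, so it can sit alongside whatever argument parsing the example program already does.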

@siraben

siraben commented Apr 19, 2024

After it creates the tokens and runs ggml_metal_init, I get this:

ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 21845.34 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_metal_add_buffer: allocated 'backend         ' buffer, size =    54.36 MB, (   54.98 / 21845.34)
encodec_load_model_weights: model size =    44.36 MB
encodec_load_model: n_q = 32
ggml_metal_add_buffer: allocated 'backend         ' buffer, size =   314.06 MB, (  369.05 / 21845.34)
encodec_eval: compute buffer size: 314.05 MB

ggml_metal_graph_compute_block_invoke: error: node   0, op =   REPEAT not implemented
GGML_ASSERT: /Users/siraben/Git/bark.cpp/encodec.cpp/ggml/src/ggml-metal.m:1428: false
ggml_metal_graph_compute_block_invoke: error: node 4677, op = MAP_CUSTOM2_F32 not implemented
[1]    9701 abort      ./examples/main/main -ngl 100 -t 8 -m ./ggml_weights/ggml_weights.bin -em  -p

@PABannier
Owner Author

Hello @siraben!
Indeed, it seems that some operations (e.g., repeat, which is used to broadcast computations) do not have a corresponding Metal kernel implemented in ggml. I'll open a PR to implement them.
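
For readers unfamiliar with the op, here is a minimal sketch of what a repeat-style broadcast computes on a 2D row-major array: the source is tiled until it fills a larger target shape whose dimensions are multiples of the source's. This is plain CPU C++ written for illustration under those assumptions; `repeat_2d` is a hypothetical helper, not ggml's implementation and not the missing Metal kernel.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Tiles a row-major (rows x cols) array into a row-major (out_rows x out_cols) array.
// Each output dimension must be a whole multiple of the matching source dimension.
std::vector<float> repeat_2d(const std::vector<float> & src,
                             std::size_t rows, std::size_t cols,
                             std::size_t out_rows, std::size_t out_cols) {
    if (rows == 0 || cols == 0 || src.size() != rows * cols ||
        out_rows % rows != 0 || out_cols % cols != 0) {
        throw std::invalid_argument("repeat_2d: incompatible shapes");
    }
    std::vector<float> dst(out_rows * out_cols);
    for (std::size_t r = 0; r < out_rows; ++r) {
        for (std::size_t c = 0; c < out_cols; ++c) {
            // Wrap the output index back into the source with a modulo.
            dst[r * out_cols + c] = src[(r % rows) * cols + (c % cols)];
        }
    }
    return dst;
}
```

For example, repeating a 1 x 2 bias row into a 3 x 2 target copies that row three times, which is the shape-matching step a later elementwise add relies on.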

@normatovjj

When I try to run cmake -DGGML_CUBLAS=ON .. I get:

CMake Warning at encodec.cpp/ggml/src/CMakeLists.txt:219 (message):
  cuBLAS not found

I also tried CMAKE_ARGS='-DLLAMA_CUBLAS=on' cmake .. and applied all the changes proposed in this pull request, but without success.
