
Metal #143

Open
zodiac1214 opened this issue Apr 10, 2024 · 3 comments

Comments

@zodiac1214

I tried to build with:

cmake -DGGML_METAL=ON  ..
cmake --build . --config Release

but it is still only using the CPU instead of the Mac GPU.
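
For what it's worth, one quick way to check whether the Metal backend is actually active (assuming the example binary, flags, and model path used later in this thread) is to look for the ggml_metal_init lines it prints at startup:

./build/examples/main/main -m ./models/bark/ggml_weights.bin -p "Test" -t 4 -o out.wav 2>&1 | grep ggml_metal_init

If nothing shows up, the run is on the CPU backend only.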

@PABannier
Owner

Hello @zodiac1214 !

You are right; there is a mistake in the implementation, which makes it impossible for now to run Bark.cpp on Metal. I'll fix it in the next few days.

@ochafik

ochafik commented Apr 24, 2024

FWIW, I tried to wire up the -ngl param here and hit a wall with:

ggml_metal_graph_compute_block_invoke: error: node 5, op = SET not implemented

(a ggml sync might help?)

git remote add ochafik https://github.com/ochafik/bark.cpp
git fetch ochafik
rm -fR build && \
  cmake -B build . -DGGML_METAL=1 -DCMAKE_BUILD_TYPE=Release && \
  cmake --build build && \
  cp build/bin/ggml-metal.metal build/encodec.cpp/ggml/src

./build/examples/main/main -m ./models/bark/ggml_weights.bin -p "Test" -t 4 -o out2.wav
Full output:
ggml_metal_init: loaded kernel_silu                           0x13ff44a50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                           0x13ff44c80 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                           0x13ff44eb0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                       0x13ff450e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max_4                     0x13ff45310 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf                  0x13ff45540 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf_8                0x13ff45770 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f32                   0x13ff459a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                   0x13ff45bd0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x13ff45e00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x13ff46030 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0                  0x13ff46260 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x13ff46490 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x13ff466c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x13ff468f0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x13ff46b20 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x13ff46d50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                       0x13ff46f80 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                           0x13ff471b0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f32_f32                 0x13ff473e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32                 0x13ff47610 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row            0x13ff47840 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4              0x140125080 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32                0x1401252b0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32                0x1401254e0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32                0x140125710 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32                0x140125940 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32                0x140125b70 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32                0x140125da0 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32                0x140125fd0 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32                0x140126200 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f32_f32                 0x140126430 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32                 0x140126660 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32                0x140126890 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32                0x140126ac0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32                0x140126cf0 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32                0x140126f20 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32                0x140127150 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32                0x140127380 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32                0x1401275b0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32                0x1401277e0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_rope_f32                       0x140127a10 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rope_f16                       0x140127c40 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                      0x140127e70 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x1401280a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x1401282d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x140128500 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_concat                         0x140128730 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_sqr                            0x140128960 | th_max = 1024 | th_width =   32
ggml_metal_init: GPU name:   Apple M1 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 98304.00 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_metal_add_buffer: allocated 'backend         ' buffer, size =    54.36 MB, ( 2561.59 / 98304.00)
encodec_load_model_weights: model size =    44.36 MB
encodec_load_model: n_q = 32

# bctx->n_gpu_layers = 99
bark_tokenize_input: prompt: 'I really love using llama.cpp and its ecosystem. They make me happy'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 10194 40229 26186 23430 57329 10167 10219 26635 

ggml_metal_add_buffer: allocated 'backend         ' buffer, size =    10.06 MB, ( 2571.66 / 98304.00)
ggml_metal_graph_compute_block_invoke: error: node   5, op =      SET not implemented
GGML_ASSERT: /Users/ochafik/github/bark.cpp/encodec.cpp/ggml/src/ggml-metal.m:1428: false

@PABannier
Owner

@ochafik Thanks for trying it!

Yes, we'll need to sync with the latest version of ggml. However, we'll also have to add additional operations to ggml and write Metal kernels for them (e.g. sigmoid, pad_reflec_1).
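
For what it's worth, a sigmoid kernel would presumably follow the same pattern as the existing element-wise kernels visible in the log above (kernel_relu, kernel_silu, kernel_gelu). A minimal sketch, assuming a contiguous f32 tensor with one thread per element; the kernel name and the host-side dispatch in ggml-metal.m are assumptions, not existing code:

#include <metal_stdlib>
using namespace metal;

// hypothetical element-wise sigmoid over a contiguous f32 buffer,
// one thread per element, mirroring kernel_relu / kernel_silu
kernel void kernel_sigmoid(
        device const float * src0,
        device       float * dst,
        uint tpig[[thread_position_in_grid]]) {
    dst[tpig] = 1.0f / (1.0f + exp(-src0[tpig]));
}

Ops like the reflective padding would additionally need a new op on the ggml side before a Metal kernel can be wired up for them.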
