ggml is compiled with cublas, but GPU is not used #106

artur-ag · 2024-11-27T05:49:03Z

I compiled ggml with -DGGML_CUBLAS=ON and then clip.cpp, and used it to get text encodings, but the GPU is not being used. The code takes the same amount of time as it did with CPU-only. Is this expected? Does clip_text_encode always use the CPU no matter what? Or did I forget to do something?

Details:
ggml is detecting the GPU without problem (Nvidia AGX Orin):

$ ./myapp
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Orin, compute capability 8.7

Simplified version of my code:

#include "clip.h"
// ...
string model = "clip-vit-base-patch32_ggml-text-model-f16.gguf";
clip_ctx *ctx = clip_model_load(model.c_str(), verbosity);
float txt_vec[512];  // output embedding, reused across iterations
for (int i = 0; i < 1000; i++) {
  // a string literal is already a const char *, so no .c_str() is needed
  clip_tokenize(ctx, "person", &tokens);
  clip_text_encode(ctx, /*threads:*/4, &tokens, txt_vec, true);
}

This takes 8 seconds to finish. While this runs, I have jtop open, and I see the GPU is only active during the first 3 seconds, when ggml gets the GPU name and compute capability to print them. After that, the GPU goes offline. GPU usage is always 0%.
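To put a number on "the same amount of time" for the CPU-only and cuBLAS builds, the loop can be wrapped in a small timing helper. The following is a minimal sketch using only the C++ standard library; `mean_ms_per_call` is a hypothetical helper name, not part of clip.cpp or ggml, and the callable passed to it stands in for the `clip_tokenize` / `clip_text_encode` pair above.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of clip.cpp or ggml): runs `work` `iters`
// times and returns the mean wall-clock time per call in milliseconds.
double mean_ms_per_call(const std::function<void()>& work, int iters) {
    if (iters <= 0) {
        return 0.0;  // nothing to measure
    }
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < iters; i++) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iters;
}
```

Passing a lambda that performs one tokenize-and-encode step, with `iters` set to 1000, should give a per-call figure that can be compared directly between the two builds. With the numbers reported here (8 seconds for 1000 iterations) that would be roughly 8 ms per call.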

monatis · 2024-11-27T06:25:48Z

Hi @artur-ag, I implemented clip.cpp back when convolution support on CUDA was limited in GGML. Later I implemented CUDA support for multimodal inference in llama.cpp, but I didn't backport it to clip.cpp.
I've just started working on llama.cpp-related projects again, and one goal is to modernize clip.cpp soon (I hope to announce new tools and libraries in a week or so, and a modernized clip.cpp will follow them).

artur-ag · 2024-11-27T06:53:29Z

Got it, thank you for the fast reply, and thank you for clip.cpp!
