How to check if docker image hosted on Windows system is actually using the GPU? #1460

tuelle · 2023-12-19T17:33:32Z

I'm running LocalAI on a ZBOX barebone system with an RTX 4070 GPU (8 GB). I have configured the docker-compose file to pass the GPU through to the container and have also enabled cuBLAS, but inference is actually slower than with the CPU-only setup on 14 cores.

How can I check whether the GPU is actually used by LocalAI? I installed the NVIDIA driver and Docker Desktop on the host. Do I also have to install other libraries on the host? Do I have to configure the Docker service?

When I run docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi I get this output:

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4884    C+G   ...\Docker\frontend\Docker Desktop.exe    N/A      |
|    0   N/A  N/A      6752    C+G   C:\Windows\explorer.exe                   N/A      |
|    0   N/A  N/A      9972    C+G   ...2txyewy\StartMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A     10724    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     11608    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     12428    C+G   ...oogle\Chrome\Application\chrome.exe    N/A      |
|    0   N/A  N/A     14848    C+G   ...crosoft\Edge\Application\msedge.exe    N/A      |
|    0   N/A  N/A     16708    C+G   ...5n1h2txyewy\ShellExperienceHost.exe    N/A      |
+---------------------------------------------------------------------------------------+

So it seems to me that Docker Desktop on the host is configured correctly (at least for the standard NVIDIA image).

dionysius · 2024-01-10T00:34:33Z

12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: using CUDA for GPU acceleration
12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: system memory used  =   70.44 MiB
12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: VRAM used           = 1634.32 MiB
12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: offloading 32 repeating layers to GPU
12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: offloading non-repeating layers to GPU
12:32AM DBG GRPC(dolphin-2_6-phi-2.Q4_K_M.gguf-127.0.0.1:44467): stderr llm_load_tensors: offloaded 33/33 layers to GPU

For LLMs that run on llama.cpp, look for the line offloaded X/Y layers to GPU in the debug log, as in the output above: it shows how many layers were actually placed on the GPU. Remember that the model config itself has a gpu_layers option that needs to be set: https://localai.io/features/gpu-acceleration/index.html#model-configuration
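If you want to check this without reading the log by eye, here is a minimal sketch that scans captured log text for that line. It assumes the log format matches the lines quoted above; the function names find_gpu_offload and describe_offload are made up for this example and are not part of LocalAI.

```python
import re
from typing import Optional, Tuple

# Matches the llama.cpp summary line quoted above,
# e.g. "offloaded 33/33 layers to GPU".
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def find_gpu_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (offloaded, total) from the last matching line, or None."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    offloaded, total = matches[-1]
    return int(offloaded), int(total)


def describe_offload(log_text: str) -> str:
    """Summarize the offload state in one human-readable line."""
    result = find_gpu_offload(log_text)
    if result is None:
        return "no offload line found (is debug logging on and a model loaded?)"
    offloaded, total = result
    if offloaded == 0:
        return f"0/{total} layers on GPU: the model is running on the CPU"
    if offloaded < total:
        return f"{offloaded}/{total} layers on GPU: partial offload"
    return f"{offloaded}/{total} layers on GPU: full offload"
```

Pass it the container's log as a string, for example text saved from docker logs for your LocalAI container. A result of 0/Y, or no matching line at all, would suggest that gpu_layers is not set in the model config or that the build has no CUDA support.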

atljoseph · 2024-04-29T22:21:35Z

Use the nvtop command
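nvtop is interactive. If you would rather have a scripted check, a rough sketch is to poll nvidia-smi in query mode while a request is running. This assumes nvidia-smi is on the PATH wherever the script runs (for example inside the container); the helper names are hypothetical. Under Docker Desktop on Windows some fields may come back as [N/A], as the GPU Memory column does in the question, so those are mapped to None.

```python
import subprocess
from typing import List, Optional, Tuple

# Query mode of nvidia-smi: one CSV line per GPU, no header, no units.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field: str) -> Optional[int]:
    """Convert a CSV field to int; non-numeric values such as [N/A] become None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """Parse lines like '87, 1634' into (utilization %, memory used in MiB)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = line.split(",")
        stats.append((_to_int(util), _to_int(mem)))
    return stats


def read_gpu_stats() -> List[Tuple[Optional[int], Optional[int]]]:
    """Run nvidia-smi once and return the parsed stats for every GPU."""
    out = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_stats(out)
```

Call read_gpu_stats() every second or so while LocalAI is answering a prompt. If utilization stays near zero and memory use does not rise when the model loads, the work is most likely still happening on the CPU.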
