rocswap = llama.cpp + ROCm + llama-swap

Allows you to run llama.cpp with ROCm acceleration on most Radeon RX Vega/5000/6000/7000 series GPUs, including many that are not on AMD's official ROCm supported GPU list.

Contents

This is a Linux container which builds llama.cpp with ROCm support and uses llama-swap to serve models.

I just put this together from other people's work listed below.

Reddit comment noting that the Debian Bookworm Backports kernel contains the ROCm kernel interface, and that Debian Trixie contains the userspace:

Instructions on the debian-ai mailing list to compile llama.cpp with ROCm:

llama.cpp - efficient CPU and GPU LLM inference server:

llama-swap - OpenAI-compatible proxy server which serves models and swaps between inference servers on demand:

Requirements

Linux with the amdgpu driver's ROCm interface enabled. Debian Bookworm Backports, Debian Trixie/Sid, and Ubuntu 24.04 already have this done. For other distros you might need to use the amdgpu-install script from the AMD website.

If using Debian or Ubuntu, make sure your GPU is on the Debian ROCm supported GPU list in Trixie/Sid. The Bookworm Backports kernel has the same support level as Trixie.

Add your user to the video and render groups on your system: usermod -aG video,render "$USER". Log out and log in again. Confirm with the groups command.
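For example, run these on the host (usermod needs root, hence sudo):

sudo usermod -aG video,render "$USER"
# log out and log back in, then check that video and render are listed
groups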

Instructions

Look up your GPU in the LLVM amdgpu targets and replace my gfx1010 in the Containerfile with your GPU's architecture name.
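If the host already has the ROCm userspace tools installed, rocminfo can report the target name directly; this is just an optional shortcut, otherwise match your GPU model against the LLVM list:

# print the gfx target(s) detected by rocminfo, e.g. gfx1010 for an RX 5600 XT
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u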

Build the container:

podman build . -t rocswap

Deploy the container:

podman run -dit -p 8080:8080 --name rocswap \
  -v ./models:/models \
  -v ./config.yaml:/config.yaml \
  --device /dev/dri --device /dev/kfd \
  --group-add keep-groups \
  --user 1000:1000 \
  rocswap

If you have models which are smaller than your VRAM (minus about 1 GiB for other allocations), you can keep -ngl 99 in the server config to load all layers on the GPU.
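The config.yaml mounted into the container is llama-swap's model list. Here is a minimal sketch of a full-offload entry, assuming llama-swap's models/cmd/proxy config keys, a hypothetical llama-server path inside the image, and a hypothetical GGUF filename under ./models - check the llama-swap documentation for the real schema:

cat > config.yaml <<'EOF'
models:
  "gemma-2-2b":
    # small model: fits entirely in VRAM, so offload all layers
    cmd: /app/llama-server --port 9001 -m /models/gemma-2-2b-it-Q4_K_M.gguf -ngl 99
    proxy: http://127.0.0.1:9001
EOF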

If you are running a model larger than your GPU's VRAM, use the llama.cpp log output from llama-swap (http://localhost:8080/logs) and the radeontop command-line program to find how many layers you can load with the llama.cpp -ngl option without overflowing VRAM. The remaining layers will run on the CPU.

For example, I have a Radeon RX 5600 XT 6 GB. I can load small models like Gemma-2-2B-it or Phi-3.5-mini-instruct (4B) entirely on the GPU. For a larger model like Llama-3.1-8B-Q6KL, only 24 of the model's 33 layers fit, so I use -ngl 24.
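Continuing the sketch above, a partial-offload entry for that case could look like this (same assumed keys, hypothetical filename):

cat >> config.yaml <<'EOF'
  "llama-3.1-8b":
    # only 24 of the model's 33 layers fit in 6 GB VRAM; the rest run on the CPU
    cmd: /app/llama-server --port 9002 -m /models/Llama-3.1-8B-Instruct-Q6_K_L.gguf -ngl 24
    proxy: http://127.0.0.1:9002
EOF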

License