An OpenAI Compatible Web Server for llama.cpp #795
-
@abetlen thanks a lot for your Python wrapper. Just to add some additional settings for others: assuming 192.168.0.1 as the server IP, the settings when using the original OpenAI library, and for chatbot-ui, are sketched below.
Note: there is no trailing '/'.
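A minimal sketch of those settings, assuming the server runs on its default port 8000 (host and port are assumptions):

```python
import openai

# Point the official OpenAI client at the llama-cpp-python server.
openai.api_base = "http://192.168.0.1:8000/v1"  # no trailing slash
openai.api_key = "sk-anything"  # placeholder; the server does not validate it by default
```

For chatbot-ui, the equivalent would be the container environment variable `OPENAI_API_HOST=http://192.168.0.1:8000` (again with no trailing slash), matching the flag mentioned in the announcement below.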
-
Hello, where do I specify the prompt template?
-
@abetlen Your code cannot be run: `\llama_cpp\llama.py", line 1435, in __del__`
-
@abetlen thank you, it worked! How do I specify the prompt template and parameters, though?
-
What is going on here: `no matches found: llama-cpp-python[server]`?
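(For anyone who hits the same error: `no matches found` is zsh expanding the square brackets as a glob pattern; quoting the extras spec, e.g. `pip install 'llama-cpp-python[server]'`, is the usual fix.)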
-
Could you give an example of using the openai library to call the web server?
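A minimal sketch, assuming the server is running locally on the default port 8000 and using the pre-1.0 openai package (which has the `openai.api_base` attribute mentioned in the announcement below):

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed host/port
openai.api_key = "sk-placeholder"  # any non-empty string; not validated by default

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder; the server answers with whatever model it was started with
    messages=[{"role": "user", "content": "Name the planets in the solar system."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```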
-
Can we add grammar support (#1773) to the llama-cpp-python web server? Currently that option is not there.
-
After installing llama-cpp-python, I can create a container using the following command:
When accessing chatbot-ui in the browser, it always prompts that there is no OpenAI API key. What should I do? Where do I find the API key? Thanks.
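(A commonly suggested workaround, assuming a default server setup: the llama-cpp-python server does not validate the API key by default, so any non-empty placeholder satisfies chatbot-ui, e.g. passing `-e OPENAI_API_KEY=sk-dummy` when starting the chatbot-ui container.)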
-
Add GPU Support for Server

To add Nvidia GPU support:

```bash
# Install Server with OpenAI Compatible API - with CUDA GPU support
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server]

# Run the Server using a LLaMA-2 7B 5-bit quantized model
python3 -m llama_cpp.server \
  --model ./models/llama-2-7b-chat.Q5_K_M.gguf \
  --host localhost \
  --n_gpu_layers 32
```

Simple Command Line Chatbot

Here is a simple Python CLI chatbot for the server: chat.py.

Tested on an Ubuntu Linux host with an Intel i5-6500 CPU @ 3.20GHz, 8GB RAM, and an Nvidia GTX 1060 GPU with 6GB VRAM: approx. 12 tokens/second.
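The linked chat.py is not reproduced here; below is a minimal sketch of such a CLI chatbot (my own illustration, not the linked script), using the pre-1.0 openai package and assuming the server started above is reachable on localhost:8000:

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed host/port
openai.api_key = "sk-placeholder"  # not validated by the server by default

history = [{"role": "system", "content": "You are a helpful assistant."}]
while True:
    user = input("You: ").strip()
    if user.lower() in {"exit", "quit"}:
        break
    history.append({"role": "user", "content": user})
    # Stream tokens as they arrive so the reply prints incrementally.
    reply = ""
    for chunk in openai.ChatCompletion.create(
        model="local-model", messages=history, stream=True
    ):
        delta = chunk.choices[0].delta.get("content", "")
        reply += delta
        print(delta, end="", flush=True)
    print()
    history.append({"role": "assistant", "content": reply})
```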
-
Hey, I got this to work on my end. Thanks for sharing. I noticed that completion requests are handled sequentially. Is there a way to set it to do multiprocessing?
-
I got a question about this server.
-
Hi, thanks for all your great work providing a wrapper with a web server. I got the wrapper working on my CPU, but I have a ROCm system. I have llama.cpp fully working on my GPU, so I have tried to compile llama_cpp_python with GPU support as well. However, the build gets stuck: my CPU has 6 cores and I think Ninja is using 100% of these threads, which results in the following:
Hope you can advise. I need to know how to tell cmake to use fewer threads, I think. Alas, the options I have tried don't work. I am using a 6-core AMD Ryzen 5 setup.
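(One lever that may help, assuming a CMake-driven build: CMake's documented `CMAKE_BUILD_PARALLEL_LEVEL` environment variable caps the number of parallel build jobs, e.g. prefixing the install command with `CMAKE_BUILD_PARALLEL_LEVEL=4`.)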
-
Hi @abetlen, thanks for llama-cpp-python. Is there a way to add a "stop" parameter to the chat/completions JSON schema, like the one on completions?
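For reference, the OpenAI chat schema itself carries a `stop` field, so a sketch assuming the server forwards it to llama.cpp (pre-1.0 openai package, assumed local server) would be:

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed host/port
openai.api_key = "sk-placeholder"  # not validated by default

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder name
    messages=[{"role": "user", "content": "Count from one to ten."}],
    stop=["six"],  # generation halts before this string would be emitted
)
print(response.choices[0].message.content)
```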
-
Hey everyone,

Just wanted to share that I integrated an OpenAI-compatible web server into the llama-cpp-python package, so you should be able to serve and use any llama.cpp-compatible models with (almost) any OpenAI client. Check out the README, but the basic setup process is to install with `pip install llama-cpp-python[server]` and start the server with `python3 -m llama_cpp.server`, pointing it at your model. Then just navigate to http://localhost:8000/docs to start playing around with it using the Swagger UI.

In terms of compatibility, I've tested it with the official OpenAI Python library by just swapping out `openai.api_base` for the server URL, and it seems to work. I've also had success using it with @mckaywrigley's chatbot-ui, which is a self-hosted ChatGPT UI clone you can run with Docker. Just launch with `-e OPENAI_API_HOST=<api-url>` to get started.

Caveats

- `logprobs` and anything that's OpenAI-specific but llama.cpp doesn't support, like the `best_of` parameter, is just ignored silently.
- Anything that relies on `tiktoken` or some other OpenAI model-specific tokenizer may not work or be buggy, just a heads up.
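Since the server exposes plain HTTP routes mirroring OpenAI's REST schema, non-OpenAI clients work too; a minimal sketch with `requests` against an assumed localhost:8000 server (the `model` field is omitted on the assumption that the server falls back to the model it was started with):

```python
import requests

# POST to the OpenAI-style chat completions route on the local server.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed host/port
    json={"messages": [{"role": "user", "content": "Say hello in one sentence."}]},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```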