Inferencing not working with P2P in latest version. #3968

j4ys0n · 2024-10-26T05:44:29Z

LocalAI version:

localai/localai:latest-gpu-nvidia-cuda-12
LocalAI version: v2.22.1 (015835d)

Environment, CPU architecture, OS, and Version:

Linux localai3 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux
(Proxmox LXC, Debian. AMD EPYC 7302P (16 cores allocated)/64GB RAM

Describe the bug

When testing distributed inferencing, i select a model (qwen 2.5 14b), send a chat message, the model loads on both instances (main and worker) and then the model does not respond and the model unloads on the worker. (watching with nvitop)

To Reproduce

description above should reproduce, i tried a few times.

Expected behavior

model should not unload & chat should complete

Logs

worker logs

{"level":"INFO","time":"2024-10-26T05:07:23.924Z","caller":"discovery/dht.go:115","message":" Bootstrapping DHT"}
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX 2000 Ada Generation, compute capability 8.9, VMM: yes
Starting RPC server on 127.0.0.1:46609, backend memory: 16380 MB
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed

Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed
Accepted client connection, free_mem=17175674880, total_mem=17175674880
Client connection closed

main logs

5:25AM INF Success ip=my.ip.address latency="960.876µs" method=POST status=200 url=/v1/chat/completions
5:25AM INF Trying to load the model 'qwen2.5-14b-instruct' with the backend '[llama-cpp llama-ggml llama-cpp-fallback rwkv stablediffusion whisper piper huggingface bert-embeddings /build/backend/python/rerankers/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/mamba/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/coqui/run.sh /build/backend/python/bark/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/transformers/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/vllm/run.sh]'
5:25AM INF [llama-cpp] Attempting to load
5:25AM INF Loading model 'qwen2.5-14b-instruct' with backend llama-cpp
5:25AM INF [llama-cpp-grpc] attempting to load with GRPC variant
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Redirecting 127.0.0.1:35625 to /ip4/worker-ip/udp/44701/quic-v1
5:25AM INF Success ip=127.0.0.1 latency="35.55µs" method=GET status=200 url=/readyz
5:26AM INF Node localai-oYURMqpWCR is offline, deleting
Error accepting:  accept tcp 127.0.0.1:35625: use of closed network connection

Additional context

this worked in the last version, though i'm not sure what that was at this point (~2 weeks ago)
model loads and works fine without the worker.

The text was updated successfully, but these errors were encountered:

j4ys0n · 2024-10-26T05:47:57Z

i'm using docker compose, here's the config. https://github.com/j4ys0n/local-ai-stack

JackBekket · 2024-11-16T00:45:27Z

it is related to edgevpn, it somehow see peer's and addresses but cannot connect with them.

I has tried NAT traversal using libp2p+pubsub and I have managed to make peer discovery and establish p2p connection by randez-vous point.

In your case, if you know your worker address you can just put worker address into ENV of local-ai as gRPC external backend addresses.

mintyleaf · 2024-11-28T16:07:48Z

@mudler

fixed by #4220
more info in duplicate (sorry for that) #4214

vpereira · 2024-12-04T14:19:02Z

fixed by #4220
more info in duplicate (sorry for that) #4214

looks like it still didn't make to the images quay.io/go-skynet/local-ai:latest-cpu or the (dockerhub) localai/localai:latest-cpu?

j4ys0n added bug Something isn't working unconfirmed labels Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inferencing not working with P2P in latest version. #3968

Inferencing not working with P2P in latest version. #3968

j4ys0n commented Oct 26, 2024

j4ys0n commented Oct 26, 2024

JackBekket commented Nov 16, 2024

mintyleaf commented Nov 28, 2024

vpereira commented Dec 4, 2024 •

edited

Loading

Inferencing not working with P2P in latest version. #3968

Inferencing not working with P2P in latest version. #3968

Comments

j4ys0n commented Oct 26, 2024

j4ys0n commented Oct 26, 2024

JackBekket commented Nov 16, 2024

mintyleaf commented Nov 28, 2024

vpereira commented Dec 4, 2024 • edited Loading

vpereira commented Dec 4, 2024 •

edited

Loading