-
Currently, it is not feasible to implement batch processing, since many functions need memory-usage optimizations for images larger than 512x512. In the VAE stage, a conv2d on a 1024x1024 image uses 7 GB of VRAM/RAM for its compute buffer.
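For context on where that memory goes: ggml lowers `ggml_conv_2d` to an im2col step (`ggml_im2col`) followed by a matrix multiplication, so the intermediate buffer holds roughly `OW*OH*IC*KH*KW` elements per image. A back-of-the-envelope sketch (the channel count and element size below are illustrative assumptions, since the exact values depend on the VAE decoder layer and build):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Illustrative values for one VAE decoder conv2d at full resolution.
    // Actual channel counts / element sizes depend on the model and build.
    const int64_t OW = 1024, OH = 1024;  // output width/height
    const int64_t IC = 512;              // input channels (assumed)
    const int64_t KW = 3, KH = 3;        // 3x3 kernel
    const int64_t elem = 2;              // bytes per element (assuming f16 im2col)

    // im2col materializes one column of IC*KH*KW values per output pixel.
    const int64_t bytes = OW * OH * IC * KW * KH * elem;
    printf("im2col buffer: %.2f GiB per image\n",
           bytes / (1024.0 * 1024.0 * 1024.0));
    // Batching N images multiplies this by N, which is why naive batching
    // is not feasible without memory optimizations first.
    return 0;
}
```

With these assumed values the buffer alone is on the order of several GiB, the same order of magnitude as the 7 GB figure above, and it scales linearly with batch size.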
-
As far as I have investigated, the current implementation of batch processing does not feed multiple prompts to the model at once, but rather runs one prompt at a time. For instance, generating 4 images requires running inference 4 times, not once.
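To illustrate the distinction, here is a minimal sketch; `generate_one` and `generate_batch` are hypothetical stand-ins, not the project's actual API:

```cpp
#include <string>
#include <vector>

struct Image {};

// Hypothetical stand-ins for the diffusion pipeline (not the real API):
// each call to generate_one() builds and executes a full compute graph.
Image generate_one(const std::string& /*prompt*/) { return Image{}; }
std::vector<Image> generate_batch(const std::string& /*prompt*/, int n) {
    return std::vector<Image>(n);  // would run ONE graph with batch dim n
}

int main() {
    // Current behaviour: 4 images -> 4 separate inference passes.
    std::vector<Image> images;
    for (int i = 0; i < 4; i++) images.push_back(generate_one("a cat"));

    // "Real" batching: 4 images -> 1 inference pass over batched tensors.
    std::vector<Image> batched = generate_batch("a cat", 4);
    return 0;
}
```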
I would like a "real" batch process instead of the current implementation, and I would like to implement this feature in my spare time, hoping it can contribute to the main repo. Could you give some tips for this?
I am quite new to ggml, so I am not sure how much effort implementing the feature would take, but I do know that batch processing is implemented in whisper.cpp/llama.cpp, so I think it is possible from an architectural perspective.
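In ggml, one natural way to express this is to give the input tensors an explicit batch dimension in `ne3`, since ops like `ggml_conv_2d` already accept batched data. A minimal sketch of what the batched VAE input could look like (assuming a recent ggml API; shapes and channel counts are illustrative):

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 256 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context* ctx = ggml_init(params);

    const int N = 4;  // batch size: 4 latents in one graph
    // Latent input for a 512x512 image: 64x64 spatial, 4 channels, N batch.
    struct ggml_tensor* z = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 64, 4, N);

    // One conv2d over the whole batch; kernel layout is [KW, KH, IC, OC].
    struct ggml_tensor* k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 4, 128);
    struct ggml_tensor* y = ggml_conv_2d(ctx, k, z, 1, 1, 1, 1, 1, 1);
    // y has shape [64, 64, 128, N]: decoding all N latents would then take
    // a single graph execution instead of N, at the cost of roughly N times
    // the intermediate buffer memory (see the conv2d discussion above).

    ggml_free(ctx);
    return 0;
}
```

The hard part is presumably not the tensor shapes but keeping the compute buffers affordable, which ties back to the memory issue mentioned in the first comment.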