-
Currently, it is not feasible to implement batch processing, since many functions need memory-usage optimizations for images larger than 512x512. In the VAE stage, a conv2d on a 1024x1024 image uses 7 GB of VRAM/RAM for its compute buffer.
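For context on where that memory goes: ggml lowers `ggml_conv_2d` to an im2col step (`ggml_im2col`) followed by a matrix multiplication, so the intermediate buffer holds roughly `OW*OH*IC*KH*KW` elements per image. A back-of-the-envelope sketch (the channel count and element size below are illustrative assumptions, since the exact values depend on the VAE decoder layer and build):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Illustrative values for one VAE decoder conv2d at full resolution.
    // Actual channel counts / element sizes depend on the model and build.
    const int64_t OW = 1024, OH = 1024;  // output width/height
    const int64_t IC = 512;              // input channels (assumed)
    const int64_t KW = 3, KH = 3;        // 3x3 kernel
    const int64_t elem = 2;              // bytes per element (assuming f16 im2col)

    // im2col materializes one column of IC*KH*KW values per output pixel.
    const int64_t bytes = OW * OH * IC * KW * KH * elem;
    printf("im2col buffer: %.2f GiB per image\n",
           bytes / (1024.0 * 1024.0 * 1024.0));
    // Batching N images multiplies this by N, which is why naive batching
    // is not feasible without memory optimizations first.
    return 0;
}
```

With these assumed values the buffer alone is on the order of several GiB, the same order of magnitude as the 7 GB figure above, and it scales linearly with batch size.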
-
As far as I have investigated, the current implementation of batch processing does not feed multiple prompts to the model at once, but rather runs one prompt at a time. For instance, generating 4 images requires running inference 4 times, not once.
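To illustrate the distinction, here is a minimal sketch; `generate_one` and `generate_batch` are hypothetical stand-ins, not the project's actual API:

```cpp
#include <string>
#include <vector>

struct Image {};

// Hypothetical stand-ins for the diffusion pipeline (not the real API):
// each call to generate_one() builds and executes a full compute graph.
Image generate_one(const std::string& /*prompt*/) { return Image{}; }
std::vector<Image> generate_batch(const std::string& /*prompt*/, int n) {
    return std::vector<Image>(n);  // would run ONE graph with batch dim n
}

int main() {
    // Current behaviour: 4 images -> 4 separate inference passes.
    std::vector<Image> images;
    for (int i = 0; i < 4; i++) images.push_back(generate_one("a cat"));

    // "Real" batching: 4 images -> 1 inference pass over batched tensors.
    std::vector<Image> batched = generate_batch("a cat", 4);
    return 0;
}
```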
I would like a "real" batch process instead of the current implementation, and I would like to implement this feature in my spare time, hoping it can contribute to the main repo. Could you give some tips for this?
I am quite new to ggml, so I am not sure how much effort implementing the feature would take, but I do know that batch processing is implemented in whisper.cpp/llama.cpp, so I think it is possible from an architectural perspective.
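In ggml, one natural way to express this is to give the input tensors an explicit batch dimension in `ne3`, since ops like `ggml_conv_2d` already accept batched data. A minimal sketch of what the batched VAE input could look like (assuming a recent ggml API; shapes and channel counts are illustrative):

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 256 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context* ctx = ggml_init(params);

    const int N = 4;  // batch size: 4 latents in one graph
    // Latent input for a 512x512 image: 64x64 spatial, 4 channels, N batch.
    struct ggml_tensor* z = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 64, 4, N);

    // One conv2d over the whole batch; kernel layout is [KW, KH, IC, OC].
    struct ggml_tensor* k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 4, 128);
    struct ggml_tensor* y = ggml_conv_2d(ctx, k, z, 1, 1, 1, 1, 1, 1);
    // y has shape [64, 64, 128, N]: decoding all N latents would then take
    // a single graph execution instead of N, at the cost of roughly N times
    // the intermediate buffer memory (see the conv2d discussion above).

    ggml_free(ctx);
    return 0;
}
```

The hard part is presumably not the tensor shapes but keeping the compute buffers affordable, which ties back to the memory issue mentioned in the first comment.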