Replies: 5 comments
-
My observation is that
-
Use the LCM LoRA for SD v1.5 with 4 steps and TAESD; it will be much faster.
-
Pretty sure you meant s/it.
-
Yeah, s/it... Please keep in mind that ComfyUI is using f32 (not any lower quantization) with 20 steps and is still more than 2X faster. Anyone know what tensorvision-cpu is doing that's so much faster than ggml?
-
@RogerDass It's just that PyTorch ships more heavily optimized convolution algorithms, which are too complex to reimplement in ggml. That's also why PyTorch is quite heavy: instead of reinventing the wheel, it reuses existing optimized code to avoid unnecessary complications.
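To make the point above concrete, here is a minimal sketch of one such optimization: lowering convolution to a single matrix multiply via im2col, so the heavy lifting lands in a tuned BLAS GEMM instead of nested Python-level loops. This is an illustration of the general technique, not a claim about the exact kernels PyTorch or ggml use; all function names here are made up for the example.

```python
import numpy as np

def conv2d_direct(x, w):
    # Naive direct convolution: slide the kernel over every output position.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_im2col(x, w):
    # im2col: unfold every kernel-sized patch into a row of a matrix,
    # then compute all outputs with one matrix-vector product (GEMM-style).
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return (cols @ w.ravel()).reshape(oh, ow)

x = np.arange(25.0).reshape(5, 5)
w = np.ones((3, 3))
# Both implementations agree; the im2col path trades memory for one big matmul.
assert np.allclose(conv2d_direct(x, w), conv2d_im2col(x, w))
```

The trade-off is extra memory for the unfolded patch matrix in exchange for a single cache-friendly multiply that BLAS libraries (OpenBLAS, MKL) execute very efficiently.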
-
Hello!
Thanks for making this amazing project!
So I'm running this with the Realistic Vision v1.5 checkpoint. I'm getting ~75 s/it with OpenBLAS enabled.
Any idea how to speed that up significantly?
With ComfyUI on the same machine in CPU mode, I'm getting ~30 s/it, and it takes 10 min to generate a 512x512 image with 20 steps.
What's causing such a large performance difference?
Do you know if there's a way to get some basic OpenGL 3 acceleration for some of the tensor ops?
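For reference, the reported numbers are internally consistent; a quick sanity check of the s/it figures against the quoted wall-clock time (assumed values taken from the comments above):

```python
# Sanity-check the reported timings: total time = seconds-per-iteration * steps.
steps = 20
sdcpp_s_per_it = 75.0   # reported here with OpenBLAS enabled
comfy_s_per_it = 30.0   # reported for ComfyUI in CPU mode

sdcpp_total_min = sdcpp_s_per_it * steps / 60
comfy_total_min = comfy_s_per_it * steps / 60

print(sdcpp_total_min)                   # 25.0 minutes for 20 steps
print(comfy_total_min)                   # 10.0 minutes, matching the report
print(sdcpp_s_per_it / comfy_s_per_it)   # 2.5x gap between the two
```

So the gap is roughly 2.5x per iteration, which matches the "more than 2X faster" observation earlier in the thread.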