Inference with the Flux model with a LoRA applied and inference with the quantized Flux model each work fine on their own, but when combined they often result in an error.
RuntimeError('Only Tensors of floating point and complex dtype can require gradients')
I don't know which library or environment is the main cause of this error, but I'm posting it here because the bug is easy to reproduce with the combination of Diffusers and Flux.
In the demo below I'm using the pip releases, but the same error occurs with the GitHub versions of Diffusers and PEFT.
The code is written in a roundabout way to work around other bugs in ZeroGPU Spaces.
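The error message itself is a general PyTorch one: `requires_grad` can only be set on floating-point or complex tensors, so it surfaces whenever something (e.g. LoRA injection) tries to turn an integer quantized weight into a trainable parameter. A minimal sketch of that mechanism (an illustration only, not the actual Diffusers/PEFT call site, which I haven't traced):

```python
import torch

# An int8 tensor stands in for a quantized weight.
int_weight = torch.zeros(4, dtype=torch.int8)

try:
    # nn.Parameter sets requires_grad=True internally, which is not
    # allowed on integer dtypes.
    torch.nn.Parameter(int_weight)
except RuntimeError as e:
    print(e)  # → Only Tensors of floating point and complex dtype can require gradients

# The same wrapping succeeds once the dtype is floating point.
param = torch.nn.Parameter(int_weight.float())
```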
The demo that was working a few minutes ago has stopped working due to a new error...
RuntimeError('NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. ')
Extra: ZeroGPU Spaces and Quanto are deeply incompatible
In this demo, merely having the following import line makes inference in the ZeroGPU Space crash every time. I suspect CUDA is being initialized inside Quanto at import time.🥶
from optimum.quanto import freeze, qfloat8, quantize
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
    raise res.value
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
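The traceback points at a general PyTorch/CUDA restriction: a child process created with `fork` inherits the parent's already-initialized CUDA context and cannot re-initialize it. Outside of ZeroGPU the standard fix is the `spawn` start method; on ZeroGPU, which manages its worker processes itself, the practical workaround is presumably to delay CUDA-touching imports such as `optimum.quanto` until inside the `@spaces.GPU`-decorated function. A sketch of the `spawn` idea (the worker here is a hypothetical stand-in):

```python
import multiprocessing as mp

def cuda_worker(i):
    # In a real program this would initialize CUDA (e.g. via torch.cuda);
    # here it just does trivial work.
    return i * 2

# "fork" children inherit the parent's memory, including any live CUDA
# context, which CUDA refuses to re-initialize. "spawn" children start
# from a fresh interpreter, so CUDA initializes cleanly in each one.
ctx = mp.get_context("spawn")
print(ctx.get_start_method())  # → spawn
```

With that context, `ctx.Pool(...)` / `ctx.Process(...)` would run `cuda_worker` in freshly spawned interpreters instead of forked copies of the parent.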
Describe the bug
Merry Christmas.🎅
Demo Space for error reproduction (for Huggingface ZeroGPU)
https://huggingface.co/spaces/John6666/diffusers_lora_error_test1
Reproduction
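The full reproduction lives in the linked Space; in outline, the failing combination looks like the sketch below (the LoRA repo id is a placeholder, and the steps mirror the description above rather than the exact demo code — running it requires a GPU and large downloads):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Quantizing the transformer to qfloat8 works fine on its own.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

# Loading a LoRA also works fine on its own, but combined with the
# quantized transformer this is where the RuntimeError appears.
pipe.load_lora_weights("some-user/some-flux-lora")  # placeholder repo id

pipe.to("cuda")
image = pipe("a cat", num_inference_steps=4).images[0]
```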
Logs
No response
System Info
Who can help?
No response