Black video with sage attention #334

Open
rkfg opened this issue Dec 20, 2024 · 4 comments

Comments

@rkfg

rkfg commented Dec 20, 2024

SageAttention works with other models for me (such as Hunyuan) but not with CogVideoX. Setup: Debian testing, RTX 3090 Ti, sageattention 2.0.0, torch 2.5.1+cu124, diffusers 0.31.0, transformers 4.47.0, running 5b-1.5-I2V. The output video is black, and the log shows this warning:

comfyui-1  | 2024-12-20T09:12:33.054432252Z /app/custom_nodes/ComfyUI-VideoHelperSuite/videohelpersuite/nodes.py:104: RuntimeWarning: invalid value encountered in cast
comfyui-1  | 2024-12-20T09:12:33.054463814Z   return tensor_to_int(tensor, 8).astype(np.uint8)

I updated your extension and VideoHelperSuite (VHS) from master, but it still doesn't work. The comfy and sdpa attention modes work fine (though they're slower); the other modes seem to be buggy (black output).
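The RuntimeWarning in the log is what NumPy reports when a float-to-integer cast meets values it cannot represent, most commonly NaN or inf. That would fit a black video: if the attention kernel returns NaNs, they would carry through decoding, and the final cast to uint8 produces meaningless, platform-dependent pixel values. A minimal sketch of catching this before the cast, assuming the decoded frames are a float NumPy array scaled to [0, 1]; `count_nonfinite` and `frames_to_uint8` are hypothetical helpers, not VideoHelperSuite's actual `tensor_to_int`.

```python
import numpy as np


def count_nonfinite(frames: np.ndarray) -> int:
    """Return how many elements of a float frame array are NaN or +/-inf."""
    return int(np.count_nonzero(~np.isfinite(frames)))


def frames_to_uint8(frames: np.ndarray) -> np.ndarray:
    """Convert float frames in [0, 1] to uint8, refusing to cast non-finite values.

    Casting NaN/inf to uint8 is what triggers NumPy's
    "invalid value encountered in cast" warning.
    """
    bad = count_nonfinite(frames)
    if bad:
        raise ValueError(
            f"{bad} non-finite values in decoded frames; "
            "the attention output was likely NaN/inf"
        )
    # Scale to 0..255, round, and clip before the integer cast.
    return np.clip(np.rint(frames * 255.0), 0, 255).astype(np.uint8)


# A healthy frame converts; a NaN-filled one is reported instead of silently turning black.
ok = frames_to_uint8(np.full((2, 2, 3), 0.5, dtype=np.float32))
nan_frames = np.full((2, 2, 3), np.nan, dtype=np.float32)
print(ok[0, 0, 0], count_nonfinite(nan_frames))  # prints: 128 12
```

Checking for non-finite values at the conversion step turns a silent black video into an explicit error that points back at the attention output.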

@kijai
Owner

kijai commented Dec 20, 2024

Cog 1.5 requires sageattention 2.0.0, and on a 3090 you need to use one of its specific modes. I don't remember which one, but I have exposed them in the attention mode selection.

@rkfg
Author

rkfg commented Dec 20, 2024

Yes, I'm on 2.0.0. sageattn_qk_int8_pv_fp8_cuda crashes the GPU: dmesg shows NVRM: Xid (PCI:0000:01:00): 43, pid=1425251, name=python, Ch 00000019, and ComfyUI logs:

  0%|          | 0/20 [00:00<?, ?it/s]
terminate called after throwing an instance of 'c10::Error'
comfyui-1  | 2024-12-20T09:41:30.926068905Z   what():  CUDA error: unspecified launch failure                                                                           
comfyui-1  | 2024-12-20T09:41:30.926086065Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
comfyui-1  | 2024-12-20T09:41:30.926088406Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1                                                                                     
comfyui-1  | 2024-12-20T09:41:30.926089895Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.                                   
comfyui-1  | 2024-12-20T09:41:30.926091671Z                                                                                                                                           
comfyui-1  | 2024-12-20T09:41:30.926093197Z Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
comfyui-1  | 2024-12-20T09:41:30.926094786Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f3b55e6d446 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)                                                                                                                                                                            
comfyui-1  | 2024-12-20T09:41:30.926096444Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f3b55e176e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)                                                                                                                                        
comfyui-1  | 2024-12-20T09:41:30.926098500Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f3b55f59a18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)                                                                                                                                       
comfyui-1  | 2024-12-20T09:41:30.926108031Z frame #3: <unknown function> + 0x600eb (0x7f3b55f610eb in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
comfyui-1  | 2024-12-20T09:41:30.926109009Z frame #4: <unknown function> + 0x5faf70 (0x7f3b5459af70 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
comfyui-1  | 2024-12-20T09:41:30.926109867Z frame #5: <unknown function> + 0x6f69f (0x7f3b55e4e69f in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)                    
comfyui-1  | 2024-12-20T09:41:30.926110692Z frame #6: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f3b55e4737b in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1  | 2024-12-20T09:41:30.926111502Z frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f3b55e47529 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1  | 2024-12-20T09:41:30.926112356Z frame #8: <unknown function> + 0x8c1a98 (0x7f3b54861a98 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)          
comfyui-1  | 2024-12-20T09:41:30.926113250Z frame #9: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f3b54861de6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)                                                                                                                                                                               
comfyui-1  | 2024-12-20T09:41:30.926114230Z <omitting python frames>                                                                                                                  
comfyui-1  | 2024-12-20T09:41:30.926115039Z                                                

sageattn_qk_int8_pv_fp16_cuda: works 🎉
sageattn_qk_int8_pv_fp16_triton: black video, same invalid-value warning as in my first comment
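These results are consistent with the fp8 PV kernel needing FP8 tensor-core instructions, which NVIDIA GPUs only provide from compute capability 8.9 (Ada) onward, while the 3090 Ti is Ampere (8.6). A minimal sketch of picking a mode from the compute capability under that assumption; `pick_sage_mode` is a hypothetical helper, the mode strings are the ones named in this thread, and the fallback to sdpa for older GPUs is this sketch's own choice rather than the wrapper's behavior.

```python
from typing import Tuple

# Mode names as they appear in the attention mode selection discussed in this thread.
FP8_CUDA = "sageattn_qk_int8_pv_fp8_cuda"
FP16_CUDA = "sageattn_qk_int8_pv_fp16_cuda"
FALLBACK = "sdpa"


def pick_sage_mode(capability: Tuple[int, int]) -> str:
    """Choose an attention mode from a CUDA compute capability (major, minor).

    Assumptions (not taken from the wrapper's code):
    - the fp8 PV kernel needs FP8 tensor-core MMA, available from 8.9 (Ada) onward;
    - Ampere cards (8.0 / 8.6, e.g. 3090 and 3090 Ti) use the fp16 PV kernel;
    - anything older falls back to plain sdpa in this sketch.
    """
    if capability >= (8, 9):
        return FP8_CUDA
    if capability >= (8, 0):
        return FP16_CUDA
    return FALLBACK


# On a real machine the tuple would come from torch.cuda.get_device_capability().
print(pick_sage_mode((8, 6)))  # RTX 3090 Ti (Ampere): the fp16 PV mode
print(pick_sage_mode((8, 9)))  # RTX 4090 (Ada): the fp8 PV mode
```

Keying on compute capability rather than on the card name ties the choice to what the kernels actually depend on.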

All fused_sageattn modes give this:

comfyui-1  | 2024-12-20T09:48:31.640438551Z !!! Exception during processing !!! 'NoneType' object is not callable
comfyui-1  | 2024-12-20T09:48:31.640539983Z Traceback (most recent call last):
comfyui-1  | 2024-12-20T09:48:31.640541877Z   File "/app/execution.py", line 324, in execute
comfyui-1  | 2024-12-20T09:48:31.640543195Z     output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
comfyui-1  | 2024-12-20T09:48:31.640544447Z   File "/app/execution.py", line 199, in get_output_data
comfyui-1  | 2024-12-20T09:48:31.640545510Z     return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
comfyui-1  | 2024-12-20T09:48:31.640546623Z   File "/app/execution.py", line 170, in _map_node_over_list
comfyui-1  | 2024-12-20T09:48:31.640547630Z     process_inputs(input_dict, i)
comfyui-1  | 2024-12-20T09:48:31.640548600Z   File "/app/execution.py", line 159, in process_inputs
comfyui-1  | 2024-12-20T09:48:31.640549621Z     results.append(getattr(obj, func)(**inputs))
comfyui-1  | 2024-12-20T09:48:31.640550620Z   File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 702, in process
comfyui-1  | 2024-12-20T09:48:31.640551725Z     latents = model["pipe"](
comfyui-1  | 2024-12-20T09:48:31.640552823Z   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
comfyui-1  | 2024-12-20T09:48:31.640553921Z     return func(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640554911Z   File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 763, in __call__
comfyui-1  | 2024-12-20T09:48:31.640555948Z     noise_pred = self.transformer(
comfyui-1  | 2024-12-20T09:48:31.640556916Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1  | 2024-12-20T09:48:31.640565577Z     return self._call_impl(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640566393Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1  | 2024-12-20T09:48:31.640567197Z     return forward_call(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640567913Z   File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 653, in forward
comfyui-1  | 2024-12-20T09:48:31.640568716Z     hidden_states, encoder_hidden_states = block(
comfyui-1  | 2024-12-20T09:48:31.640569440Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1  | 2024-12-20T09:48:31.640570259Z     return self._call_impl(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640570992Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1  | 2024-12-20T09:48:31.640571783Z     return forward_call(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640572498Z   File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 312, in forward
comfyui-1  | 2024-12-20T09:48:31.640573330Z     attn_hidden_states, attn_encoder_hidden_states = self.attn1(
comfyui-1  | 2024-12-20T09:48:31.640574311Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1  | 2024-12-20T09:48:31.640575135Z     return self._call_impl(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640575866Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1  | 2024-12-20T09:48:31.640576649Z     return forward_call(*args, **kwargs)
comfyui-1  | 2024-12-20T09:48:31.640577371Z   File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
comfyui-1  | 2024-12-20T09:48:31.640578185Z     return self.processor(
comfyui-1  | 2024-12-20T09:48:31.640578901Z   File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 163, in __call__
comfyui-1  | 2024-12-20T09:48:31.640579702Z     hidden_states = self.attn_func(query, key, value, attn_mask=attention_mask, is_causal=False)
comfyui-1  | 2024-12-20T09:48:31.640580512Z TypeError: 'NoneType' object is not callable

I'm not sure what the fused modes need, but I'd like to test them to see whether they're faster. So far only one sageattn mode works, and it does seem to be faster than comfy/sdpa.
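The traceback ends with `self.attn_func(...)` raising `'NoneType' object is not callable`, so `attn_func` was `None` when the transformer block called it. One common way this happens is an optional import that silently leaves the name as `None` when the kernels aren't installed or built. A hedged sketch of resolving a mode name to a callable that fails early with a readable message; the module `hypothetical_fused_kernels`, the registry, and `resolve_attn_func` are all hypothetical and not the wrapper's actual code.

```python
from typing import Callable, Dict, Optional

# Optional-import pattern: if the module is missing, the name is left as None.
# "hypothetical_fused_kernels" is a made-up module name used only for illustration.
try:
    from hypothetical_fused_kernels import fused_attention  # type: ignore
except ImportError:
    fused_attention = None

AttnFn = Callable[..., object]


def resolve_attn_func(mode: str, registry: Dict[str, Optional[AttnFn]]) -> AttnFn:
    """Look up the attention callable for `mode`, failing early with a clear message."""
    if mode not in registry:
        raise KeyError(f"unknown attention mode {mode!r}; choices: {sorted(registry)}")
    fn = registry[mode]
    if fn is None:
        # Without this check the problem only surfaces later, inside the transformer
        # block, as "TypeError: 'NoneType' object is not callable".
        raise RuntimeError(
            f"attention mode {mode!r} is unavailable: its kernels failed to import"
        )
    return fn


def toy_attention(query, key, value, attn_mask=None, is_causal=False):
    # Stand-in for a real kernel; the keyword arguments mirror the call in the traceback.
    return value


registry = {"sdpa": toy_attention, "fused_sageattn": fused_attention}
print(resolve_attn_func("sdpa", registry)("q", "k", "v"))  # prints: v
try:
    resolve_attn_func("fused_sageattn", registry)
except RuntimeError as err:
    print(err)
```

Resolving the function once, when the mode is selected, moves the failure to the point where the missing dependency is still obvious.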

@zejacky

zejacky commented Dec 21, 2024

Thank you, kijai. I'm currently on sageattention 1.0.6 (per pip show sageattention) on Windows 11.
Can I just uninstall 1.0.6 and install 2.0.0? Do you know whether 2.0.0 will be compatible with Hunyuan and LTX?
I appreciate your work.
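A minimal sketch of checking the installed sageattention release from Python (the same version pip show sageattention reports), using only the standard library. The 2.0.0 threshold comes from this thread; `parse_release` is a simplified, hypothetical helper, and the `packaging` library's `Version` class is the more complete option for full PEP 440 parsing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.0.0+cu124' -> (2, 0, 0).

    Simplified: local (+...) and pre/post-release suffixes are dropped.
    """
    parts = []
    for piece in ver.split("+", 1)[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # a suffix such as '0b1' ends the release segment
    return tuple(parts)


def installed_sageattention() -> Optional[Tuple[int, ...]]:
    """Return the installed sageattention release, or None if it is not installed."""
    try:
        return parse_release(version("sageattention"))
    except PackageNotFoundError:
        return None


found = installed_sageattention()
if found is None:
    print("sageattention is not installed")
elif found >= (2, 0, 0):
    print("sageattention", ".".join(map(str, found)), "meets the 2.0.0 requirement")
else:
    print("sageattention", ".".join(map(str, found)), "is older than 2.0.0")
```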

@kijai
Owner

kijai commented Dec 21, 2024

They just released 2.0.1, which I haven't tested yet. Installing 2.0.x is currently harder because it's in beta and you have to compile it yourself.
