Replies: 3 comments 2 replies
-
How much slower? I think this is kind of expected, but cc'ing @sayakpaul for more insights.
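One plausible reason the slowdown is "kind of expected" (an assumption on my part, not something the maintainers confirm here): fp8 in quanto is typically weight-only quantization, so each linear layer dequantizes its weights back to a higher precision before the matmul unless fused fp8 kernels are available, which adds work on every forward pass. A toy numpy sketch of that extra step (illustrative only, not quanto's actual kernels):

```python
import numpy as np

def fake_low_precision_quantize(w, n_bits=3):
    # Simulate a coarse quantization by rounding to a small grid;
    # a stand-in for a real fp8 (e4m3) cast, which numpy lacks.
    scale = float(np.max(np.abs(w))) or 1.0
    q = np.round(w / scale * (2 ** n_bits)).astype(np.int8)
    return q, scale

def dequantize(q, scale, n_bits=3):
    # This extra conversion is the per-forward overhead of a
    # weight-only scheme without fused low-precision matmuls.
    return q.astype(np.float32) * scale / (2 ** n_bits)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 8)).astype(np.float32)

# bf16-style path: a single matmul.
y_ref = x @ w

# Weight-only quantized path: dequantize first, then matmul.
q, scale = fake_low_precision_quantize(w)
y_q = x @ dequantize(q, scale)

err = float(np.abs(y_ref - y_q).max())
print("max abs error from quantization:", err)
```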
-
Are you not moving your model to the GPU? I don't see any device placements.
-
Also, I'm going to move this to Discussions, as this is not a library issue.
-
Describe the bug
I use the optimum.quanto package to call the quantization function. When the model is quantized to fp8, inference is much slower than with bf16. I'd like to know why. Thank you.
Reproduction
Logs
No response
System Info
x86, torch 2.4 + CUDA 12.2
Who can help?
No response