[GPU] fix property overwritten issue #28209
base: master
Hi! I have a question about the fix.
Yes,
I understood that user_properties would be cleared.
The policy should be reasonable: the most recently set user properties should have the highest priority, and they may update internal_properties.
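The precedence policy described above can be sketched as follows. This is a minimal, hypothetical model: SketchConfig, set_user_property and get are illustrative names, not the OpenVINO ExecutionConfig API, and property values are simplified to int. It only shows that merging pending user values into internal_properties lets the latest user value win, and that re-applying after user_properties is cleared leaves that value intact.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified config with two property maps. The map names
// mirror the discussion, but this is NOT the real ExecutionConfig class.
struct SketchConfig {
    std::map<std::string, int> internal_properties;  // effective values
    std::map<std::string, int> user_properties;      // pending user overrides

    void set_user_property(const std::string& name, int value) {
        user_properties[name] = value;
    }

    // Merge pending user values into the effective set: the most recent
    // user value overrides whatever is currently stored internally.
    void apply_user_properties() {
        for (const auto& kv : user_properties)
            internal_properties[kv.first] = kv.second;
        user_properties.clear();  // consumed after applying, as discussed in the thread
    }

    int get(const std::string& name) const { return internal_properties.at(name); }
};
```

In this sketch, calling apply_user_properties() a second time with no new user input is a no-op, which is the behavior the policy asks for.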
set_property(ov::hint::kv_cache_precision(ov::element::i8));
}

// Enable dynamic quantization by default for non-systolic platforms
if (!is_set_by_user(ov::hint::dynamic_quantization_group_size) && !info.supports_immad) {
if (!is_set_by_user(ov::hint::dynamic_quantization_group_size) &&
    internal_properties.find(ov::hint::dynamic_quantization_group_size.name()) == internal_properties.end() &&
This change will entirely disable the default kv_cache_compression and dynamic quantization group size configurations for non-systolic platforms, because this check can never evaluate to true: during property registration, every property is added to internal_properties with its default value here:
internal_properties[property.first] = property.second;
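The reviewer's point can be illustrated with a small, hypothetical sketch. The register_properties and would_apply_default functions are illustrative stand-ins, and the "default" string is a placeholder rather than the real default value. Because registration seeds internal_properties with every supported key, a guard of the form internal_properties.find(name) == internal_properties.end() is always false, so the platform default is never applied.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registration step: every supported property is seeded into
// internal_properties with a default (placeholder values here).
std::map<std::string, std::string> register_properties() {
    const std::map<std::string, std::string> defaults = {
        {"dynamic_quantization_group_size", "default"},
        {"kv_cache_precision", "default"},
    };
    std::map<std::string, std::string> internal_properties;
    for (const auto& property : defaults)
        internal_properties[property.first] = property.second;
    return internal_properties;
}

// The guard added in the diff: apply the platform default only if the key
// is absent from internal_properties.
bool would_apply_default(const std::map<std::string, std::string>& internal_properties,
                         const std::string& name) {
    return internal_properties.find(name) == internal_properties.end();
}
```

After registration, would_apply_default returns false for both keys, which matches the concern that the defaults would be disabled entirely.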
As a short-term solution, we can avoid calling get_property(ov::hint::dynamic_quantization_group_size) at runtime for the FC configuration, and instead save the value at the model compilation stage in the primitive, the implementation, or somewhere else.
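The short-term idea of capturing the value at compilation time could look like the following sketch. FullyConnectedImplSketch and SketchConfig are hypothetical types, not the GPU plugin's real primitive or implementation classes. The group size is read once when the implementation is built, so later changes to the config (for example a second apply_user_properties call) do not affect an already-compiled primitive.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a mutable config that may be re-applied later.
struct SketchConfig {
    std::map<std::string, int> internal_properties;
    int get(const std::string& name) const { return internal_properties.at(name); }
};

// Hypothetical FC implementation: the group size is read once, at build
// (model compilation) time, and stored in the implementation itself.
class FullyConnectedImplSketch {
public:
    explicit FullyConnectedImplSketch(const SketchConfig& config)
        : dq_group_size_(config.get("dynamic_quantization_group_size")) {}

    // At execution time only the cached value is consulted.
    int dynamic_quantization_group_size() const { return dq_group_size_; }

private:
    int dq_group_size_;
};
```

The trade-off in this design is that the cached value is frozen per compiled model, which is the desired behavior here but hides the underlying config bug rather than fixing it.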
However, the proper solution would be to move these default configurations out of this function entirely and apply them only once at the very beginning, either in the constructor or right after the config is created.
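A minimal sketch of the proposed structure, assuming a simplified config class (SketchConfig and its members are hypothetical, not the real ExecutionConfig). Platform defaults are applied once in the constructor, so apply_user_properties only merges user values and can be called any number of times. The value 32 is the default mentioned in the PR description; 0 is a placeholder for the registration default.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical config where platform defaults are applied exactly once,
// in the constructor, instead of inside apply_user_properties().
class SketchConfig {
public:
    explicit SketchConfig(bool supports_immad) {
        internal_properties_["dynamic_quantization_group_size"] = 0;  // placeholder registration default
        if (!supports_immad)
            internal_properties_["dynamic_quantization_group_size"] = 32;  // non-systolic default, set once
    }

    void set_property(const std::string& name, int value) { user_properties_[name] = value; }

    // Only merges user values; never touches defaults, so repeated calls are safe.
    void apply_user_properties() {
        for (const auto& kv : user_properties_)
            internal_properties_[kv.first] = kv.second;
        user_properties_.clear();
    }

    int get(const std::string& name) const { return internal_properties_.at(name); }

private:
    std::map<std::string, int> internal_properties_;
    std::map<std::string, int> user_properties_;
};
```

With this ordering, a user-provided 128 survives any number of apply_user_properties calls, while platforms that do not set a value still get the default.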
Details:
Prevent ov::hint::dynamic_quantization_group_size and ov::hint::kv_cache_precision from being overwritten with their default values when ExecutionConfig::apply_user_properties is called twice.

For example, if the user sets ov::hint::dynamic_quantization_group_size to 128, the second call to ExecutionConfig::apply_user_properties rewrites it to 32. This behavior causes a performance drop on MTL 125H.

This issue was introduced by PR: [GPU] Integrate dynamic quantization for onednn #26940
Performance before and after this PR:
Test result on master branch:
Tickets: