Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] fix property overwritten issue #28209

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

riverlijunjie
Copy link
Contributor

@riverlijunjie riverlijunjie commented Dec 26, 2024

Details:

  • Avoid ov::hint::dynamic_quantization_group_size and ov::hint::kv_cache_precision is overwritten to be default value if ExecutionConfig::apply_user_properties is called twice.

  • For example
    If user set ov::hint::dynamic_quantization_group_size to be 128, the second ExecutionConfig::apply_user_properties calling will rewrite it to be 32, such behavior will call performance drop on MTL 125H.

  • This issue is brought by PR: [GPU] Integrate dynamic quantization for onednn #26940

  • Performance before and after this PR:

image

Test result on master branch:

image

Tickets:

@riverlijunjie riverlijunjie requested review from a team as code owners December 26, 2024 06:16
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Dec 26, 2024
@byungilm
Copy link
Contributor

Hi! I have a question about the fix.
I see set_property() is writing config to internal_properties by "internal_properties[name]=val;".
Is there a certain reason that is_set_by_user() function searches its input name not from internal_properties, but from user_properties?

@riverlijunjie
Copy link
Contributor Author

Hi! I have a question about the fix. I see set_property() is writing config to internal_properties by "internal_properties[name]=val;". Is there a certain reason that is_set_by_user() function searches its input name not from internal_properties, but from user_properties?

yes, is_set_by_user() only searches user_properties, and user_properties will be cleared after each called. So the next calling of is_set_by_user() will return false, which will cause ov::hint::dynamic_quantization_group_size is reset to default value.

@byungilm
Copy link
Contributor

Hi! I have a question about the fix. I see set_property() is writing config to internal_properties by "internal_properties[name]=val;". Is there a certain reason that is_set_by_user() function searches its input name not from internal_properties, but from user_properties?

yes, is_set_by_user() only searches user_properties, and user_properties will be cleared after each called. So the next calling of is_set_by_user() will return false, which will cause ov::hint::dynamic_quantization_group_size is reset to default value.

I understood user_properties would be cleared.
when is_set_by_user() is true for a config, set_property() stores the config to internal_properties. Is it fine?

@riverlijunjie
Copy link
Contributor Author

Hi! I have a question about the fix. I see set_property() is writing config to internal_properties by "internal_properties[name]=val;". Is there a certain reason that is_set_by_user() function searches its input name not from internal_properties, but from user_properties?

yes, is_set_by_user() only searches user_properties, and user_properties will be cleared after each called. So the next calling of is_set_by_user() will return false, which will cause ov::hint::dynamic_quantization_group_size is reset to default value.

I understood user_properties would be cleared. when is_set_by_user() is true for a config, set_property() stores the config to internal_properties. Is it fine?

The policy should be reasonable, the latest user properties should be the highest priority, it can update internal_properties.

set_property(ov::hint::kv_cache_precision(ov::element::i8));
}

// Enable dynamic quantization by default for non-systolic platforms
if (!is_set_by_user(ov::hint::dynamic_quantization_group_size) && !info.supports_immad) {
if (!is_set_by_user(ov::hint::dynamic_quantization_group_size) &&
internal_properties.find(ov::hint::dynamic_quantization_group_size.name()) == internal_properties.end() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will disable kv_cache_compression and dynamic_group_size default configurations for non-systolic platforms entirely, because this check will never return true, as during property registration all properties are added to internal_properties with default values here.:

internal_properties[property.first] = property.second;

As a short-term solution, we can avoid using get_property(ov::hint::dynamic_quantization_group_size) calls at runtime for FC configuration, saving the value at model compilation stage to primitive/implementation or somewhere else
However, the proper solution would be to move these default configurations out of this function entirely and call them only once at the very beginning, either in the constructor or right after config creation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: GPU OpenVINO GPU plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants