
[GPU][MTL] Resolve long token performance regression in MTL 125H platform #28155

Open

wants to merge 1 commit into master
Conversation

riverlijunjie (Contributor) commented Dec 20, 2024

Details:

  • PR27831 enabled MLP fusion in cldnn, which can improve performance, but the fusion is not enabled on MTL 125H because its EU count is 112 (below the 128-EU threshold). So that PR alone should have left MTL 125H performance unchanged; however PR26940, which integrates dynamic quantization, causes MTL 125H first-token performance to drop by about 8% for a 6K input token size. If we enable MLP fusion on MTL 125H, the performance regression disappears (a sketch of this kind of gate follows the table below).

  • Test result:


| Test case | First token latency | Commit id |
| --- | --- | --- |
| PR27900, before MLP fusion | 22297.2 ms | 536bd69 |
| PR27831, MLP fusion PR but fusion disabled by EU < 128 | 22783.2 ms | bf62609 |
| PR26940, [GPU] Integrate dynamic quantization for onednn | 24395.8 ms | b840082 |
| PR26940 + patch to enable MLP fusion on MTL 125H (112 EUs) | 22875.3 ms | b840082 |
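
To make the gating concrete, here is a minimal, hypothetical sketch of the kind of EU-count gate described above. All identifiers (`device_info`, `execution_units_count`, `should_enable_mlp_fusion_*`) are assumptions for illustration and are not the actual cldnn code touched by this PR; the only grounded facts are the 128-EU threshold mentioned in the table and MTL 125H's 112 EUs.

```cpp
// Hypothetical sketch of an EU-count gate for MLP fusion; names are illustrative,
// not the actual OpenVINO GPU plugin code changed in this PR.
#include <cstdint>

struct device_info {
    uint32_t execution_units_count;  // 112 on MTL 125H, 128+ on larger GPUs
};

// Before: MLP fusion was skipped on devices with fewer than 128 EUs, so MTL 125H
// (112 EUs) took the unfused path and was exposed to the first-token regression
// described above.
bool should_enable_mlp_fusion_before(const device_info& info) {
    return info.execution_units_count >= 128;
}

// After (the idea of this PR): also allow the fused MLP path on 112-EU devices
// such as MTL 125H. The actual change in the PR may use a different mechanism.
bool should_enable_mlp_fusion_after(const device_info& info) {
    return info.execution_units_count >= 112;
}
```

Whether the real change lowers a global threshold or special-cases the MTL platform is not visible from this conversation; the sketch only illustrates the gate being relaxed.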

Tickets:

@riverlijunjie riverlijunjie requested review from a team as code owners December 20, 2024 02:00
@github-actions github-actions bot added the category: GPU (OpenVINO GPU plugin) label Dec 20, 2024
Commit message: [GPU][MTL] Resolve long token performance regression in MTL 125H platform

PR27831 enabled MLP fusion in cldnn, which can improve performance, but it is not enabled on MTL 125H because its EU count is 112.
So that PR alone should have left MTL 125H performance unchanged; however PR26940, which integrates dynamic quantization, causes MTL 125H
performance to drop by about 10% for a 6K input token size. If we enable MLP fusion on MTL 125H, the performance regression disappears.
yeonbok (Contributor) commented Dec 20, 2024

Hi @riverlijunjie, according to @isanghao, PR26940 should not affect MTL because that PR was for the onednn case. Could you clarify how PR26940 affected performance?

(And apart from the question, I think we can apply this change though)

riverlijunjie (Contributor, Author) commented:

> Hi @riverlijunjie, according to @isanghao, PR26940 should not affect MTL because that PR was for the onednn case. Could you clarify how PR26940 affected performance?
>
> (And apart from the question, I think we can apply this change though)

PR26940 updates dynamic_quantize for cldnn, so it should take effect on MTL, am I right?
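
For context on what "dynamic quantization" refers to in this discussion, below is a generic, hedged illustration of per-token (per-row) dynamic int8 quantization: the scale is computed from the activation values at runtime rather than being fixed ahead of time. This is not the cldnn dynamic_quantize implementation from PR26940, and all names are made up for illustration; it only shows the general technique.

```cpp
// Generic illustration of per-row dynamic int8 quantization; not the cldnn
// dynamic_quantize code from PR26940.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct quantized_row {
    std::vector<int8_t> data;
    float scale;  // dequantize with: value = data[i] * scale
};

quantized_row dynamic_quantize_row(const std::vector<float>& row) {
    // Scale is derived from the row's runtime max-abs value (the "dynamic" part).
    float max_abs = 0.0f;
    for (float v : row)
        max_abs = std::max(max_abs, std::fabs(v));
    const float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;

    quantized_row out{std::vector<int8_t>(row.size()), scale};
    for (std::size_t i = 0; i < row.size(); ++i)
        out.data[i] = static_cast<int8_t>(std::lround(row[i] / scale));
    return out;
}
```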
