Do not check trace for diffusers, saving memory and time for FLUX #1064

mvafin · 2024-12-11T10:14:15Z

What does this PR do?

This is further optimization of memory consumption of diffusers conversion, continuation of this PR: #1033
When check_trace=True the TorchScript graph is generated second time and is compared with the graph generated first time. It is useful to catch incorrect traced graph sometimes, but in optimum we control which models are supported and such issues shouldn't happen.
Currently only introduce this for diffusers, but can be done for all the models.
The most impact on memory is demonstrated for FLUX, for other diffusers it significantly reduces conversion time.

Model	Before	After
FLUX	~58Gb/~240s	~33Gb/~130s
SD-3.5-medium	~18Gb/~250s	~18Gb/~100s

|

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

AlexKoff88 · 2024-12-12T06:53:35Z

Awesome! Thanks @mvafin.

Please check and fix the quality:

black --check .
ruff check .

HuggingFaceDocBuilderDev · 2024-12-12T06:56:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mvafin · 2024-12-12T12:38:16Z

ruff check .

Done

AlexKoff88 · 2024-12-12T14:08:22Z

@alexsu52 . Shall we have an option that disables this behavior during model export? Such an option can be enabled by default.

IlyasMoutawwakil

Not a fan of patches, long term solution would be to decouple the export process by doing torch.jit.trace with the arguments we want then ov.convert_model with the resulting model.

AlexKoff88 · 2024-12-13T09:16:20Z

Not a fan of patches, long term solution would be decouple the export process by doing torch.jit.trace with the arguments we want then ov.convert_model with the resulting model.

I agree. I see that torch.jit.trace in one place in the Optimum-Intel code. @mvafin, why did you just not pass the argument there?

mvafin · 2024-12-13T10:38:47Z

Not a fan of patches, long term solution would be decouple the export process by doing torch.jit.trace with the arguments we want then ov.convert_model with the resulting model.

I agree. I see that torch.jit.trace in one place in the Optimum-Intel code. @mvafin, why did you just not pass the argument there?

torch.jit.trace is called inside the ov.convert_model, there is no control over it from outside. We could do torch.jit.trace before calling convert_model but that would require to also do the necessary preparations we do inside the convert_model which we will not do for TorchScript module. This approach is not good.
Ideal solution would be to allow optimum-intel to set the flag when calling convert_model and this is what we want to do for 2025.0, but for 2024.6 unfortunately we only can use this patch.

IlyasMoutawwakil · 2024-12-17T10:49:19Z

okay, we can go with a patch but let's at least make it a clean one, the naming in the PR is a bit confusing and I don't think the patching should be in __main__.py but rather in convert where lower level logic is defined. What about this:

add an argument check_trace=None to the export function(s) so that users can control this behavior.
if None, set this value to False in the case of diffusers.
do the patching around the convert_model function with patch_torch_jit_trace(check_trace=check_trace).

mvafin added 2 commits December 11, 2024 10:59

Do not check trace for diffusers, saving memory and time

19328ee

Use contextmanager

e50f5f4

AlexKoff88 requested review from echarlaix and nikita-savelyevv December 12, 2024 06:50

AlexKoff88 approved these changes Dec 12, 2024

View reviewed changes

AlexKoff88 requested a review from IlyasMoutawwakil December 12, 2024 06:53

Fix style

40934f3

mvafin mentioned this pull request Dec 12, 2024

Reduce img size in dummy_inputs for FLUX #1070

Merged

3 tasks

nikita-savelyevv approved these changes Dec 12, 2024

View reviewed changes

IlyasMoutawwakil approved these changes Dec 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not check trace for diffusers, saving memory and time for FLUX #1064

Do not check trace for diffusers, saving memory and time for FLUX #1064

mvafin commented Dec 11, 2024

AlexKoff88 commented Dec 12, 2024

HuggingFaceDocBuilderDev commented Dec 12, 2024

mvafin commented Dec 12, 2024

AlexKoff88 commented Dec 12, 2024

IlyasMoutawwakil left a comment •

edited

Loading

AlexKoff88 commented Dec 13, 2024

mvafin commented Dec 13, 2024

IlyasMoutawwakil commented Dec 17, 2024

Do not check trace for diffusers, saving memory and time for FLUX #1064

Are you sure you want to change the base?

Do not check trace for diffusers, saving memory and time for FLUX #1064

Conversation

mvafin commented Dec 11, 2024

What does this PR do?

Before submitting

AlexKoff88 commented Dec 12, 2024

HuggingFaceDocBuilderDev commented Dec 12, 2024

mvafin commented Dec 12, 2024

AlexKoff88 commented Dec 12, 2024

IlyasMoutawwakil left a comment • edited Loading

Choose a reason for hiding this comment

AlexKoff88 commented Dec 13, 2024

mvafin commented Dec 13, 2024

IlyasMoutawwakil commented Dec 17, 2024

IlyasMoutawwakil left a comment •

edited

Loading