Fix race condition in preview code. #6124

AustinMroz · 2024-12-19T21:24:32Z

In the current preview code, when possible, a non-blocking `to` operation is performed and, immediately after, the output tensor is used to create an image. If this non-blocking operation has not completed, PIL makes a copy of the uninitialized memory to produce the image. Generally, this memory contains either zeros or the result of a previously generated preview (sometimes from an entirely different execution). This results in both incorrect output and wasted computation (unless the memory this output was eventually copied to is reallocated and displayed instead of a future preview).

To resolve this, the state of the preview generation is tracked with an event:

- The PIL image is created with no copy
- The preview image is not sent from the server until ready
- Completion of this event is polled at a reasonably slow frequency
- A new preview is not created if a previous preview has not completed
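
For illustration, the tracking logic above can be sketched without any GPU. This is not the code from this PR: `FakeCopyEvent` and `PreviewTracker` are hypothetical names, and a background thread stands in for the non-blocking device copy. Only the control flow is meant to match (one preview in flight, slow polling, nothing released until the event completes).

```python
import threading
import time


class FakeCopyEvent:
    """Hypothetical stand-in for a device event; query() is True once the copy is done."""

    def __init__(self):
        self._done = threading.Event()

    def record(self):
        self._done.set()

    def query(self):
        return self._done.is_set()


class PreviewTracker:
    """Keeps at most one preview in flight and releases it only once its event completes."""

    def __init__(self):
        self.pending = None  # (event, buffer) for the preview currently being copied

    def try_start(self, source, latency=0.2):
        # Do not create a new preview while the previous one is still pending.
        if self.pending is not None:
            return False
        event = FakeCopyEvent()
        buffer = bytearray(len(source))  # destination starts as zeros

        def copy():
            time.sleep(latency)  # simulated transfer time (an assumption)
            buffer[:] = source
            event.record()       # signal completion only after the data has landed

        threading.Thread(target=copy, daemon=True).start()
        self.pending = (event, buffer)
        return True

    def wait_ready(self, poll_interval=0.01):
        # Poll at a modest rate instead of busy-waiting.
        event, buffer = self.pending
        while not event.query():
            time.sleep(poll_interval)
        self.pending = None
        return bytes(buffer)


tracker = PreviewTracker()
started = tracker.try_start(b"\x01\x02\x03")
skipped = tracker.try_start(b"\x09\x09\x09")  # rejected: first preview still in flight
preview = tracker.wait_ready()                # returns only after the copy has finished
```

Reading `buffer` before `event.query()` returns true would yield the zero-filled placeholder, which is the analogue of the stale or blank previews described above.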

On my system (Linux, 2.5.0.dev20240805+rocm6.1, 7900 GRE), I'm seeing no difference in execution time for latent2rgb and a minute performance improvement (~5%) for taesd, but note that erroneous previews were more commonly produced when latent2rgb was used.

EDIT: Fix width and height being swapped in frombuffer call. Minor memory optimization by eliminating tensor concatenation.
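
As a rough illustration of why the order matters: PIL's `Image.frombuffer` takes its size as `(width, height)`, while row-major image data steps through memory one row of `width` pixels at a time. The sketch below uses plain Python rather than PIL or torch, assumes a `(height, width, 3)` layout (the source does not state the tensor shape), and uses a hypothetical helper `pixel_at`.

```python
def pixel_at(buf, width, x, y, channels=3):
    """Return the pixel at column x, row y of a row-major byte buffer."""
    start = (y * width + x) * channels
    return tuple(buf[start:start + channels])


height, width = 2, 3

# Row-major data laid out as (height, width, 3); each pixel stores (row, col, 0).
buf = bytes(v for row in range(height) for col in range(width) for v in (row, col, 0))

correct = pixel_at(buf, width, x=2, y=1)   # row stride equals the true width
swapped = pixel_at(buf, height, x=0, y=1)  # height mistakenly used as the row stride
```

With the swapped stride, the lookup for row 1, column 0 lands on the pixel stored for row 0, column 2, so any non-square preview comes out scrambled.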


comfyanonymous · 2024-12-23T11:38:48Z

This breaks things on non-CUDA platforms like MPS.

AustinMroz · 2024-12-23T23:08:51Z

Much appreciated. Is `device_supports_non_blocking` sufficient for determining whether a device supports events in this way?

comfyanonymous · 2024-12-24T11:31:02Z

There seems to be torch.xpu.event and torch.mps.event specifically for those platforms.

AustinMroz requested a review from comfyanonymous as a code owner December 19, 2024 21:24

AustinMroz force-pushed the master branch from a6ed5b9 to 813b8df December 21, 2024 10:15

Only use events for devices supporting nonblocking

ce5afec
