-
In Diffusers' training scripts such as train_text_to_image_sdxl.py, when an image is converted to a tensor and passed into the image encoder, there is a normalization step. I strongly doubt the transforms.Normalize call should be there, for two reasons. First, I cannot find this normalization logic in Stability AI's repo. Second, the image encoder/decoder system cannot reconstruct the image perfectly if the incoming tensor is normalized, because the mean/std information of the original image is lost. If we do this normalization in the training script, the model will eventually start to output images whose pixel values have a mean of 0.5 and a std of 0.5. I think this is a restriction that should not be there, right?
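For reference, here is a minimal sketch of what the normalization step in question computes (plain Python standing in for the tensor op; the exact call in the script is assumed to use mean = std = 0.5):

```python
# transforms.Normalize(mean, std) computes (x - mean) / std per channel.
# With mean = std = 0.5, pixel values in [0, 1] are mapped to [-1, 1].
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

assert normalize(0.0) == -1.0  # black pixel -> -1
assert normalize(0.5) == 0.0   # mid-gray  ->  0
assert normalize(1.0) == 1.0   # white pixel -> 1
```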
Replies: 2 comments
-
The inputs to the VAE encoder are expected to be in the [-1, 1] range, and that is the convention the community follows. Can you show an example of the reconstruction process being affected by this?
-
My bad. I just checked the torch normalization API definition; it computes
x = (x - mean) / std
so mean = 0.5 and std = 0.5 make a lot of sense. I thought the API would change the distribution of the pixel values to force their mean to be 0.5 and their variance to be 0.5. That is why I filed this discussion. Thanks for the clarification.
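To make the resolution concrete: the transform is a fixed, invertible affine map, not a statistical standardization, so no information about the original image is lost and no particular output mean/std is forced. A small sketch (plain Python; helper names are illustrative):

```python
import random

# Fixed affine map and its inverse; nothing about the data's own
# statistics is used, so the round trip is exact.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    return y * std + mean

pixels = [random.random() for _ in range(1000)]       # stand-in pixel values in [0, 1]
recovered = [denormalize(normalize(p)) for p in pixels]
assert all(abs(a - b) < 1e-12 for a, b in zip(pixels, recovered))
```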