-
In Diffusers' training scripts such as train_text_to_image_sdxl.py, when an image is converted to a tensor and passed into the image encoder, there is a normalization step. I strongly doubt the transforms.Normalize call should be there, for two reasons. First, I cannot find this normalization logic in Stability AI's repo. Second, the image encoder/decoder system cannot reconstruct the image perfectly if the incoming tensor is normalized, because the mean/std information of the original image is lost. If we do this normalization in the training script, the model will eventually start to output images whose pixel values have a mean of 0.5 and a std of 0.5. I think this is a restriction that should not be there, right?
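For reference, here is a minimal sketch of what the normalization step in question computes (plain Python standing in for the tensor op; the exact call in the script is assumed to use mean = std = 0.5):

```python
# transforms.Normalize(mean, std) computes (x - mean) / std per channel.
# With mean = std = 0.5, pixel values in [0, 1] are mapped to [-1, 1].
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

assert normalize(0.0) == -1.0  # black pixel -> -1
assert normalize(0.5) == 0.0   # mid-gray  ->  0
assert normalize(1.0) == 1.0   # white pixel -> 1
```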
Replies: 2 comments
-
The inputs to the VAE encoder are expected to be in the [-1, 1] range, and that is the convention the community follows. Can you show an example of the reconstruction process being affected by this?
-
My bad. I just checked the torch normalization API definition; it computes
x = (x - mean) / std
so mean = 0.5 and std = 0.5 make a lot of sense. I thought the API would change the distribution of the pixel values to force their mean to be 0.5 and their variance to be 0.5. That is why I filed this discussion. Thanks for the clarification.
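To make the resolution concrete: the transform is a fixed, invertible affine map, not a statistical standardization, so no information about the original image is lost and no particular output mean/std is forced. A small sketch (plain Python; helper names are illustrative):

```python
import random

# Fixed affine map and its inverse; nothing about the data's own
# statistics is used, so the round trip is exact.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    return y * std + mean

pixels = [random.random() for _ in range(1000)]       # stand-in pixel values in [0, 1]
recovered = [denormalize(normalize(p)) for p in pixels]
assert all(abs(a - b) < 1e-12 for a, b in zip(pixels, recovered))
```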