Emphasis fix-up code, commented as "restoring original mean is likely not correct" #12290
-
For example, instead of this:
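(A sketch of the existing approach, with names assumed rather than quoted verbatim: `z` is the conditioning tensor from the text encoder, shaped `[batch, tokens, channels]`, and `multipliers` the per-token emphasis weights, shaped `[batch, tokens]`.)

```python
import torch

def emphasis_restore_mean(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then rescale so the overall mean matches the original."""
    original_mean = z.mean()
    z = z * multipliers.unsqueeze(-1)       # scale each token's embedding by its weight
    new_mean = z.mean()
    return z * (original_mean / new_mean)   # multiplicative rescale back to the old mean
```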
What about something more like:
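(A minimal sketch of the full-normalization idea, under the same assumptions about `z` and `multipliers` as above.)

```python
import torch

def emphasis_full_normalize(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then force zero mean and unit std."""
    z = z * multipliers.unsqueeze(-1)
    return (z - z.mean()) / z.std()
```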
Or perhaps something more like this, to restore the original mean and std:
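(A sketch of that third idea: standardize after weighting, then map back to the original statistics. Same assumptions as above.)

```python
import torch

def emphasis_restore_mean_std(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then restore the original mean and std."""
    original_mean, original_std = z.mean(), z.std()
    z = z * multipliers.unsqueeze(-1)
    return (z - z.mean()) / z.std() * original_std + original_mean
```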
-
After doing a fair bit of testing, I can see that my first idea, a full normalization, is no good. Although z is close to Gaussian, it's always a bit off; for example, the mean always seems to be closer to -0.1 than to zero, and the difference is significant. Forcing a full normalization changed the images too much. My second idea seemed better, and a formulation like this seemed to correctly restore both the original mean and std:
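(One way such a formulation might look, as an illustrative sketch rather than the exact code used in the test; same assumptions about `z` and `multipliers` as in the earlier snippets.)

```python
import torch

def emphasis_restore_mean_std(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Weight by emphasis, standardize, then map back to the original statistics."""
    original_mean, original_std = z.mean(), z.std()
    z = z * multipliers.unsqueeze(-1)        # apply per-token emphasis weights
    z = (z - z.mean()) / z.std()             # zero mean, unit std
    return z * original_std + original_mean  # restore the original mean and std
```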
However, the differences were very minor and subjective. Although it feels more principled to adjust the mean with addition/subtraction and to also take the std into account, the actual results in my limited testing didn't show any significant improvement. Perhaps there were slightly fewer artifacts, but the difference was so minor that it was very difficult to be certain. It would be good to get some input on this, though; I still think it would be worth trying to find a better way to do this adjustment. It may be that this kind of emphasis adjustment should happen elsewhere in the process, perhaps after the tokens have been converted to embeddings but before they have gone through the transformer (similar to the way textual inversion hooks in at that point).
-
In the code which applies the emphasis in sd_hijack.py, there is a comment that says "restoring original mean is likely not correct".
This relates to code which does the following:
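(Roughly, as a sketch with assumed names rather than a verbatim quote: `z` is the conditioning tensor and `multipliers` the per-token emphasis weights.)

```python
import torch

def apply_emphasis(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    # "restoring original mean is likely not correct" -- the comment in question
    original_mean = z.mean()
    z = z * multipliers.unsqueeze(-1)       # scale each token's embedding by its weight
    new_mean = z.mean()
    return z * (original_mean / new_mean)   # rescale so the mean matches the original
```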
Is there any principled reason why this method of adjusting the embeddings was chosen?
Might it be more principled to do a normalization instead? Subtract the mean() and divide by the std() to restore it to a normal/Gaussian distribution?
That is a much more common way of doing this kind of fixup, and if we expect the embeddings to follow a normal distribution, it would make sense.