Emphasis fix-up code, commented as "restoring original mean is likely not correct" #12290
-
For example, instead of this:
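(A sketch of the existing approach, with names assumed rather than quoted verbatim: `z` is the conditioning tensor from the text encoder, shaped `[batch, tokens, channels]`, and `multipliers` the per-token emphasis weights, shaped `[batch, tokens]`.)

```python
import torch

def emphasis_restore_mean(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then rescale so the overall mean matches the original."""
    original_mean = z.mean()
    z = z * multipliers.unsqueeze(-1)       # scale each token's embedding by its weight
    new_mean = z.mean()
    return z * (original_mean / new_mean)   # multiplicative rescale back to the old mean
```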
What about something more like:
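(A minimal sketch of the full-normalization idea, under the same assumptions about `z` and `multipliers` as above.)

```python
import torch

def emphasis_full_normalize(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then force zero mean and unit std."""
    z = z * multipliers.unsqueeze(-1)
    return (z - z.mean()) / z.std()
```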
Or perhaps something more like this, to restore the original mean and std:
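(A sketch of that third idea: standardize after weighting, then map back to the original statistics. Same assumptions as above.)

```python
import torch

def emphasis_restore_mean_std(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Apply emphasis weights, then restore the original mean and std."""
    original_mean, original_std = z.mean(), z.std()
    z = z * multipliers.unsqueeze(-1)
    return (z - z.mean()) / z.std() * original_std + original_mean
```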
-
After doing a fair bit of testing, I can see that my first idea, a full normalization, is no good. Although z is close to Gaussian, it's always a bit off; for example, the mean always seems to be closer to -0.1 than to zero, and the difference is significant. Forcing a full normalization changed the images too much. My second idea seemed better, and a formulation like this seemed to correctly restore both the original mean and std:
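(One way such a formulation might look, as an illustrative sketch rather than the exact code used in the test; same assumptions about `z` and `multipliers` as in the earlier snippets.)

```python
import torch

def emphasis_restore_mean_std(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    """Weight by emphasis, standardize, then map back to the original statistics."""
    original_mean, original_std = z.mean(), z.std()
    z = z * multipliers.unsqueeze(-1)        # apply per-token emphasis weights
    z = (z - z.mean()) / z.std()             # zero mean, unit std
    return z * original_std + original_mean  # restore the original mean and std
```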
However, the differences were very minor and subjective. Although it feels more principled to adjust the mean with addition/subtraction and to also take the std into account, the actual results in my limited testing didn't show any significant improvement. Perhaps there were slightly fewer artifacts, but the difference was so minor that it was very difficult to be certain. It would be good to get some input on this, though; I still think it would be worth trying to find a better way to do this adjustment. It may be that this kind of emphasis adjustment should happen elsewhere in the process, perhaps after the tokens have been converted to embeddings but before they have gone through the transformer (similar to the way textual inversion hooks in at that point).
-
In the code which applies the emphasis in sd_hijack.py, there is a comment that says "restoring original mean is likely not correct".
This relates to code which does the following:
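(Roughly, as a sketch with assumed names rather than a verbatim quote: `z` is the conditioning tensor and `multipliers` the per-token emphasis weights.)

```python
import torch

def apply_emphasis(z: torch.Tensor, multipliers: torch.Tensor) -> torch.Tensor:
    # "restoring original mean is likely not correct" -- the comment in question
    original_mean = z.mean()
    z = z * multipliers.unsqueeze(-1)       # scale each token's embedding by its weight
    new_mean = z.mean()
    return z * (original_mean / new_mean)   # rescale so the mean matches the original
```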
Is there any principled reason why this method of adjusting the embeddings was chosen?
Might it be more principled to do a normalization instead? Subtract the mean() and divide by the std() to restore it to a normal/Gaussian distribution?
That is a much more common way of doing this kind of fixup, and if we expect the embeddings to follow a normal distribution, it would make sense.