KTO is a training method similar to DPO, originally implemented for LLMs in https://github.com/ContextualAI/HALOs
It can take arbitrary good or bad samples as input, rather than the pairwise preferences required by DPO.
I wonder whether porting it to SD training would make it more user-friendly than Diffusion-DPO (https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)?
Imagine collecting a bunch of pictures I like and a bunch I dislike, throwing them into KTO, and getting a LoRA that reflects my personal preferences.
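To illustrate why unpaired samples suffice, here is a minimal sketch of a KTO-style loss in PyTorch. It assumes you already have per-sample log-probabilities under the policy and a frozen reference model; the function name, the simplification of fixing the KL reference point at 0, and the default weights are my own illustrative choices, not the HALOs implementation.

```python
import torch

def kto_loss(policy_logps, ref_logps, labels, beta=0.1, kl_ref=0.0,
             lambda_d=1.0, lambda_u=1.0):
    """Illustrative KTO-style loss on unpaired samples.

    policy_logps / ref_logps: log-probs of each sample under the model
    being trained and under a frozen reference model.
    labels: 1 for desirable ("liked") samples, 0 for undesirable ones.
    kl_ref: reference point (an estimate of the policy/reference KL);
    fixed at 0.0 here for simplicity.
    """
    rewards = policy_logps - ref_logps  # implicit per-sample reward
    desirable = labels.bool()
    # Kahneman-Tversky-style value: push liked samples above the
    # reference point and disliked samples below it.
    v = torch.where(
        desirable,
        torch.sigmoid(beta * (rewards - kl_ref)),
        torch.sigmoid(beta * (kl_ref - rewards)),
    )
    # Separate weights let you rebalance unequal numbers of
    # liked vs. disliked samples.
    weights = torch.where(
        desirable,
        torch.full_like(rewards, lambda_d),
        torch.full_like(rewards, lambda_u),
    )
    return (weights * (1.0 - v)).mean()
```

Note that each sample contributes to the loss on its own, via its label, so no pairing step is needed; for SD training the log-probs would presumably come from the denoising objective, as in the Diffusion-DPO example.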