
Multiple bugs in flux dreambooth script: train_dreambooth_lora_flux_advanced.py #10313

Open
freckletonj opened this issue Dec 20, 2024 · 3 comments

freckletonj commented Dec 20, 2024

I haven't fully fixed the script, but I'm really not sure how anyone has had success with it (e.g. the blog post). There are many show-stoppers when trying to get Textual Inversion working.

To debug, I started peeling away at the main Flux training script and have fixed all of the following bugs, BUT it still doesn't train well. With any hyperparameters, it fails to learn a new concept, can't even overfit on it, and quickly degrades. So the fixes below are necessary, but not sufficient, to get this working.

  1. On a 1-node-2-GPU setup, accelerate wraps everything in a DistributedDataParallel-style class, so all the calls to model.dtype / model.config etc. error out; you actually need something like model.module.dtype, or to unwrap the model first (see the unwrap sketch after this list).
  2. It requires an --instance_prompt even if you're using a dataset with a custom text column.
  3. The updated tokenizers are not getting passed into the pipeline (sketch below).
  4. The text embeddings get saved oddly: it looks like a single copy gets saved regardless of the checkpoint number?
  5. In log_validation there's a dtype-mismatch runtime error when training in fp16/bf16; you need to autocast (see the log_validation sketch below).
  6. Also in log_validation, the Generator doesn't seem to do anything: every generation runs on a different seed, so you can't watch the model's progress on the same samples. This is remedied by saving the RNG state of torch / torch.cuda / random, using the seeded Generator, and restoring the random state afterward (also covered in the log_validation sketch below).
  7. T5 training doesn't work. On this line it unfreezes token_embedding, which doesn't exist in T5; shared needs to be unfrozen instead (sketch below).
  8. Just a weird one: pulling the embeddings toward std**0.1 makes intuitive sense, but this kind of thing should definitely cite a justification. Was this done in a paper somewhere?
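
For (1), this is the unwrap pattern I've been using. It's just a sketch against the script's own `accelerator` and `transformer` objects, not a tested patch:

```python
def unwrap(model):
    # accelerate wraps prepared models in DistributedDataParallel on multi-GPU,
    # which hides attributes like .dtype and .config behind .module
    return accelerator.unwrap_model(model)

# instead of transformer.dtype / transformer.config.guidance_embeds:
weight_dtype = unwrap(transformer).dtype
has_guidance = unwrap(transformer).config.guidance_embeds
```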
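
For (3), the validation/final pipeline needs to be built with the tokenizers that actually carry the new inversion tokens. Roughly what I mean (variable names assumed to match the script's `tokenizer_one` / `tokenizer_two` / `args.pretrained_model_name_or_path`):

```python
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    args.pretrained_model_name_or_path,
    transformer=accelerator.unwrap_model(transformer),
    text_encoder=accelerator.unwrap_model(text_encoder_one),
    text_encoder_2=accelerator.unwrap_model(text_encoder_two),
    tokenizer=tokenizer_one,    # the tokenizers with the newly added tokens,
    tokenizer_2=tokenizer_two,  # otherwise the placeholder tokens don't resolve
    torch_dtype=weight_dtype,
)
```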
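
For (5) and (6), this is roughly what I ended up doing in log_validation; treat it as a sketch (the pipeline/prompt/seed arguments are assumed), but it avoids the dtype mismatch and keeps validation on a fixed seed without disturbing the training RNG:

```python
import random
import torch

def render_validation_image(pipeline, prompt, seed, weight_dtype):
    # snapshot global RNG state so validation doesn't perturb training randomness
    py_state = random.getstate()
    cpu_state = torch.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None

    generator = torch.Generator(device=pipeline.device).manual_seed(seed)
    # autocast avoids the fp16/bf16 vs fp32 dtype mismatch inside the pipeline
    with torch.autocast(device_type="cuda", dtype=weight_dtype):
        image = pipeline(prompt, generator=generator).images[0]

    # restore RNG state so the training loop continues exactly where it left off
    random.setstate(py_state)
    torch.set_rng_state(cpu_state)
    if cuda_states is not None:
        torch.cuda.set_rng_state_all(cuda_states)
    return image
```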
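
For (7), the input embedding lives in a different place on the two encoders, so the unfreezing has to be branched. Sketch (assuming the script's `text_encoder_one` = CLIP and `text_encoder_two` = T5):

```python
# CLIP exposes its input embedding at text_model.embeddings.token_embedding...
text_encoder_one.text_model.embeddings.token_embedding.requires_grad_(True)
# ...but T5EncoderModel keeps it in `shared`, so `token_embedding` doesn't exist there
text_encoder_two.shared.requires_grad_(True)
```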

Some nice-to-haves that I wish this did out of the box (but I'd be happy if the simple path above "just worked"):

  • save latent cache to disk
  • aspect ratio bucketing
  • 8bit backbone

Honestly, at this point I'm tempted to give up on diffusers and use something else for this current client. After sweating over fixing the above, I still haven't gotten it to work, and still haven't been able to stress-test things like the DreamBooth functionality or the whole-text-encoder finetuning feature.

bonlime (Contributor) commented Dec 20, 2024

Half of your links point to the SDXL training pipeline and half to the Flux training pipeline; is this intended?

hlky (Collaborator) commented Dec 20, 2024

cc @linoytsaban @sayakpaul for training

hlky added the training label Dec 20, 2024
freckletonj (Author) commented Dec 20, 2024

@bonlime gah! I was cross-referencing the two, since the SDXL script does some things right that the Flux script doesn't, and accidentally copied the wrong links. I've updated them, thanks!

All of the problems still stand, and I'm happy to help out if someone wants to scour this with me and get it working.

freckletonj changed the title from "Multiple bugs in flux dreambooth script: train_dreambooth_lora_sdxl_advanced.py" to "Multiple bugs in flux dreambooth script: train_dreambooth_lora_flux_advanced.py" Dec 20, 2024