This guide provides a user-friendly breakdown of the command-line options available in SimpleTuner's `train_sdxl.py` script. These options offer a high degree of customization, allowing you to train your model to suit your specific requirements.
### `--pretrained_model_name_or_path`

- What: Path to the pretrained model or its identifier from huggingface.co/models.
- Why: To specify the base model you'll start training from.
### `--hub_model_id`

- What: The name of the Huggingface Hub model and local results directory.
- Why: This value is used as the directory name under the location specified as `--output_dir`. If `--push_to_hub` is provided, this will become the name of the model on Huggingface Hub.
### `--push_to_hub`

- What: If provided, your model will be uploaded to Huggingface Hub once training completes.
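
For example, publishing the finished model under a chosen name might look like this sketch; the model name and paths are placeholders:

```bash
# Hypothetical invocation: results land in /data/output/sdxl-finetune and are
# pushed to the Hub under the same name when training completes.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --output_dir="/data/output" \
  --hub_model_id="sdxl-finetune" \
  --push_to_hub
```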
### `--instance_data_dir`

- What: Folder containing the training data.
- Why: Designates where your training images and other data are stored.
### `--data_backend`

- What: Specifies the data storage backend, either 'local' or 'aws'.
- Why: Allows for seamless switching between local and cloud storage.
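
Switching to cloud storage is mostly a matter of swapping the backend and supplying the `--aws_*` options. A sketch, with placeholder bucket and endpoint values:

```bash
# Placeholder bucket/endpoint; credentials may also be supplied via
# --aws_access_key_id and --aws_secret_access_key, or the matching
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --data_backend=aws \
  --aws_bucket_name="my-training-bucket" \
  --aws_bucket_image_prefix="datasets/sdxl/" \
  --aws_endpoint_url="https://s3.wasabisys.com" \
  --output_dir="/data/output"
```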
### `--resolution`

- What: Input image resolution, expressed in pixels or megapixels.
- Why: All images in the dataset will have their smaller edge resized to this resolution for training. At 1024px, the images may become very large and use an excessive amount of VRAM. A base resolution of 768 or 800 pixels tends to give the best mileage, although 512px training can really pay off with SDXL in particular.
### `--resolution_type`

- What: This tells SimpleTuner whether to use `area` size calculations or `pixel` edge calculations.
- Why: SimpleTuner's default `pixel` behaviour is to resize the image, keeping the aspect ratio. Setting the type to `area` instead uses the given megapixel value as the target size for the image, keeping the aspect ratio.
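
For example (paths and values here are illustrative):

```bash
# pixel mode: each image's smaller edge is resized to 768px, keeping aspect ratio
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --resolution=768 \
  --resolution_type=pixel

# area mode: the same flag is instead read as megapixels, e.g.:
#   --resolution=1.0 --resolution_type=area
```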
### `--validation_resolution`

- What: Output image resolution, measured in pixels.
- Why: All images generated during validation will be this resolution. Useful if the model is being trained at a different resolution.
### `--caption_strategy`

- What: Strategy for deriving image captions. Choices: `textfile`, `filename`
- Why: Determines how captions are generated for training images. `textfile` will use the contents of a `.txt` file with the same filename as the image, and `filename` will apply some cleanup to the filename before using it as the caption.
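
For instance, with the `textfile` strategy each image is captioned by a sibling `.txt` file (the paths below are hypothetical):

```bash
# This caption will be used for /data/images/corgi_beach.png:
echo "a photo of a corgi running on the beach" > /data/images/corgi_beach.txt

python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --caption_strategy=textfile
```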
### `--num_train_epochs`

- What: Number of training epochs (the number of times that all images are seen).
- Why: Determines the number of image repeats, which impacts the duration of the training process. More epochs tend to result in overfitting, but may be required to pick up the concepts you wish to train in. A reasonable value might be from 5 to 50.
### `--max_train_steps`

- What: Number of training steps after which training will exit, regardless of epochs.
- Why: Useful for shortening the length of training.
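
When both options are supplied, `--max_train_steps` takes precedence. A sketch with illustrative values:

```bash
# Train for up to 25 epochs, but stop after 10000 optimizer steps if that comes first.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --num_train_epochs=25 \
  --max_train_steps=10000
```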
### `--train_batch_size`

- What: Batch size for the training data loader.
- Why: Affects the model's memory consumption, convergence quality, and training speed. Higher batch sizes tend to give better results, but a very high batch size might result in overfitting or destabilized training, as well as lengthening the training session unnecessarily. Experimentation is warranted, but in general, you want to max out your video memory without decreasing the training speed.
### `--gradient_accumulation_steps`

- What: Number of update steps to accumulate before performing a backward/update pass.
- Why: Simulates a larger effective batch size without the additional memory cost, which is useful for handling larger models or datasets.
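
The effective batch size is `train_batch_size` multiplied by `gradient_accumulation_steps` (and by the number of GPUs). For example:

```bash
# 4 samples per forward pass x 4 accumulation steps = 16 samples per optimizer update.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --train_batch_size=4 \
  --gradient_accumulation_steps=4
```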
### `--learning_rate`

- What: Initial learning rate after potential warmup.
- Why: The learning rate behaves as a sort of "step size" for gradient updates - too high, and we overstep the solution; too low, and we never reach the ideal solution. A minimal value might be as low as `4e-7` and a maximal value would likely be as high as `1e-5`. When a higher learning rate is used, it's advantageous to use an EMA network with a learning rate warmup - see `--use_ema`, `--lr_warmup_steps`, and `--lr_scheduler`.
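
Putting that advice together, a run at the higher end of the learning rate range might pair EMA with a warmed-up cosine schedule. A sketch:

```bash
# Higher learning rate, softened by EMA weights and a cosine schedule with warmup.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --learning_rate=1e-5 \
  --use_ema \
  --lr_scheduler=cosine \
  --lr_warmup_steps=1000
```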
### `--lr_scheduler`

- What: How to scale the learning rate over time.
- Choices: constant, constant_with_warmup, cosine, cosine_with_restarts, polynomial, linear
- Why: Models benefit from continual learning rate adjustments to further explore the loss landscape. A cosine schedule is used as the default; this allows the training to smoothly transition between two extremes. If using a constant learning rate, it is common to select a value that is too high or too low, causing divergence (too high) or getting stuck in a local minimum (too low).
### `--snr_gamma`

- What: Utilise a min-SNR weighted loss factor.
- Why: Minimum SNR gamma weights the loss contribution of each timestep by its position in the schedule. Overly noisy timesteps have their contributions reduced, and less-noisy timesteps have theirs increased. The value recommended by the original paper is 5, but you can use values as low as 1 or as high as 20, which is typically seen as the maximum - beyond 20, the math does not change things much. A value of 1 is the strongest.
### `--checkpointing_steps`

- What: Interval at which training state checkpoints are saved.
- Why: Useful for resuming training and for inference. Every n iterations, a partial checkpoint will be saved in the `.safetensors` format, via the Diffusers filesystem layout.
### `--resume_from_checkpoint`

- What: Specifies if and from where to resume training.
- Why: Allows you to continue training from a saved state, either manually specified or the latest available. A checkpoint is composed of a `unet` and, optionally, an `ema_unet`. The `unet` may be dropped into any Diffusers layout SDXL model, allowing it to be used as a normal model would be.
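
For example, to checkpoint every 500 steps and later pick up from the most recent save (values are illustrative):

```bash
# Save the training state every 500 steps, keeping at most 5 checkpoints on disk.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --output_dir="/data/output" \
  --checkpointing_steps=500 \
  --checkpoints_total_limit=5

# Resume from the latest checkpoint under the same --output_dir.
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --output_dir="/data/output" \
  --resume_from_checkpoint=latest
```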
### `--logging_dir`

- What: Directory for TensorBoard logs.
- Why: Allows you to monitor training progress and performance metrics.
### `--report_to`

- What: Specifies the platform for reporting results and logs.
- Why: Enables integration with platforms like TensorBoard, wandb, or comet_ml for monitoring.
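
For instance, to log to Weights & Biases under a named project and run (the names here are placeholders):

```bash
python train_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="/data/images" \
  --report_to=wandb \
  --tracker_project_name="sdxl-experiments" \
  --tracker_run_name="run-001"
```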
This is a basic overview meant to help you get started. For a complete list of options and more detailed explanations, please refer to the full specification below.

The following SimpleTuner command-line options are available:

```
usage: train_sdxl.py [-h] [--snr_gamma SNR_GAMMA]
--pretrained_model_name_or_path
PRETRAINED_MODEL_NAME_OR_PATH
[--pretrained_vae_model_name_or_path PRETRAINED_VAE_MODEL_NAME_OR_PATH]
[--prediction_type {epsilon,v_prediction,sample}]
[--snr_weight SNR_WEIGHT]
[--training_scheduler_timestep_spacing {leading,linspace,trailing}]
[--inference_scheduler_timestep_spacing {leading,linspace,trailing}]
[--timestep_bias_strategy {earlier,later,range,none}]
[--timestep_bias_multiplier TIMESTEP_BIAS_MULTIPLIER]
[--timestep_bias_begin TIMESTEP_BIAS_BEGIN]
[--timestep_bias_end TIMESTEP_BIAS_END]
[--timestep_bias_portion TIMESTEP_BIAS_PORTION]
[--rescale_betas_zero_snr] [--vae_dtype VAE_DTYPE]
[--vae_batch_size VAE_BATCH_SIZE] [--keep_vae_loaded]
[--skip_file_discovery SKIP_FILE_DISCOVERY]
[--revision REVISION] --instance_data_dir
INSTANCE_DATA_DIR [--preserve_data_backend_cache]
[--cache_dir_text CACHE_DIR_TEXT]
[--cache_dir_vae CACHE_DIR_VAE]
[--data_backend {local,aws}]
[--write_batch_size WRITE_BATCH_SIZE]
[--aws_config_file AWS_CONFIG_FILE]
[--aws_bucket_name AWS_BUCKET_NAME]
[--aws_bucket_image_prefix AWS_BUCKET_IMAGE_PREFIX]
[--aws_endpoint_url AWS_ENDPOINT_URL]
[--aws_region_name AWS_REGION_NAME]
[--aws_access_key_id AWS_ACCESS_KEY_ID]
[--aws_secret_access_key AWS_SECRET_ACCESS_KEY]
[--cache_dir CACHE_DIR]
[--cache_clear_validation_prompts]
[--seen_state_path SEEN_STATE_PATH]
[--state_path STATE_PATH]
[--caption_strategy {filename,textfile,instance_prompt}]
[--instance_prompt INSTANCE_PROMPT]
[--output_dir OUTPUT_DIR] [--seed SEED]
[--seed_for_each_device SEED_FOR_EACH_DEVICE]
[--resolution RESOLUTION]
[--resolution_type {pixel,area}]
[--minimum_image_size MINIMUM_IMAGE_SIZE] [--crop]
[--train_text_encoder]
[--train_batch_size TRAIN_BATCH_SIZE]
[--num_train_epochs NUM_TRAIN_EPOCHS]
[--max_train_steps MAX_TRAIN_STEPS]
[--checkpointing_steps CHECKPOINTING_STEPS]
[--checkpoints_total_limit CHECKPOINTS_TOTAL_LIMIT]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--gradient_checkpointing]
[--learning_rate LEARNING_RATE] [--lr_scale]
[--lr_scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}]
[--lr_warmup_steps LR_WARMUP_STEPS]
[--lr_num_cycles LR_NUM_CYCLES] [--lr_power LR_POWER]
[--use_ema] [--ema_decay EMA_DECAY]
[--non_ema_revision NON_EMA_REVISION]
[--offload_param_path OFFLOAD_PARAM_PATH]
[--use_8bit_adam] [--use_adafactor_optimizer]
[--use_dadapt_optimizer]
[--dadaptation_learning_rate DADAPTATION_LEARNING_RATE]
[--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
[--adam_weight_decay ADAM_WEIGHT_DECAY]
[--adam_epsilon ADAM_EPSILON]
[--max_grad_norm MAX_GRAD_NORM] [--push_to_hub]
[--hub_token HUB_TOKEN] [--hub_model_id HUB_MODEL_ID]
[--logging_dir LOGGING_DIR]
[--validation_torch_compile VALIDATION_TORCH_COMPILE]
[--validation_torch_compile_mode {reduce-overhead,default}]
[--allow_tf32] [--report_to REPORT_TO]
[--tracker_run_name TRACKER_RUN_NAME]
[--tracker_project_name TRACKER_PROJECT_NAME]
[--validation_prompt VALIDATION_PROMPT]
[--validation_prompt_library]
[--user_prompt_library USER_PROMPT_LIBRARY]
[--num_validation_images NUM_VALIDATION_IMAGES]
[--validation_steps VALIDATION_STEPS]
[--validation_num_inference_steps VALIDATION_NUM_INFERENCE_STEPS]
[--validation_resolution VALIDATION_RESOLUTION]
[--validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc}]
[--disable_compel] [--enable_watermark]
[--mixed_precision {no,fp16,bf16}]
[--local_rank LOCAL_RANK]
[--enable_xformers_memory_efficient_attention]
[--set_grads_to_none] [--noise_offset NOISE_OFFSET]
[--noise_offset_probability NOISE_OFFSET_PROBABILITY]
[--validation_guidance VALIDATION_GUIDANCE]
[--validation_guidance_rescale VALIDATION_GUIDANCE_RESCALE]
[--validation_randomize]
[--validation_seed VALIDATION_SEED]
[--fully_unload_text_encoder]
[--freeze_encoder_before FREEZE_ENCODER_BEFORE]
[--freeze_encoder_after FREEZE_ENCODER_AFTER]
[--freeze_encoder_strategy FREEZE_ENCODER_STRATEGY]
[--print_filenames]
[--metadata_update_interval METADATA_UPDATE_INTERVAL]
[--debug_aspect_buckets] [--debug_dataset_loader]
[--freeze_encoder]
[--text_encoder_limit TEXT_ENCODER_LIMIT]
[--prepend_instance_prompt] [--only_instance_prompt]
[--conditioning_dropout_probability CONDITIONING_DROPOUT_PROBABILITY]
[--caption_dropout_probability CAPTION_DROPOUT_PROBABILITY]
[--input_perturbation INPUT_PERTURBATION]
[--input_perturbation_probability INPUT_PERTURBATION_PROBABILITY]
[--delete_unwanted_images] [--delete_problematic_images]
[--offset_noise] [--lr_end LR_END]
options:
-h, --help show this help message and exit
--snr_gamma SNR_GAMMA
SNR weighting gamma to be used if rebalancing the
loss. Recommended value is 5.0. More details here:
https://arxiv.org/abs/2303.09556.
--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH
Path to pretrained model or model identifier from
huggingface.co/models.
--pretrained_vae_model_name_or_path PRETRAINED_VAE_MODEL_NAME_OR_PATH
Path to an improved VAE to stabilize training. For
more details check out:
https://github.com/huggingface/diffusers/pull/4038.
--prediction_type {epsilon,v_prediction,sample}
The type of prediction to use for the u-net. Choose
between ['epsilon', 'v_prediction', 'sample']. For SD
2.1-v, this is v_prediction. For 2.1-base, it is
epsilon. SDXL is generally epsilon. SD 1.5 is epsilon.
--snr_weight SNR_WEIGHT
When training a model using
`--prediction_type=sample`, one can supply an SNR
weight value to augment the loss with. If a value of
0.5 is provided here, the loss is taken half from the
SNR and half from the MSE.
--training_scheduler_timestep_spacing {leading,linspace,trailing}
(SDXL Only) Spacing timesteps can fundamentally alter
the course of history. Er, I mean, your model weights.
For all training, including epsilon, it would seem
that 'trailing' is the right choice. SD 2.x always
uses 'trailing', but SDXL may do better in its default
state when using 'leading'.
--inference_scheduler_timestep_spacing {leading,linspace,trailing}
(SDXL Only) The Bytedance paper on zero terminal SNR
recommends inference using 'trailing'. SD 2.x always
uses 'trailing', but SDXL may do better in its default
state when using 'leading'.
--timestep_bias_strategy {earlier,later,range,none}
The timestep bias strategy, which may help direct the
model toward learning low or high frequency details.
Choices: ['earlier', 'later', 'range', 'none']. The
default is 'none', which means no bias is applied, and
training proceeds normally. The value of 'later' will
prefer to generate samples for later timesteps.
--timestep_bias_multiplier TIMESTEP_BIAS_MULTIPLIER
The multiplier for the bias. Defaults to 1.0, which
means no bias is applied. A value of 2.0 will double
the weight of the bias, and a value of 0.5 will halve
it.
--timestep_bias_begin TIMESTEP_BIAS_BEGIN
When using `--timestep_bias_strategy=range`, the
beginning timestep to bias. Defaults to zero, which
equates to having no specific bias.
--timestep_bias_end TIMESTEP_BIAS_END
When using `--timestep_bias_strategy=range`, the final
timestep to bias. Defaults to 1000, which is the
number of timesteps that SDXL Base and SD 2.x were
trained on.
--timestep_bias_portion TIMESTEP_BIAS_PORTION
The portion of timesteps to bias. Defaults to 0.25,
meaning 25 percent of timesteps will be biased. A
value of 0.5 will bias one half of the timesteps. The
value provided for `--timestep_bias_strategy`
determines whether the biased portions are in the
earlier or later timesteps.
--rescale_betas_zero_snr
If set, will rescale the betas to zero terminal SNR.
This is recommended for training with v_prediction.
For epsilon, this might help with fine details, but
will not result in contrast improvements.
--vae_dtype VAE_DTYPE
The dtype of the VAE model. Choose between ['default',
'fp16', 'fp32', 'bf16']. The default VAE dtype is
float32, due to NaN issues in SDXL 1.0.
--vae_batch_size VAE_BATCH_SIZE
When pre-caching latent vectors, this is the batch
size to use. Decreasing this may help with VRAM
issues, but if you are at that point of contention,
it's possible that your GPU has too little RAM.
Default: 4.
--keep_vae_loaded If set, will keep the VAE loaded in memory. This can
reduce disk churn, but consumes VRAM during the
forward pass.
--skip_file_discovery SKIP_FILE_DISCOVERY
Comma-separated values of which stages to skip
discovery for. Skipping any stage will speed up
resumption, but will increase the risk of errors, as
missing images or incorrectly bucketed images may not
be caught. 'vae' will skip the VAE cache process,
'aspect' will not build any aspect buckets, and 'text'
will avoid text embed management. Valid options:
aspect, vae, text, metadata.
--revision REVISION Revision of pretrained model identifier from
huggingface.co/models. Trainable model components
should be float32 precision.
--instance_data_dir INSTANCE_DATA_DIR
A folder containing the training data. Folder contents
must either follow the structure described in the
SimpleTuner documentation
(https://github.com/bghira/SimpleTuner), or the
structure described in https://huggingface.co/docs/dat
asets/image_dataset#imagefolder. For 🤗 Datasets in
particular, a `metadata.jsonl` file must exist to
provide the captions for the images. For SimpleTuner
layout, the images can be in subfolders. No particular
config is required.
--preserve_data_backend_cache
For very large cloud storage buckets that will never
change, enabling this option will prevent the trainer
from scanning it at startup, by preserving the cache
files that we generate. Be careful when using this,
as switching datasets can result in the preserved
cache being used, which would be problematic.
Currently, cache is not stored in the dataset itself
but rather, locally. This may change in a future
release.
--cache_dir_text CACHE_DIR_TEXT
This is the path to a local directory that will
contain your text embed cache.
--cache_dir_vae CACHE_DIR_VAE
This is the path to a local directory that will
contain your VAE outputs. Unlike the text embed cache,
your VAE latents will be stored in the AWS data
backend. If the AWS backend is in use, this will be a
prefix for the bucket's VAE cache entries.
--data_backend {local,aws}
The data backend to use. Choose between ['local',
'aws']. Default: local. If using AWS, you must set the
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
environment variables.
--write_batch_size WRITE_BATCH_SIZE
When using certain storage backends, it is better to
batch smaller writes rather than continuous
dispatching. In SimpleTuner, write batching is
currently applied during VAE caching, when many small
objects are written. This mostly applies to S3, but
some shared server filesystems may benefit as well,
eg. Ceph. Default: 64.
--aws_config_file AWS_CONFIG_FILE
Path to the AWS configuration file in JSON format.
Config key names are the same as SimpleTuner option
counterparts.
--aws_bucket_name AWS_BUCKET_NAME
The AWS bucket name to use.
--aws_bucket_image_prefix AWS_BUCKET_IMAGE_PREFIX
Instead of using --instance_data_dir, AWS S3 relies on
aws_bucket_*_prefix parameters. When provided, this
parameter will be prepended to the image path.
--aws_endpoint_url AWS_ENDPOINT_URL
The AWS server to use. If not specified, will use the
default server for the region specified. For Wasabi,
use https://s3.wasabisys.com.
--aws_region_name AWS_REGION_NAME
The AWS region to use. If not specified, will use the
default region for the server specified. For example,
if you specify 's3.amazonaws.com', the default region
will be 'us-east-1'.
--aws_access_key_id AWS_ACCESS_KEY_ID
The AWS access key ID.
--aws_secret_access_key AWS_SECRET_ACCESS_KEY
The AWS secret access key.
--cache_dir CACHE_DIR
The directory where the downloaded models and datasets
will be stored.
--cache_clear_validation_prompts
When provided, any validation prompt entries in the
text embed cache will be recreated. This is useful if
you've modified any of the existing prompts, or,
disabled/enabled Compel, via `--disable_compel`
--seen_state_path SEEN_STATE_PATH
Where the JSON document containing the state of the
seen images is stored. This helps ensure we do not
repeat images too many times.
--state_path STATE_PATH
A JSON document containing the current state of
training, will be placed here.
--caption_strategy {filename,textfile,instance_prompt}
The default captioning strategy, 'filename', will use
the filename as the caption, after stripping some
characters like underscores. The 'textfile' strategy
will use the contents of a text file with the same
name as the image.
--instance_prompt INSTANCE_PROMPT
This is unused. Filenames will be the captions
instead.
--output_dir OUTPUT_DIR
The output directory where the model predictions and
checkpoints will be written.
--seed SEED A seed for reproducible training.
--seed_for_each_device SEED_FOR_EACH_DEVICE
By default, a unique seed will be used for each GPU.
This is done deterministically, so that each GPU will
receive the same seed across invocations. If
--seed_for_each_device=false is provided, then we will
use the same seed across all GPUs, which will almost
certainly result in the over-sampling of inputs on
larger datasets.
--resolution RESOLUTION
The resolution for input images, all the images in the
train/validation dataset will be resized to this
resolution. If using --resolution_type=area, this
float value represents megapixels.
--resolution_type {pixel,area}
Resizing images maintains aspect ratio. This defines
the resizing strategy. If 'pixel', the images will be
resized to the resolution by pixel edge. If 'area',
the images will be resized so the pixel area is this
many megapixels.
--minimum_image_size MINIMUM_IMAGE_SIZE
The minimum resolution for both sides of input images.
If --delete_unwanted_images is set, images smaller
than this will be DELETED.
--crop Whether to center crop the input images to the
resolution. If not set, the images will be randomly
cropped. The images will be resized to the resolution
first before cropping. If training SDXL, the VAE cache
and aspect bucket cache will need to be (re)built so
they include crop coordinates.
--train_text_encoder (SD 2.x only) Whether to train the text encoder. If
set, the text encoder should be float32 precision.
--train_batch_size TRAIN_BATCH_SIZE
Batch size (per device) for the training dataloader.
--num_train_epochs NUM_TRAIN_EPOCHS
--max_train_steps MAX_TRAIN_STEPS
Total number of training steps to perform. If
provided, overrides num_train_epochs.
--checkpointing_steps CHECKPOINTING_STEPS
Save a checkpoint of the training state every X
updates. Checkpoints can be used for resuming training
via `--resume_from_checkpoint`. In the case that the
checkpoint is better than the final trained model, the
checkpoint can also be used for inference. Using a
checkpoint for inference requires separate loading of
the original pipeline and the individual checkpointed
model components. See https://huggingface.co/docs/diffu
sers/main/en/training/dreambooth#performing-inference-
using-a-saved-checkpoint for step by step instructions.
--checkpoints_total_limit CHECKPOINTS_TOTAL_LIMIT
Max number of checkpoints to store.
--resume_from_checkpoint RESUME_FROM_CHECKPOINT
Whether training should be resumed from a previous
checkpoint. Use a path saved by
`--checkpointing_steps`, or `"latest"` to
automatically select the last available checkpoint.
--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
Number of update steps to accumulate before
performing a backward/update pass.
--gradient_checkpointing
Whether or not to use gradient checkpointing to save
memory at the expense of slower backward pass.
--learning_rate LEARNING_RATE
Initial learning rate (after the potential warmup
period) to use.
--lr_scale Scale the learning rate by the number of GPUs,
gradient accumulation steps, and batch size.
--lr_scheduler {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}
The scheduler type to use. Default: cosine
--lr_warmup_steps LR_WARMUP_STEPS
Number of steps for the warmup in the lr scheduler.
--lr_num_cycles LR_NUM_CYCLES
Number of hard resets of the lr in
cosine_with_restarts scheduler.
--lr_power LR_POWER Power factor of the polynomial scheduler.
--use_ema Whether to use EMA (exponential moving average) model.
--ema_decay EMA_DECAY
The closer to 0.9999 this gets, the fewer updates will
occur over time. Setting it to a lower value, such as
0.990, will allow greater influence of later updates.
--non_ema_revision NON_EMA_REVISION
Revision of pretrained non-ema model identifier. Must
be a branch, tag or git identifier of the local or
remote repository specified with
--pretrained_model_name_or_path.
--offload_param_path OFFLOAD_PARAM_PATH
When using DeepSpeed ZeRO stage 2 or 3 with NVMe
offload, this may be specified to provide a path for
the offload.
--use_8bit_adam Whether or not to use 8-bit Adam from bitsandbytes.
--use_adafactor_optimizer
Whether or not to use the Adafactor optimizer.
--use_dadapt_optimizer
Whether or not to use the D-Adaptation optimizer.
--dadaptation_learning_rate DADAPTATION_LEARNING_RATE
Learning rate for D-Adaptation.
Default: 1.0
--adam_beta1 ADAM_BETA1
The beta1 parameter for the Adam optimizer.
--adam_beta2 ADAM_BETA2
The beta2 parameter for the Adam optimizer.
--adam_weight_decay ADAM_WEIGHT_DECAY
Weight decay to use.
--adam_epsilon ADAM_EPSILON
Epsilon value for the Adam optimizer
--max_grad_norm MAX_GRAD_NORM
Max gradient norm.
--push_to_hub Whether or not to push the model to the Hub.
--hub_token HUB_TOKEN
The token to use to push to the Model Hub.
--hub_model_id HUB_MODEL_ID
The name of the repository to keep in sync with the
local `output_dir`.
--logging_dir LOGGING_DIR
[TensorBoard](https://www.tensorflow.org/tensorboard)
log directory. Will default to
*output_dir/runs/**CURRENT_DATETIME_HOSTNAME***.
--validation_torch_compile VALIDATION_TORCH_COMPILE
Supply `--validation_torch_compile=true` to enable the
use of torch.compile() on the validation pipeline. For
some setups, torch.compile() may error out. This is
dependent on PyTorch version, phase of the moon, but
if it works, you should leave it enabled for a great
speed-up.
--validation_torch_compile_mode {reduce-overhead,default}
PyTorch provides different modes for the Torch
Inductor when compiling graphs. reduce-overhead, the
default mode, provides the most benefit.
--allow_tf32 Whether or not to allow TF32 on Ampere GPUs. Can be
used to speed up training. For more information, see h
ttps://pytorch.org/docs/stable/notes/cuda.html#tensorf
loat-32-tf32-on-ampere-devices
--report_to REPORT_TO
The integration to report the results and logs to.
Supported platforms are `"tensorboard"` (default),
`"wandb"` and `"comet_ml"`. Use `"all"` to report to
all integrations.
--tracker_run_name TRACKER_RUN_NAME
The name of the run to track with the tracker.
--tracker_project_name TRACKER_PROJECT_NAME
The name of the project for WandB or Tensorboard.
--validation_prompt VALIDATION_PROMPT
A prompt that is used during validation to verify that
the model is learning.
--validation_prompt_library
If this is provided, the SimpleTuner prompt library
will be used to generate multiple images.
--user_prompt_library USER_PROMPT_LIBRARY
This should be a path to the JSON file containing your
prompt library. See user_prompt_library.json.example.
--num_validation_images NUM_VALIDATION_IMAGES
Number of images that should be generated during
validation with `validation_prompt`.
--validation_steps VALIDATION_STEPS
Run validation every X steps. Validation consists of
running the prompt `args.validation_prompt`
`args.num_validation_images` times and logging the
images.
--validation_num_inference_steps VALIDATION_NUM_INFERENCE_STEPS
The default scheduler, DDIM, benefits from more steps.
UniPC can do well with just 10-15. For more speed
during validations, reduce this value. For better
quality, increase it. For model distillation, you will
likely want to keep this low.
--validation_resolution VALIDATION_RESOLUTION
Square resolution images will be output at this
resolution (256x256).
--validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc}
When validating the model at inference time, a
different scheduler may be chosen. UniPC can offer
better speed, and Euler A can put up with
instabilities a bit better. For zero-terminal SNR
models, DDIM is the best choice. Choices: ['ddim',
'ddpm', 'euler', 'euler-a', 'unipc'], Default: ddim
--disable_compel If provided, prompts will be handled using the typical
prompt encoding strategy. Otherwise, the default
behaviour is to use Compel for prompt embed
generation.
--enable_watermark The SDXL 0.9 and 1.0 licenses both require a watermark
be used to identify any images created to be shared.
Since the images created during validation typically
are not shared, and we want the most accurate results,
this watermarker is disabled by default. If you are
sharing the validation images, it is up to you to
ensure that you are complying with the license,
whether that is through this watermarker, or another.
--mixed_precision {no,fp16,bf16}
Whether to use mixed precision. Choose between fp16
and bf16 (bfloat16). Bf16 requires PyTorch >= 1.10 and
an Nvidia Ampere GPU. Defaults to the value of the
accelerate config of the current system or the flag
passed with the `accelerate.launch` command. Use this
argument to override the accelerate config.
--local_rank LOCAL_RANK
For distributed training: local_rank
--enable_xformers_memory_efficient_attention
Whether or not to use xformers.
--set_grads_to_none Save more memory by setting grads to None
instead of zero. Be aware that this changes certain
behaviors, so disable this argument if it causes any
problems. More info: https://pytorch.org/docs/stable/g
enerated/torch.optim.Optimizer.zero_grad.html
--noise_offset NOISE_OFFSET
The scale of noise offset. Default: 0.1
--noise_offset_probability NOISE_OFFSET_PROBABILITY
When training with --offset_noise, the value of
--noise_offset will only be applied probabilistically.
The default behaviour is for offset noise (if enabled)
to be applied 25 percent of the time.
--validation_guidance VALIDATION_GUIDANCE
CFG value for validation images. Default: 7.5
--validation_guidance_rescale VALIDATION_GUIDANCE_RESCALE
CFG rescale value for validation images. Default: 0.0,
max 1.0
--validation_randomize
If supplied, validations will be random, ignoring any
seeds.
--validation_seed VALIDATION_SEED
If not supplied, the value for --seed will be used. If
neither of those nor --validation_randomize are
supplied, a seed of zero is used.
--fully_unload_text_encoder
If set, will fully unload the text_encoder from memory
when not in use. This currently has the side effect of
crashing validations, but it is useful for initiating
VAE caching on GPUs that would otherwise be too small.
--freeze_encoder_before FREEZE_ENCODER_BEFORE
When using 'before' strategy, we will freeze layers
earlier than this.
--freeze_encoder_after FREEZE_ENCODER_AFTER
When using 'after' strategy, we will freeze layers
later than this.
--freeze_encoder_strategy FREEZE_ENCODER_STRATEGY
When freezing the text_encoder, we can use the
'before', 'between', or 'after' strategy. The
'between' strategy will freeze layers between those
two values, leaving the outer layers unfrozen. The
default strategy is to freeze all layers from 17 up.
This can be helpful when fine-tuning Stable Diffusion
2.1 on a new style.
--print_filenames If any image files are stopping the process eg. due to
corruption or truncation, this will help identify
which is at fault.
--metadata_update_interval METADATA_UPDATE_INTERVAL
When generating the aspect bucket indices, we want to
save it every X seconds. The default is to save it
every 1 hour, such that progress is not lost on
clusters where runtime is limited to 6-hour increments
(e.g. the JUWELS Supercomputer). The minimum value is
60 seconds.
--debug_aspect_buckets
If set, will print excessive debugging for aspect
bucket operations.
--debug_dataset_loader
If set, will print excessive debugging for data loader
operations.
--freeze_encoder Whether or not to freeze the text_encoder. The default
is true.
--text_encoder_limit TEXT_ENCODER_LIMIT
When training the text_encoder, we want to limit how
long it trains for to avoid catastrophic loss.
--prepend_instance_prompt
When determining the captions from the filename,
prepend the instance prompt as an enforced keyword.
--only_instance_prompt
Use the instance prompt instead of the caption from
filename.
--conditioning_dropout_probability CONDITIONING_DROPOUT_PROBABILITY
Conditioning dropout probability. Experimental. See
section 3.2.1 in the paper:
https://arxiv.org/abs/2211.09800.
--caption_dropout_probability CAPTION_DROPOUT_PROBABILITY
Caption dropout probability.
--input_perturbation INPUT_PERTURBATION
The scale of input perturbation. Recommended 0.1.
--input_perturbation_probability INPUT_PERTURBATION_PROBABILITY
While input perturbation can help with training
convergence, having it applied all the time is likely
damaging. When this value is less than 1.0, any
perturbed noise will be applied probabilistically.
Default: 0.25
--delete_unwanted_images
If set, will delete images that are not of a minimum
size to save on disk space for large training runs.
Default behaviour: Unset, remove images from bucket
only.
--delete_problematic_images
If set, any images that error out during load will be
removed from the underlying storage medium. This is
useful to prevent repeatedly attempting to cache bad
files on a cloud bucket.
--offset_noise Fine-tuning against a modified noise. See:
https://www.crosslabs.org//blog/diffusion-with-offset-
noise for more information.
--lr_end LR_END A polynomial learning rate will end up at this value
after the specified number of warmup steps.
```