Releases: leejet/stable-diffusion.cpp
master-65fa646
feat: add sd3.5 medium and skip layer guidance support (#451)
* mmdit-x
* add support for sd3.5 medium
* add skip layer guidance support (mmdit only)
* ignore slg if slg_scale is zero (optimization)
* init out_skip once
* slg support for flux (experimental)
* warn if version doesn't support slg
* refactor slg cli args
* set default slg_scale to 0 (oops)
* format code
---------
Co-authored-by: leejet <[email protected]>
master-2b1bc06
feat: add PhotoMaker Version 2 support (#358)
* first attempt at updating to photomaker v2
* continue adding photomaker v2 modules
* finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and passed as an input file
* added a name converter for Photomaker V2; build ok
* more debugging underway
* failing at cuda mat_mul
* updated chunk_half to be more efficient; redo feedforward
* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op
* redo weight calculation and weight*v
* fixed a bug; now Photomaker V2 is kind of working
* add python script for face detection (Photomaker V2 needs)
* updated readme for photomaker
* fixed a bug causing PMV1 to crash; both V1 and V2 work
* fixed clean_input_ids for PMV2
* fixed a double counting bug in tokenize_with_trigger_token
* updated photomaker readme
* removed some commented code
* improved reconstructing class word free prompt
* changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server
* minor clean up
---------
Co-authored-by: bssrdf <[email protected]>
master-1c168d9
fix: repair flash attention support (#386)
* repair flash attention in _ext
  this does not fix the currently broken fa behind the define, which is only used by VAE
  Co-authored-by: FSSRepo <[email protected]>
* make flash attention in the diffusion model a runtime flag
  no support for sd3 or video
* remove old flash attention option and switch vae over to attn_ext
* update docs
* format code
---------
Co-authored-by: FSSRepo <[email protected]>
Co-authored-by: leejet <[email protected]>
master-ac54e00
feat: add sd3.5 support (#445)
master-e410aeb
sync: update ggml to fix large image generation with SYCL backend (#380)
* turn off fast-math on host in SYCL backend
  Signed-off-by: zhentaoyu <[email protected]>
* update ggml for sync some sycl ops
  Signed-off-by: zhentaoyu <[email protected]>
* update sycl readme and ggml
  Signed-off-by: zhentaoyu <[email protected]>
---------
Signed-off-by: zhentaoyu <[email protected]>
master-14206fd
fix: fix clip tokenizer (#383)
master-f4c937c
fix: add some missing cli args to usage (#363)
master-e71ddce
fix: improve VAE tiling (#372)
* fix and improve: VAE tiling
  - properly handle the upper left corner interpolating both x and y
  - refactor out lerp
  - use smootherstep to preserve more detail and spend less area blending
* actually fix vae tile merging
  Co-authored-by: stduhpf <[email protected]>
* remove the now unused lerp function
---------
Co-authored-by: stduhpf <[email protected]>
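For readers unfamiliar with the smootherstep blending mentioned in this release, the following is a minimal sketch of the general technique, not the code from the commit. It assumes the standard smootherstep polynomial 6t^5 - 15t^4 + 10t^3; the helper name `blend_overlap` and its parameters are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Standard smootherstep polynomial, 6t^5 - 15t^4 + 10t^3, with t clamped to [0, 1].
// Its first and second derivatives are zero at both ends, so the transition
// between two tiles has no visible kink at the edges of the overlap.
inline float smootherstep(float t) {
    t = std::max(0.0f, std::min(1.0f, t));
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

// Hypothetical helper (not from the repository): blend one pixel value in the
// region where a newly decoded tile overlaps an already written one.
// `pos` is the distance into the overlap, `overlap` is the overlap width.
inline float blend_overlap(float old_value, float new_value, float pos, float overlap) {
    assert(overlap > 0.0f);
    const float w = smootherstep(pos / overlap);  // 0 at the old tile, 1 at the new tile
    return old_value * (1.0f - w) + new_value * w;
}
```

For a corner pixel that lies in both a horizontal and a vertical overlap, one plausible approach is to apply the same weighting once along x and once along y, which matches the note about interpolating both x and y.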
master-dc0882c
feat: add exponential scheduler (#346)
* feat: added exponential scheduler
* updated README
* improved exponential formatting
---------
Co-authored-by: leejet <[email protected]>
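As background on what an exponential scheduler typically computes, here is a minimal sketch assuming the common k-diffusion-style definition: noise levels spaced evenly in log space from `sigma_max` down to `sigma_min`, followed by a final zero. The function name `exponential_sigmas` and its signature are hypothetical, and the implementation in this release may differ in details.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch (not the repository's code): build n sigmas that are
// evenly spaced in log space from sigma_max down to sigma_min, then append a
// trailing 0 so the last sampling step ends at a fully denoised image.
std::vector<float> exponential_sigmas(int n, float sigma_min, float sigma_max) {
    assert(n >= 1 && sigma_min > 0.0f && sigma_max >= sigma_min);
    std::vector<float> sigmas;
    sigmas.reserve(static_cast<size_t>(n) + 1);
    const float log_max = std::log(sigma_max);
    const float log_min = std::log(sigma_min);
    for (int i = 0; i < n; ++i) {
        // t runs from 0 (sigma_max) to 1 (sigma_min); a single step stays at sigma_max.
        const float t = (n == 1) ? 0.0f : static_cast<float>(i) / static_cast<float>(n - 1);
        sigmas.push_back(std::exp(log_max + t * (log_min - log_max)));
    }
    sigmas.push_back(0.0f);
    return sigmas;
}
```

Because the spacing is uniform in log space, consecutive sigmas differ by a constant ratio rather than a constant difference.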
master-d00c948
feat: add ipndm and ipndm_v samplers (#344)