Releases · leejet/stable-diffusion.cpp
master-9578fdc
chore: remove rocm5.5 build temporarily
master-b5f4932
refactor: add some sd version helper functions
master-9b1d90b
fix: improve clip text_projection support (#397)
master-8f94efa
feat: add support for loading F8_E5M2 weights (#460)
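F8_E5M2 is an 8-bit float layout with 1 sign bit, 5 exponent bits, and 2 mantissa bits; it is effectively an IEEE half (fp16) value truncated to its top byte, so widening it is cheap. As a rough, standalone illustration of the decode involved (not the repository's actual loader code), a byte can be expanded like this:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Decode one F8_E5M2 byte (1 sign, 5 exponent, 2 mantissa bits, bias 15).
// Equivalently, (uint16_t)v << 8 gives the same value as IEEE fp16 bits.
float f8_e5m2_to_f32(uint8_t v) {
    int   sign = (v >> 7) & 0x1;
    int   exp  = (v >> 2) & 0x1F;
    float mant = (float)(v & 0x3) / 4.0f;
    float r;
    if (exp == 0) {
        r = std::ldexp(mant, -14);             // subnormal
    } else if (exp == 31) {
        r = (v & 0x3) ? NAN : INFINITY;        // NaN / infinity
    } else {
        r = std::ldexp(1.0f + mant, exp - 15); // normal
    }
    return sign ? -r : r;
}

int main() {
    printf("%f\n", f8_e5m2_to_f32(0x3C)); // exp = 15, mant = 0 -> 1.0
}
```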
master-8c7719f
fix: typo in clip-g encoder arg (#472)
master-6ea8122
feat: add flux 1 lite 8B (freepik) support (#474)

* Flux Lite (Freepik) support
* format code

Co-authored-by: leejet <[email protected]>
master-65fa646
feat: add sd3.5 medium and skip layer guidance support (#451)

* mmdit-x
* add support for sd3.5 medium
* add skip layer guidance support (mmdit only)
* ignore slg if slg_scale is zero (optimization)
* init out_skip once
* slg support for flux (experimental)
* warn if version doesn't support slg
* refactor slg cli args
* set default slg_scale to 0 (oops)
* format code

Co-authored-by: leejet <[email protected]>
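Skip layer guidance runs an extra denoiser pass with selected transformer layers skipped and steers the prediction away from that degraded output. The sketch below only illustrates the idea, reusing the out_skip / slg_scale naming from the commit message; the actual combination used in the repository may differ:

```cpp
#include <cstddef>
#include <vector>

// Combine conditional, unconditional, and skip-layer predictions.
// out_skip is the output of a forward pass with some layers skipped.
std::vector<float> apply_guidance(const std::vector<float>& out_cond,
                                  const std::vector<float>& out_uncond,
                                  const std::vector<float>& out_skip,
                                  float cfg_scale, float slg_scale) {
    std::vector<float> out(out_cond.size());
    for (size_t i = 0; i < out_cond.size(); ++i) {
        // classifier-free guidance
        float g = out_uncond[i] + cfg_scale * (out_cond[i] - out_uncond[i]);
        // skip-layer guidance term; when slg_scale == 0 the extra pass
        // can be skipped entirely (the "ignore slg" optimization above)
        g += slg_scale * (out_cond[i] - out_skip[i]);
        out[i] = g;
    }
    return out;
}
```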
master-2b1bc06
feat: add PhotoMaker Version 2 support (#358)

* first attempt at updating to PhotoMaker V2
* continue adding PhotoMaker V2 modules
* finish the last few pieces for PhotoMaker V2; id_embeds need to be computed in a manual step and passed as an input file
* added a name converter for PhotoMaker V2; build ok
* more debugging underway
* failing at cuda mat_mul
* updated chunk_half to be more efficient; redo feedforward
* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op
* redo weight calculation and weight*v
* fixed a bug; now PhotoMaker V2 kind of works
* add python script for face detection (needed by PhotoMaker V2)
* updated readme for PhotoMaker
* fixed a bug causing PMV1 to crash; both V1 and V2 work
* fixed clean_input_ids for PMV2
* fixed a double-counting bug in tokenize_with_trigger_token
* updated PhotoMaker readme
* removed some commented code
* improved reconstructing the class-word-free prompt
* changed reading id_embed to raw binary using the existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server
* minor clean up

Co-authored-by: bssrdf <[email protected]>
master-1c168d9
fix: repair flash attention support (#386)

* repair flash attention in _ext; this does not fix the currently broken fa behind the define, which is only used by the VAE (Co-authored-by: FSSRepo <[email protected]>)
* make flash attention in the diffusion model a runtime flag; no support for sd3 or video
* remove the old flash attention option and switch the VAE over to attn_ext
* update docs
* format code

Co-authored-by: FSSRepo <[email protected]>
Co-authored-by: leejet <[email protected]>
master-ac54e00
feat: add sd3.5 support (#445)