Releases: leejet/stable-diffusion.cpp

master-65fa646

23 Nov 04:42
65fa646
feat: add sd3.5 medium and skip layer guidance support (#451)

* mmdit-x

* add support for sd3.5 medium

* add skip layer guidance support (mmdit only)

* ignore slg if slg_scale is zero (optimization)

* init out_skip once

* slg support for flux (experimental)

* warn if version doesn't support slg

* refactor slg cli args

* set default slg_scale to 0 (oops)

* format code

---------

Co-authored-by: leejet <[email protected]>
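
To make the skip layer guidance (SLG) entries above concrete, here is a minimal sketch of how an SLG term is commonly combined with classifier-free guidance. It assumes the usual formulation, in which a second conditional pass runs with some layers skipped and the difference between the full and skipped outputs is scaled by `slg_scale`. The function name `guided_output` and its parameters are hypothetical and are not taken from this repository's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the project's API): combine model outputs
// using classifier-free guidance plus an optional skip-layer term.
//   cond      : output of the conditional pass
//   uncond    : output of the unconditional pass
//   skip_cond : output of the conditional pass with some layers skipped
std::vector<float> guided_output(const std::vector<float>& cond,
                                 const std::vector<float>& uncond,
                                 const std::vector<float>& skip_cond,
                                 float cfg_scale,
                                 float slg_scale) {
    assert(cond.size() == uncond.size());
    // With slg_scale == 0 the extra term vanishes, so the skip-layer
    // pass (and its buffer) is not needed at all.
    const bool use_slg = slg_scale != 0.0f;
    assert(!use_slg || skip_cond.size() == cond.size());

    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        // Standard classifier-free guidance.
        float v = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
        if (use_slg) {
            // Assumed SLG term: push away from the skipped-layer output.
            v += slg_scale * (cond[i] - skip_cond[i]);
        }
        out[i] = v;
    }
    return out;
}
```

Under this assumed formulation, a default `slg_scale` of 0 leaves plain classifier-free guidance unchanged, which is consistent with the "ignore slg if slg_scale is zero" entry.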

master-2b1bc06

23 Nov 05:15
2b1bc06
feat: add PhotoMaker Version 2 support (#358)

* first attempt at updating to photomaker v2

* continue adding photomaker v2 modules

* finishing the last few pieces for photomaker v2; id_embeds must be generated in a manual step and passed as an input file

* added a name converter for Photomaker V2; build ok

* more debugging underway

* failing at cuda mat_mul

* updated chunk_half to be more efficient; redo feedforward

* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op

* redo weight calculation and weight*v

* fixed a bug; Photomaker V2 now kind of works

* add python script for face detection (needed by Photomaker V2)

* updated readme for photomaker

* fixed a bug that caused PMV1 to crash; both V1 and V2 work

* fixed clean_input_ids for PMV2

* fixed a double counting bug in tokenize_with_trigger_token

* updated photomaker readme

* removed some commented code

* improved reconstruction of the class-word-free prompt

* changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server

* minor clean up

---------

Co-authored-by: bssrdf <[email protected]>

master-1c168d9

23 Nov 06:04
1c168d9
fix: repair flash attention support (#386)

* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <[email protected]>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <[email protected]>
Co-authored-by: leejet <[email protected]>

master-ac54e00

24 Oct 15:21
ac54e00
feat: add sd3.5 support (#445)

master-e410aeb

02 Sep 15:56
e410aeb
sync: update ggml to fix large image generation with SYCL backend (#380)

* turn off fast-math on host in SYCL backend

Signed-off-by: zhentaoyu <[email protected]>

* update ggml to sync some sycl ops

Signed-off-by: zhentaoyu <[email protected]>

* update sycl readme and ggml

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>

master-14206fd

02 Sep 15:53
14206fd
fix: fix clip tokenizer (#383)

master-f4c937c

27 Aug 18:20
f4c937c
fix: add some missing cli args to usage (#363)

master-e71ddce

27 Aug 18:35
e71ddce
fix: improve VAE tiling (#372)

* fix and improve: VAE tiling
- properly handle the upper left corner interpolating both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending

* actually fix vae tile merging

Co-authored-by: stduhpf <[email protected]>

* remove the now unused lerp function

---------

Co-authored-by: stduhpf <[email protected]>
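
As an illustration of the VAE tiling notes above, here is a minimal sketch of smootherstep-weighted blending in the overlap band between two tiles. `smootherstep` is the standard polynomial 6t^5 - 15t^4 + 10t^3. The helpers `blend_weight` and `blend_pixel`, and the choice to multiply the x and y weights in the corner region, are assumptions made for this example and are not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>

// Standard smootherstep: 6t^5 - 15t^4 + 10t^3 with t clamped to [0, 1].
// Its first and second derivatives are zero at both ends, so the
// transition is concentrated in the middle of the blend band.
inline float smootherstep(float t) {
    t = std::min(std::max(t, 0.0f), 1.0f);
    return t * t * t * (t * (6.0f * t - 15.0f) + 10.0f);
}

// Hypothetical helper: weight of the new tile at offset `pos` (in pixels)
// inside an overlap band that is `overlap` pixels wide.
inline float blend_weight(int pos, int overlap) {
    assert(overlap > 0);
    return smootherstep(static_cast<float>(pos) / static_cast<float>(overlap));
}

// Hypothetical helper: blend a pixel where the new tile overlaps existing
// output on both axes (the upper-left corner of the tile). The weights
// for x and y are multiplied so both directions are interpolated.
inline float blend_pixel(float old_value, float new_value,
                         int x, int y, int overlap) {
    const float w = blend_weight(x, overlap) * blend_weight(y, overlap);
    return old_value * (1.0f - w) + new_value * w;
}
```

Compared with a linear ramp, the flat ends of smootherstep mean that pixels near either edge of the band come almost entirely from a single tile, which is one way to read "spend less area blending".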

master-dc0882c

27 Aug 17:43
dc0882c
feat: add exponential scheduler (#346)

* feat: added exponential scheduler

* updated README

* improved exponential formatting

---------

Co-authored-by: leejet <[email protected]>
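
For readers unfamiliar with the term, an exponential schedule is conventionally defined as noise levels spaced evenly in log space between `sigma_max` and `sigma_min`, followed by a final zero. The sketch below assumes that conventional definition. The function name `exponential_sigmas` and its signature are hypothetical and do not reflect the code added in this release.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: build n noise levels spaced evenly in log space
// from sigma_max down to sigma_min, then append a terminal 0.
std::vector<float> exponential_sigmas(int n, float sigma_min, float sigma_max) {
    assert(n >= 1);
    assert(sigma_min > 0.0f && sigma_max >= sigma_min);

    const float log_max = std::log(sigma_max);
    const float log_min = std::log(sigma_min);

    std::vector<float> sigmas;
    sigmas.reserve(static_cast<std::size_t>(n) + 1);
    for (int i = 0; i < n; ++i) {
        // t runs from 0 to 1 across the n steps (0 when n == 1).
        const float t = (n > 1) ? static_cast<float>(i) / static_cast<float>(n - 1) : 0.0f;
        sigmas.push_back(std::exp(log_max + t * (log_min - log_max)));
    }
    sigmas.push_back(0.0f);  // sampling ends at zero noise
    return sigmas;
}
```

Because the spacing is uniform in log space, consecutive sigmas differ by a constant ratio, so steps are large in absolute terms at high noise and small at low noise.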

master-d00c948

27 Aug 17:21
d00c948
feat: add ipndm and ipndm_v samplers (#344)