Releases: leejet/stable-diffusion.cpp
master-c837c5d
style: format code
master-64d231f
feat: add flux support (#356)
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
  in the same way it is done for bf16
  like how bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16
* add xlabs flux comfy converted lora support
* update docs
Co-authored-by: Erik Scholz <[email protected]>
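The note on f8_e4m3 says it converts losslessly to fp16, and that claim can be checked by hand. The sketch below is illustrative only and is not taken from the repository: the function names are hypothetical, and it assumes the E4M3 "FN" layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, one NaN pattern per sign). Under that assumption every finite f8_e4m3 value, including subnormals, fits in a normal fp16 value, so the conversion only re-biases the exponent and pads the mantissa.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: widen one FP8 E4M3 byte to IEEE fp16 bits.
// Assumes the "FN" variant: exponent bias 7, no infinities, NaN = S.1111.111.
uint16_t f8_e4m3_to_f16_bits(uint8_t f8) {
    const uint16_t sign = static_cast<uint16_t>((f8 & 0x80) << 8);
    int exp = (f8 >> 3) & 0x0F;
    int man = f8 & 0x07;

    if (exp == 0x0F && man == 0x07) {
        return static_cast<uint16_t>(sign | 0x7E00);  // quiet NaN
    }
    if (exp == 0) {
        if (man == 0) {
            return sign;  // signed zero
        }
        // Subnormal: value = man * 2^-9. Normalize so the leading 1 is implicit.
        int shift = 0;
        while ((man & 0x08) == 0) {
            man <<= 1;
            ++shift;
        }
        man &= 0x07;
        // Unbiased exponent is -6 - shift; fp16 bias is 15.
        return static_cast<uint16_t>(sign | ((9 - shift) << 10) | (man << 7));
    }
    // Normal: re-bias the exponent (15 - 7 = 8) and pad the mantissa from 3 to 10 bits.
    return static_cast<uint16_t>(sign | ((exp + 8) << 10) | (man << 7));
}

// Reference decoder: exact real value of an FP8 E4M3 byte.
float f8_e4m3_to_float(uint8_t f8) {
    const float sign = (f8 & 0x80) ? -1.0f : 1.0f;
    const int exp = (f8 >> 3) & 0x0F;
    const int man = f8 & 0x07;
    if (exp == 0x0F && man == 0x07) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    if (exp == 0) {
        return sign * std::ldexp(static_cast<float>(man), -9);
    }
    return sign * std::ldexp(static_cast<float>(8 + man), exp - 10);
}

// Reference decoder: exact real value of fp16 bits.
float f16_bits_to_float(uint16_t h) {
    const float sign = (h & 0x8000) ? -1.0f : 1.0f;
    const int exp = (h >> 10) & 0x1F;
    const int man = h & 0x3FF;
    if (exp == 0x1F) {
        return man ? std::numeric_limits<float>::quiet_NaN()
                   : sign * std::numeric_limits<float>::infinity();
    }
    if (exp == 0) {
        return sign * std::ldexp(static_cast<float>(man), -24);
    }
    return sign * std::ldexp(static_cast<float>(1024 + man), exp - 25);
}
```

Decoding all 256 input bytes with both reference decoders and comparing the results is a quick way to confirm that no finite value changes during the widening.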
master-697d000
feat: add SYCL Backend Support for Intel GPUs (#330)
* update ggml and add SYCL CMake option
* hacky CMakeLists.txt for updating ggml in cpu backend
* rebase and clean code
* add sycl in README
* rebase ggml commit
* refine README
* update ggml for supporting sycl tsembd op
Signed-off-by: zhentaoyu <[email protected]>
master-3d854f7
sync: update ggml submodule url
master-73c2176
feat: add sd3 support (#298)
master-4a6e36e
sync: update ggml
master-9c51d87
chore: fix cuda CI (#286)
master-e1384de
perf: make crc32 100x faster on x86-64 (#278)
This change makes checkpoints load significantly faster by optimizing pkzip's cyclic redundancy check. The code was developed by Intel, Google, and Mozilla. See Chromium's zlib codebase for further details.
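For context on what is being optimized, here is a minimal table-driven sketch of the CRC-32 checksum used by the zip format (reflected polynomial 0xEDB88320). It is not the optimized code from this release, which is not reproduced here. The name `crc32_reference` is hypothetical, and the function is only the kind of plain baseline that a faster implementation has to match bit for bit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
static std::array<uint32_t, 256> make_crc32_table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Hypothetical baseline: processes one byte per step.
// Pass a previous result as `crc` to continue a checksum over a second chunk.
uint32_t crc32_reference(const uint8_t* data, size_t len, uint32_t crc = 0) {
    static const std::array<uint32_t, 256> table = make_crc32_table();
    crc = ~crc;  // undo the final inversion of the previous chunk (or start at all ones)
    for (size_t i = 0; i < len; ++i) {
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return ~crc;
}
```

The standard check value for this CRC is 0xCBF43926 for the ASCII string "123456789", which makes a convenient sanity test when swapping in a faster routine.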
master-8142803
chore: update artifact actions (#267)
master-1d2af5c
fix: set n_dims of tensor storage to 1 when it's 0