stable-diffusion: implement ESRGAN upscaler + Metal Backend #104

Merged 31 commits on Dec 28, 2023. The file changes shown below are from 13 of the 31 commits.

Commits
2a74c4a add esrgan upscaler (FSSRepo, Dec 5, 2023)
f140532 add sd_tiling (FSSRepo, Dec 8, 2023)
b5ade20 ggml: adapt to new backend (FSSRepo, Dec 8, 2023)
d99e650 Merge remote-tracking branch 'upstream/master' (FSSRepo, Dec 9, 2023)
f83b742 fix some conflicts (FSSRepo, Dec 9, 2023)
10ac491 Merge branch 'leejet:master' into master (FSSRepo, Dec 10, 2023)
136474d sd_tiling support overlapping + vae tiling (FSSRepo, Dec 10, 2023)
1ad22f2 Merge branch 'master' of https://github.com/FSSRepo/stable-diffusion.cpp (FSSRepo, Dec 10, 2023)
2bcca30 prepare to use metal as backend (FSSRepo, Dec 10, 2023)
05101e4 support metal backend (FSSRepo, Dec 10, 2023)
e641b8c fix submodule (FSSRepo, Dec 10, 2023)
e1f5a1c fix metal ggml-submodule (FSSRepo, Dec 10, 2023)
5759b53 fix possible error (FSSRepo, Dec 10, 2023)
4d46747 fix metal compilation errors (FSSRepo, Dec 11, 2023)
dfe6abb restore free memory param (FSSRepo, Dec 11, 2023)
e8d4cb0 fix metal backend issue (FSSRepo, Dec 12, 2023)
8dc966e cuda: fast softmax cmake option (FSSRepo, Dec 13, 2023)
60ce78f Merge branch 'leejet:master' into master (FSSRepo, Dec 13, 2023)
8cb12d2 update readme (FSSRepo, Dec 13, 2023)
d6c8fc0 Merge branch 'master' of https://github.com/FSSRepo/stable-diffusion.cpp (FSSRepo, Dec 13, 2023)
f97ff96 improve upscale log info (FSSRepo, Dec 13, 2023)
6bdbd25 standardize naming conventions (leejet, Dec 14, 2023)
56e6474 format code (leejet, Dec 14, 2023)
62dd027 simplify esrgan code (leejet, Dec 14, 2023)
ccdec9d indeterministic results fast softmax (FSSRepo, Dec 15, 2023)
1e3797a fix clip_skip (leejet, Dec 18, 2023)
6dce7ea fix cuda sync buffers (leejet, Dec 18, 2023)
94ccbb8 Merge branch 'master' into pr75.head (leejet, Dec 21, 2023)
fd5726c Merge branch 'master' into pr75.head (leejet, Dec 23, 2023)
9ea2bcd synchronize after get tensor from backend (leejet, Dec 23, 2023)
421e39b use ggml_backend_tensor_get_async and sync for cuda backend (leejet, Dec 23, 2023)

7 changes: 7 additions & 0 deletions CMakeLists.txt
@@ -25,6 +25,7 @@ endif()
#option(SD_BUILD_TESTS "sd: build tests" ${SD_STANDALONE})
option(SD_BUILD_EXAMPLES "sd: build examples" ${SD_STANDALONE})
option(SD_CUBLAS "sd: cuda backend" OFF)
+option(SD_METAL "sd: metal backend" OFF)
option(SD_FLASH_ATTN "sd: use flash attention for x4 less memory usage" OFF)
option(BUILD_SHARED_LIBS "sd: build shared libs" OFF)
#option(SD_BUILD_SERVER "sd: build server example" ON)
@@ -35,6 +36,12 @@ if(SD_CUBLAS)
add_definitions(-DSD_USE_CUBLAS)
endif()

+if(SD_METAL)
+    message("Use Metal as backend stable-diffusion")
+    set(GGML_METAL ON)
+    add_definitions(-DSD_USE_METAL)
+endif()

if(SD_FLASH_ATTN)
message("Use Flash Attention for memory optimization")
add_definitions(-DSD_USE_FLASH_ATTENTION)
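
Note: building with the new option is just cmake -DSD_METAL=ON. The SD_USE_METAL define added here is what gates the Metal-specific code paths elsewhere in the tree (see model.cpp below). A minimal sketch of the usual ggml pattern for consuming such a define at startup (the init_backend() helper is illustrative, not part of this PR, and assumes the late-2023 ggml backend API):

#include "ggml/ggml-backend.h"
#ifdef SD_USE_METAL
#include "ggml-metal.h"
#endif

// Illustrative only: pick a ggml backend at startup based on the
// SD_USE_METAL compile definition, falling back to CPU.
static ggml_backend_t init_backend() {
    ggml_backend_t backend = NULL;
#ifdef SD_USE_METAL
    backend = ggml_backend_metal_init();  // GPU path, enabled by -DSD_METAL=ON
#endif
    if (backend == NULL) {
        backend = ggml_backend_cpu_init();  // CPU fallback
    }
    return backend;
}
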
5 changes: 4 additions & 1 deletion README.md
@@ -27,6 +27,7 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
- Latent Consistency Models support (LCM/LCM-LoRA)
- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
+- Upscale images generated with [ESRGAN](https://github.com/xinntao/Real-ESRGAN)
- Sampling method
- `Euler A`
- `Euler`
@@ -52,7 +53,8 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
- Implement Winograd Convolution 2D for 3x3 kernel filtering
- [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
- [ ] Implement BPE Tokenizer
-- [ ] Implement [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN/tree/master) upscaler
+- [ ] Implement Textual Inversion (embeddings)
+- [ ] Implement Inpainting support
- [ ] k-quants support

## Usage
@@ -135,6 +137,7 @@ arguments:
-m, --model [MODEL] path to model
--vae [VAE] path to vae
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
+  -um, --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generation; only RealESRGAN_x4plus_anime_6B is supported for now.
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
If not specified, the default is the type of the weight file.
--lora-model-dir [DIR] lora model directory
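
With the new flag, a typical invocation might look like sd -m sd-v1-4.ckpt -p "a lovely cat" -um RealESRGAN_x4plus_anime_6B.pth (binary name and paths illustrative). The image is generated at the requested resolution and then upscaled by a fixed factor of 4 before saving, so a 512x512 request yields a 2048x2048 output (see the main.cpp changes below).
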
37 changes: 35 additions & 2 deletions examples/cli/main.cpp
@@ -59,6 +59,7 @@ struct SDParams {
std::string model_path;
std::string vae_path;
std::string taesd_path;
+    std::string esrgan_path;
ggml_type wtype = GGML_TYPE_COUNT;
std::string lora_model_dir;
std::string output_path = "output.png";
@@ -67,6 +68,7 @@ struct SDParams {
std::string prompt;
std::string negative_prompt;
float cfg_scale = 7.0f;
+    int clip_skip_layers = 0;
int width = 512;
int height = 512;
int batch_count = 1;
@@ -78,6 +80,7 @@ struct SDParams {
RNGType rng_type = CUDA_RNG;
int64_t seed = 42;
bool verbose = false;
+    bool vae_tiling = false;
};

void print_params(SDParams params) {
@@ -115,6 +118,7 @@ void print_usage(int argc, const char* argv[]) {
printf(" -m, --model [MODEL] path to model\n");
printf(" --vae [VAE] path to vae\n");
printf(" --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)\n");
printf(" -um, --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.\n");
printf(" --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)\n");
printf(" If not specified, the default is the type of the weight file.\n");
printf(" --lora-model-dir [DIR] lora model directory\n");
@@ -134,6 +138,8 @@ void print_usage(int argc, const char* argv[]) {
printf(" -s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0)\n");
printf(" -b, --batch-count COUNT number of images to generate.\n");
printf(" --schedule {discrete, karras} Denoiser sigma schedule (default: discrete)\n");
printf(" -cs, --clip-skip N number of layers to skip of clip model (default: 0)\n");
printf(" -vt, --vae-tiling process vae in tiles to reduce memory usage\n");
printf(" -v, --verbose print extra info\n");
}

@@ -185,6 +191,12 @@ void parse_args(int argc, const char** argv, SDParams& params) {
break;
}
params.taesd_path = argv[i];
} else if (arg == "--upscale-model" || arg == "-um") {
if (++i >= argc) {
invalid_arg = true;
break;
}
params.esrgan_path = argv[i];
} else if (arg == "--type") {
if (++i >= argc) {
invalid_arg = true;
@@ -270,6 +282,14 @@ void parse_args(int argc, const char** argv, SDParams& params) {
break;
}
params.sample_steps = std::stoi(argv[i]);
} else if (arg == "-cs" || arg == "--clip-skip") {
if (++i >= argc) {
invalid_arg = true;
break;
}
params.clip_skip_layers = std::stoi(argv[i]);
} else if (arg == "-vt" || arg == "--vae-tiling") {
params.vae_tiling = true;
} else if (arg == "-b" || arg == "--batch-count") {
if (++i >= argc) {
invalid_arg = true;
@@ -458,9 +478,9 @@ int main(int argc, const char* argv[]) {
}
}

-    StableDiffusion sd(params.n_threads, vae_decode_only, params.taesd_path, true, params.lora_model_dir, params.rng_type);
+    StableDiffusion sd(params.n_threads, vae_decode_only, params.taesd_path, params.esrgan_path, true, params.vae_tiling, params.lora_model_dir, params.rng_type);

-    if (!sd.load_from_file(params.model_path, params.vae_path, params.wtype, params.schedule)) {
+    if (!sd.load_from_file(params.model_path, params.vae_path, params.wtype, params.schedule, params.clip_skip_layers)) {
return 1;
}

@@ -488,6 +508,19 @@ int main(int argc, const char* argv[]) {
params.seed);
}

+    if (params.esrgan_path.size() > 0) {
+        // TODO: support more ESRGAN models and make them easier to set up.
+        /* The scale factor is hardcoded because only RealESRGAN_x4plus_anime_6B is compatible.
+           See also: https://github.com/xinntao/Real-ESRGAN/blob/master/inference_realesrgan.py
+
+           To avoid this, the upscaler needs to be separated from the stable diffusion pipeline.
+           However, that would require a considerable amount of work; it might be better to opt
+           for a complete project refactoring that makes parameters easier to pass through.
+        */
+        params.width *= 4;
+        params.height *= 4;
+    }

if (results.size() == 0 || results.size() != params.batch_count) {
LOG_ERROR("generate failed");
return 1;
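
The TODO comment above argues for separating the upscaler from the stable diffusion pipeline instead of patching params.width and params.height in main(). A rough sketch of what that separation could look like; every name here (EsrganUpscaler, load, scale, upscale) is hypothetical, not part of this PR:

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical standalone upscaler interface, decoupled from txt2img.
struct EsrganUpscaler {
    // load an ESRGAN checkpoint, e.g. RealESRGAN_x4plus_anime_6B
    bool load(const std::string& model_path, int n_threads);
    // the model's fixed scale factor (4 for the x4plus models)
    int scale() const;
    // in: packed RGB8 pixels (w * h * 3 bytes); out: (w * scale()) x (h * scale()) RGB8
    std::vector<uint8_t> upscale(const uint8_t* rgb, int w, int h);
};

With the factor queried from the loaded model rather than assumed, the hardcoded *= 4 above would disappear.
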
14 changes: 10 additions & 4 deletions model.cpp
@@ -14,6 +14,10 @@
#include "ggml/ggml-backend.h"
#include "ggml/ggml.h"

+#ifdef SD_USE_METAL
+#include "ggml-metal.h"
+#endif

#define ST_HEADER_SIZE_LEN 8

uint64_t read_u64(uint8_t* buffer) {
@@ -1208,7 +1212,7 @@ bool ModelLoader::load_vocab(on_new_token_cb_t on_new_token_cb) {
return true;
}

-bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
+bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb, ggml_backend_t backend) {
bool success = true;
for (size_t file_index = 0; file_index < file_paths_.size(); file_index++) {
std::string file_path = file_paths_[file_index];
@@ -1296,11 +1300,13 @@ bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
continue;
}

-            ggml_backend_t backend = ggml_get_backend(dst_tensor);

size_t nbytes_to_read = tensor_storage.nbytes_to_read();

-            if (backend == NULL || ggml_backend_is_cpu(backend)) {
+            if (dst_tensor->buffer == NULL || ggml_backend_is_cpu(backend)
+#ifdef SD_USE_METAL
+                || ggml_backend_is_metal(model.backend)
+#endif
+            ) {
// for the CPU and Metal backend, we can copy directly into the tensor
if (tensor_storage.type == dst_tensor->type) {
GGML_ASSERT(ggml_nbytes(dst_tensor) == tensor_storage.nbytes());
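
The rewritten condition encodes a copy policy: CPU and Metal buffers are host-addressable, so file bytes can be copied straight into the tensor, while device-only backends such as CUDA must go through the backend API. A condensed sketch of that policy; upload_tensor and read_buffer are illustrative stand-ins for the surrounding loader code, and the ggml calls assume the late-2023 backend API:

#include <cstdint>
#include <cstring>
#include <vector>
#include "ggml/ggml.h"
#include "ggml/ggml-backend.h"
#ifdef SD_USE_METAL
#include "ggml-metal.h"
#endif

// Illustrative condensation of the loader's copy policy; assumes the
// on-disk type already matches dst->type (the real loader also converts).
static void upload_tensor(ggml_backend_t backend, ggml_tensor* dst,
                          const std::vector<uint8_t>& read_buffer) {
    bool host_copy = dst->buffer == NULL || ggml_backend_is_cpu(backend);
#ifdef SD_USE_METAL
    host_copy = host_copy || ggml_backend_is_metal(backend);  // Metal memory is host-visible
#endif
    if (host_copy) {
        // CPU/Metal: copy directly into the tensor's memory
        memcpy(dst->data, read_buffer.data(), ggml_nbytes(dst));
    } else {
        // CUDA and similar: let the backend move the bytes into device memory
        ggml_backend_tensor_set(dst, read_buffer.data(), 0, ggml_nbytes(dst));
    }
}
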
2 changes: 1 addition & 1 deletion model.h
@@ -116,7 +116,7 @@ class ModelLoader {
SDVersion get_sd_version();
ggml_type get_sd_wtype();
bool load_vocab(on_new_token_cb_t on_new_token_cb);
-    bool load_tensors(on_new_tensor_cb_t on_new_tensor_cb);
+    bool load_tensors(on_new_tensor_cb_t on_new_tensor_cb, ggml_backend_t backend);
int64_t cal_mem_size(ggml_backend_t backend);
~ModelLoader() = default;
};