[FEAT] DDUF format #10037
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for starting this! Apart from the initial comments I left, I have a few higher-level questions:

We're assuming that the archive will contain all the pipeline-related files in the diffusers format. I am fine with that. This means we'll be shipping DDUF only at the pipeline level for now, correct?
```python
dduf_format: bool = False,
dduf_filename: Optional[Union[str, os.PathLike]] = None,
```
As mentioned over Slack, why do we need two arguments here? How about just `dduf_filename`?
I will just use that then!
I think that most, if not all, of the components of the file name should be inferred. One of the reasons to use DDUF would be to provide confidence about the contents based on filename inspection, so we should try to make it work with something like an optional prefix, but build the rest automatically. So something like `dduf_name="my-lora"` resulting in the library saving to `"my-lora-stable-diffusion-3-medium-diffusers-fp16.dduf"`, or something like that.
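For illustration, a minimal sketch of what that inference could look like; `build_dduf_filename` and its parameters are hypothetical and not part of this PR:

```python
from typing import Optional

def build_dduf_filename(prefix: Optional[str], base_model: str, variant: Optional[str] = None) -> str:
    # Hypothetical helper: derive the archive name from an optional user prefix,
    # the base model id, and an optional variant such as "fp16".
    parts = [prefix] if prefix else []
    parts.append(base_model.split("/")[-1])
    if variant:
        parts.append(variant)
    return "-".join(parts) + ".dduf"

# build_dduf_filename("my-lora", "stabilityai/stable-diffusion-3-medium-diffusers", "fp16")
# -> "my-lora-stable-diffusion-3-medium-diffusers-fp16.dduf"
```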
OK to use some kind of convention, but I don't think it should be an absolute requirement, i.e. people can rename their files if they want to (but the metadata should be inside the file anyway).
Yeah that is the plan.
I just tried it out.

Serialization

```python
from diffusers import FluxTransformer2DModel, DiffusionPipeline
from diffusers import BitsAndBytesConfig
from transformers import BitsAndBytesConfig as BnbConfig
from transformers import T5EncoderModel
from huggingface_hub import create_repo, upload_folder
import torch

repo_id = "DDUF/Flux.1-Dev-Tnf4-TE2NF4"
repo_id = create_repo(repo_id=repo_id, exist_ok=True).repo_id

dtype = torch.bfloat16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=dtype,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=dtype,
)
nf4_config = BnbConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=dtype,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=nf4_config,
    torch_dtype=dtype,
)
pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=dtype,
)
pipeline.enable_model_cpu_offload()

image = pipeline(
    "a cat standing by the sea waiting for its companion to come and console it.",
    guidance_scale=4.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.manual_seed(0),
).images[0]
image.save("dduf_out.png")

folder = "flux.1-dduf"
pipeline.save_pretrained(folder, dduf_filename="Flux.1-Dev-Tnf4-TE2NF4.dduf")
upload_folder(repo_id=repo_id, folder_path=folder, create_pr=True)
```

Loading

```python
from diffusers import DiffusionPipeline
import torch

repo_id = "DDUF/Flux.1-Dev-Tnf4-TE2NF4"
pipeline = DiffusionPipeline.from_pretrained(
    repo_id, dduf="Flux.1-Dev-Tnf4-TE2NF4.dduf", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

image = pipeline(
    "a cat standing by the sea waiting for its companion to come and console it.",
    guidance_scale=4.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.manual_seed(0),
).images[0]
image.save("dduf_out.png")
```
```python
# remove tar archive to free memory
os.remove(tar_file_path)
# rename folder to match the name of the dduf archive
os.rename(extract_to, tar_file_path)
```
I think that expanding the archive and removing it could be problematic.

- If I understand correctly, we would be removing the symlink to the blob, but not the blob itself. We won't be saving disk space.
- If we did remove the blob, I think it'd trigger another download next time we call `.from_pretrained`.
- Removing the reference could maybe have consequences when checking for newer versions on the Hub (not sure).

When we change to using zips, I wonder if we could extract files progressively in memory during the load process and keep the big .dduf file as is. Having a disk snapshot is convenient, but I can see no easy solution to avoid duplicating space.
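For what it's worth, a minimal sketch of the progressive-read idea using only the standard library, assuming an uncompressed archive ("pipeline.dduf" is a placeholder name):

```python
import zipfile

# Read entries straight out of the archive, keeping the .dduf file
# as the only on-disk copy instead of extracting a duplicate tree.
with zipfile.ZipFile("pipeline.dduf") as zf:
    for info in zf.infolist():
        with zf.open(info) as f:
            data = f.read()  # or stream in chunks for large tensors
        print(info.filename, len(data))
```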
I agree. I will check if it is possible to stream directly from the zip, or at least extract progressively in memory as you said.
Thanks for the link! I'll have a look!
BTW, on the Hub side we want to constrain the types of Zip a lot:

- no compression (to make sure it's deduplicated well by XetHub)
- single disk (internal simplifications of Zip)
- no encryption, etc.
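For example, the standard library can already produce an archive meeting those constraints (paths here are placeholders):

```python
import os
import zipfile

# ZIP_STORED means no compression; zipfile also never emits multi-disk
# or encrypted archives, so the result satisfies the constraints above.
with zipfile.ZipFile("pipeline.dduf", "w", compression=zipfile.ZIP_STORED) as zf:
    for root, _, files in os.walk("pipeline_folder"):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, arcname=os.path.relpath(path, "pipeline_folder"))
```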
To ensure the restrictions are met, we'll provide tooling in Python to save/load DDUF files in a consistent way: huggingface/huggingface_hub#2692. This will be part of `huggingface_hub` so that it can be easily reused by any library without necessarily having diffusers as a dependency.
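For reference, the helpers that shipped in `huggingface_hub` look roughly like this (see the linked PR for the authoritative API; file paths here are placeholders):

```python
from huggingface_hub import export_folder_as_dduf, read_dduf_file

# Package a locally saved pipeline folder into a single DDUF archive...
export_folder_as_dduf("FLUX.1-dev.dduf", folder_path="path/to/FLUX.1-dev")

# ...and read entries back without extracting the archive to disk.
entries = read_dduf_file("FLUX.1-dev.dduf")
print(entries["model_index.json"].read_text())
```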
BTW, a bit orthogonal, but do you have examples of quantized diffusers repos right now?

This is one example: https://huggingface.co/sayakpaul/FLUX.1-Fill-dev-nf4. Note that it doesn't include all the models needed to run Flux, as it's not typical to quantize the VAE and the other CLIP-based text encoder.
@SunMarc I don't think loading is working:

```python
from diffusers import DiffusionPipeline

repo_id = "DDUF/stable-diffusion-3-medium-diffusers-DDUF"
filename = "stable-diffusion-3-medium-diffusers.dduf"
pipeline = DiffusionPipeline.from_pretrained(repo_id, dduf=filename)
```

Leads to:

Error

```
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fsx/sayak/diffusers/load_dduf.py", line 23, in <module>
    pipeline = DiffusionPipeline.from_pretrained(repo_id, dduf=filename)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 776, in from_pretrained
    cached_folder = cls.download(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/fsx/sayak/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 1379, in download
    config_file = hf_hub_download(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 925, in _hf_hub_download_to_cache_dir
    (url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
    r = _request_wrapper(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
    response = _request_wrapper(
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 301, in _request_wrapper
    hf_raise_for_status(response)
  File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 417, in hf_raise_for_status
    raise _format(EntryNotFoundError, message, response) from e
huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-674931ef-1fdc6f274e6f862c31db623e;2056ffe1-9bd0-4667-806f-273827e95101)
Entry Not Found for url: https://huggingface.co/DDUF/stable-diffusion-3-medium-diffusers-DDUF/resolve/main/model_index.json.
```

The 404 on model_index.json is expected. Update: I have pushed my updates in
Co-authored-by: Sayak Paul <[email protected]>
Merged!
I think we're close to merging this one. Thank you!
Nice, thanks for adding!
Thanks for working on this, @SunMarc!
I left a question about the sharded dduf checkpoint; looking great otherwise, and I hope to see this merged soon :)
```diff
@@ -1453,10 +1488,14 @@ def download(cls, pretrained_model_name, **kwargs) -> Union[str, os.PathLike]:
         return snapshot_folder

         user_agent = {"pipeline_class": cls.__name__}
-        if custom_pipeline is not None and not custom_pipeline.endswith(".py"):
+        if not dduf_file and custom_pipeline is not None and not custom_pipeline.endswith(".py"):
```
Can we use `dduf_file` with `custom_pipeline`? If not yet, let's just throw an error instead, no?
cc @sayakpaul as you added this part
Yeah, we should throw an error when `custom_pipeline` and `dduf_file` are both passed.
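A sketch of what that guard could look like; the helper name and message are illustrative, not the committed code:

```python
from typing import Optional

def _check_dduf_kwargs(dduf_file: Optional[str], custom_pipeline: Optional[str]) -> None:
    # Illustrative guard: custom pipeline code cannot currently be resolved
    # from inside a DDUF archive, so fail fast instead of silently ignoring it.
    if dduf_file and custom_pipeline:
        raise ValueError("`custom_pipeline` is not supported together with `dduf_file`.")
```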
```diff
@@ -1313,7 +1342,14 @@ def download(cls, pretrained_model_name, **kwargs) -> Union[str, os.PathLike]:
             local_files_only = True
             model_info_call_error = e  # save error to reraise it if model is not cached locally

-        if not local_files_only:
+        if (
```
I think it is probably easier if we make separate `download_dduf`/`load_dduf_config`/`_get_dduf_model_file` functions, etc., because the DDUF path is so different and it seems it would only take a few lines of code.
But I don't feel strongly about it; adding to the existing functions also works for me.
cc @sayakpaul
Yeah, decoupling these into different methods as suggested is a nice idea.
```diff
@@ -808,7 +811,8 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P

             model = load_flax_checkpoint_in_pytorch_model(model, model_file)
         else:
-            if is_sharded:
+            # in the case it is sharded, we have already the index
+            if is_sharded and not dduf_entries:
```
Do we support `dduf_entries` for sharded checkpoints?
The `is_sharded` logic (on line 786) depends on whether the `index_file` exists in the filesystem, which is just never going to be `True` for DDUF because all the files are contained in the `dduf_entries`. It is unclear to me whether this is intended or not.
`dduf_entries` do not support sharded checkpoints indeed, thanks for noticing this. Maybe it is better to refactor the loading as we discussed a while ago, Sayak, and do the same as transformers, where we just pass the `state_dict` and the config? WDYT @sayakpaul @yiyixuxu?
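One possible direction, sketched with hypothetical names: detect shardedness against the archive's entry table rather than the filesystem.

```python
import os
from typing import Dict, Optional

def checkpoint_is_sharded(index_file: str, dduf_entries: Optional[Dict[str, object]] = None) -> bool:
    # Hypothetical helper: with DDUF, the index file never exists on disk,
    # so look it up among the archive entries instead.
    if dduf_entries is not None:
        return index_file in dduf_entries
    return os.path.isfile(index_file)
```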
What does this PR do?
This PR adds support for the DDUF format. More docs about this format here.
Requires transformers PR: https://github.com/huggingface/transformers/pull/35217/files (merged)
Requires huggingface_hub PR: huggingface/huggingface_hub#2692 (released)
Example
Load from the hub
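A sketch based on the loading snippet earlier in this thread; the argument is named `dduf` there, while later review comments refer to it as `dduf_file`:

```python
from diffusers import DiffusionPipeline
import torch

# Repo and file names taken from the snippets above.
pipeline = DiffusionPipeline.from_pretrained(
    "DDUF/Flux.1-Dev-Tnf4-TE2NF4",
    dduf="Flux.1-Dev-Tnf4-TE2NF4.dduf",
    torch_dtype=torch.bfloat16,
)
```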
Save to DDUF format
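And the corresponding save, mirroring the serialization snippet above (`dduf_filename` as the argument name used at this point in the PR):

```python
# Continuing from the pipeline loaded above: write everything into one DDUF archive.
pipeline.save_pretrained("flux.1-dduf", dduf_filename="Flux.1-Dev-Tnf4-TE2NF4.dduf")
```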
TODO list: