Lock acquisition fails on download #2543

Open
AlpinDale opened this issue Sep 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

AlpinDale commented Sep 16, 2024

Describe the bug

I've been trying to download NousResearch/Meta-Llama-3.1-8B-Instruct with and without hf-transfer, but it consistently hangs at the 10GB point (2 shards with hf-transfer, half of each without), with this message being repeated every few seconds:

still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock

Reproduction

pip install -U huggingface-hub[cli] hf-transfer

HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth

Logs

$ huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth
Downloading '.gitattributes' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b.incomplete'
.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 12.4MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b
Downloading 'LICENSE' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290.incomplete'
LICENSE: 100%|██████████████████████████████████████████████████████████████████████████████████| 7.63k/7.63k [00:00<00:00, 22.6MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290
Downloading 'README.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1.incomplete'
README.md: 100%|████████████████████████████████████████████████████████████████████████████████| 41.8k/41.8k [00:00<00:00, 63.5MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1
Downloading 'USE_POLICY.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee.incomplete'
USE_POLICY.md: 100%|████████████████████████████████████████████████████████████████████████████| 4.69k/4.69k [00:00<00:00, 12.7MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee
Downloading 'config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e.incomplete'
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 855/855 [00:00<00:00, 3.32MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e
Downloading 'generation_config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257.incomplete'
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████| 184/184 [00:00<00:00, 735kB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257
Downloading 'model-00001-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668.incomplete'
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 4.98G/4.98G [00:11<00:00, 443MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668
Downloading 'model-00002-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15.incomplete'
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:08<00:00, 600MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
^CTraceback (most recent call last):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/huggingface_cli.py", line 52, in main
    service.run()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 146, in run
    print(self._download())  # Print path to downloaded files
          ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 180, in _download
    return snapshot_download(
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 297, in snapshot_download
    _inner_hf_hub_download(file)
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 273, in _inner_hf_hub_download
    return hf_hub_download(
           ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1388, in _hf_hub_download_to_cache_dir
    with WeakFileLock(lock_path):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_fixes.py", line 91, in WeakFileLock
    lock.acquire()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/filelock/_api.py", line 344, in acquire
    time.sleep(poll_interval)
KeyboardInterrupt

System info

- huggingface_hub version: 0.24.7
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.11.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/austin/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: alpindale
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.4.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: 0.1.8
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.8.2
- aiohttp: 3.10.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/austin/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/austin/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/austin/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@AlpinDale AlpinDale added the bug Something isn't working label Sep 16, 2024
Contributor

Wauplin commented Sep 16, 2024

Hi @AlpinDale, sorry for the inconvenience. What type of drive is it (a standard local disk or a specially mounted drive)? I'm asking because filelock doesn't always work properly on some filesystems. Independently of that, you can try killing all huggingface_hub/hf_transfer processes and then running rm -rf /home/austin/.cache/huggingface/hub/.locks to delete all current locks. This should fix your issue ( 🤞 ), though I can't explain why it happened in the first place.
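Before deleting anything, it can help to see which lock files exist and whether another process is actually holding them. The following is a minimal sketch, not part of huggingface_hub: it assumes the locks are advisory flock-style locks (which, as far as I know, is what filelock uses on Linux), and the helper names `find_lock_files` and `is_held` are made up for illustration.

```python
import fcntl
import os
from pathlib import Path


def find_lock_files(locks_dir: Path) -> list[Path]:
    """Return every *.lock file below locks_dir, sorted for stable output."""
    return sorted(p for p in locks_dir.rglob("*.lock") if p.is_file())


def is_held(lock_file: Path) -> bool:
    """Probe an advisory flock without blocking.

    Returns True if some other open file description currently holds the lock.
    Assumption: the lock was taken with flock(); on filesystems where flock
    is unsupported this may raise a different OSError instead.
    """
    fd = os.open(lock_file, os.O_RDWR)
    try:
        try:
            # Try to take the lock ourselves; LOCK_NB makes this fail fast.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got it, so nobody else held it; release immediately.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Calling `find_lock_files(Path.home() / ".cache/huggingface/hub/.locks")` and then `is_held(...)` on each result would show whether a leftover download process still owns the lock named in the "still waiting to acquire lock" message. If a lock is reported as held after all download processes were killed, that would point towards the filesystem question above rather than a stale file.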


JakubCzarlinski commented Nov 13, 2024

Same issue here. I tried deleting the .locks directory, but unfortunately it didn't help. Reducing --max-workers to something like 2 worked instead.

E.g.:

huggingface-cli download stabilityai/stable-diffusion-3.5-medium --max-workers 2

This is without using hf_transfer, and for a different model. In my case this did not hinder performance, but I imagine that depends a lot on your network speed.

EDIT: Spoke too soon. It didn't solve the problem, but it did at least reduce the frequency.
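For anyone downloading from Python rather than the CLI, the same mitigation can be sketched with `snapshot_download`, which accepts a `max_workers` argument. This is only an illustration of the workaround above, not a fix: the wrapper names `build_download_kwargs` and `download_with_few_workers` are hypothetical, and the default of 2 simply mirrors the CLI example.

```python
from typing import Optional


def build_download_kwargs(
    repo_id: str,
    max_workers: int = 2,
    ignore_patterns: Optional[list[str]] = None,
) -> dict:
    """Assemble keyword arguments for snapshot_download with fewer workers."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    kwargs: dict = {"repo_id": repo_id, "max_workers": max_workers}
    if ignore_patterns:
        # Python-side counterpart of the CLI's --exclude option.
        kwargs["ignore_patterns"] = list(ignore_patterns)
    return kwargs


def download_with_few_workers(repo_id: str, max_workers: int = 2) -> str:
    """Download a repo snapshot using a reduced number of parallel workers."""
    # Imported lazily so the helper above can be used without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_kwargs(repo_id, max_workers))
```

For example, `download_with_few_workers("stabilityai/stable-diffusion-3.5-medium")` would correspond to the CLI call above. Going by the EDIT, this should be expected to make the hang less frequent rather than eliminate it.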
