Lock acquisition fails on download #2543

AlpinDale · 2024-09-16T07:22:28Z

Describe the bug

I've been trying to download NousResearch/Meta-Llama-3.1-8B-Instruct with and without hf-transfer, but it consistently hangs at the 10GB point (2 shards with hf-transfer, half of each without), with this message being repeated every few seconds:

still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
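One way to check whether the lock file named in that message is really held by another process is to try a non-blocking lock on it. The sketch below assumes a Linux system and assumes the lock is a `flock`-style advisory lock (which is how the `filelock` package behaves on Unix, as far as I know); `is_lock_held` is a hypothetical helper, not part of huggingface_hub.

```python
import fcntl
import os


def is_lock_held(lock_path: str) -> bool:
    """Return True if some other open file description holds a flock on lock_path."""
    if not os.path.exists(lock_path):
        return False  # no lock file, so nothing can be holding it
    fd = os.open(lock_path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: raises BlockingIOError if the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock ourselves, so nobody held it; release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

If this returns True while no download is running, a leftover process is probably still holding the lock. If it returns False while the CLI keeps printing the message, the filesystem's locking behaviour is worth a closer look.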

Reproduction

pip install -U huggingface-hub[cli] hf-transfer

HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth

Logs

$ huggingface-cli download NousResearch/Meta-Llama-3.1-8B-Instruct --exclude *.pth
Downloading '.gitattributes' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b.incomplete'
.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 12.4MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a6344aac8c09253b3b630fb776ae94478aa0275b
Downloading 'LICENSE' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290.incomplete'
LICENSE: 100%|██████████████████████████████████████████████████████████████████████████████████| 7.63k/7.63k [00:00<00:00, 22.6MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/a7c3ca16cee30425ed6ad841a809590f2bcbf290
Downloading 'README.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1.incomplete'
README.md: 100%|████████████████████████████████████████████████████████████████████████████████| 41.8k/41.8k [00:00<00:00, 63.5MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/71ce2f59177b48e3da2ac1b559393f4fcd9b3ea1
Downloading 'USE_POLICY.md' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee.incomplete'
USE_POLICY.md: 100%|████████████████████████████████████████████████████████████████████████████| 4.69k/4.69k [00:00<00:00, 12.7MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/81ebb55902285e8dd5804ccf423d17ffb2a622ee
Downloading 'config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e.incomplete'
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 855/855 [00:00<00:00, 3.32MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/0bb6fd75b3ad2fe988565929f329945262c2814e
Downloading 'generation_config.json' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257.incomplete'
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████| 184/184 [00:00<00:00, 735kB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/cc7276afd599de091142c6ed3005faf8a74aa257
Downloading 'model-00001-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668.incomplete'
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 4.98G/4.98G [00:11<00:00, 443MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/2b1879f356aed350030bb40eb45ad362c89d9891096f79a3ab323d3ba5607668
Downloading 'model-00002-of-00004.safetensors' to '/home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15.incomplete'
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:08<00:00, 600MB/s]
Download complete. Moving file to /home/austin/.cache/huggingface/hub/models--NousResearch--Meta-Llama-3.1-8B-Instruct/blobs/09d433f650646834a83c580877bd60c6d1f88f7755305c12576b5c7058f9af15
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
still waiting to acquire lock on /home/austin/.cache/huggingface/hub/.locks/models--NousResearch--Meta-Llama-3.1-8B-Instruct/fc1cdddd6bfa91128d6e94ee73d0ce62bfcdb7af29e978ddcab30c66ae9ea7fa.lock
^CTraceback (most recent call last):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/bin/huggingface-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/huggingface_cli.py", line 52, in main
    service.run()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 146, in run
    print(self._download())  # Print path to downloaded files
          ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/commands/download.py", line 180, in _download
    return snapshot_download(
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 297, in snapshot_download
    _inner_hf_hub_download(file)
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/_snapshot_download.py", line 273, in _inner_hf_hub_download
    return hf_hub_download(
           ^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1388, in _hf_hub_download_to_cache_dir
    with WeakFileLock(lock_path):
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/huggingface_hub/utils/_fixes.py", line 91, in WeakFileLock
    lock.acquire()
  File "/home/austin/disk1/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/filelock/_api.py", line 344, in acquire
    time.sleep(poll_interval)
KeyboardInterrupt

System info

- huggingface_hub version: 0.24.7
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Python version: 3.11.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/austin/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: alpindale
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.4.0
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: 0.1.8
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.8.2
- aiohttp: 3.10.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/austin/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/austin/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/austin/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Wauplin · 2024-09-16T10:30:59Z

Hi @AlpinDale, sorry for the inconvenience. What type of drive is it: a regular local disk or a specially mounted one? I'm asking because filelock doesn't always work properly on some filesystems. Independently of that, you can try killing all huggingface_hub/hf_transfer processes and then running rm -rf /home/austin/.cache/huggingface/hub/.locks to delete all current locks. This should fix your issue ( 🤞 ), though I can't explain why it happened in the first place.
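For anyone who would rather inspect the locks before deleting them, here is a minimal Python sketch of the same cleanup. It is narrower than the `rm -rf` above: it only removes `*.lock` files and leaves the directories in place. `clear_lock_files` is a hypothetical helper, and it should only be run after the download processes have been killed.

```python
from pathlib import Path


def clear_lock_files(locks_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find *.lock files under locks_dir and delete them unless dry_run is set."""
    found = sorted(p for p in locks_dir.rglob("*.lock") if p.is_file())
    if not dry_run:
        for path in found:
            # missing_ok guards against a file vanishing between listing and deletion
            path.unlink(missing_ok=True)
    return found
```

Calling `clear_lock_files(Path("/home/austin/.cache/huggingface/hub/.locks"))` lists what would be removed; passing `dry_run=False` actually deletes the files.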

JakubCzarlinski · 2024-11-13T01:27:19Z

Same issue here. I tried deleting the .locks directory, but unfortunately it didn't help. Reducing --max-workers to something like 2 worked instead.

For example:

huggingface-cli download stabilityai/stable-diffusion-3.5-medium --max-workers 2

This is without using hf_transfer, and for a different model. In my case this did not hinder performance, but I imagine that depends heavily on your network speed.

EDIT: Spoke too soon. This didn't solve the problem, but it did at least reduce how often it happens.
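If you want to script this workaround, a small wrapper around the CLI call could look like the sketch below. The function names are hypothetical; only the `huggingface-cli download ... --max-workers` invocation comes from the command above. Per the edit, treat it as a mitigation rather than a fix.

```python
import subprocess


def build_download_command(repo_id: str, max_workers: int = 2) -> list[str]:
    """Build the huggingface-cli invocation with a reduced worker count."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return ["huggingface-cli", "download", repo_id, "--max-workers", str(max_workers)]


def run_download(repo_id: str, max_workers: int = 2) -> int:
    """Run the download and return the CLI's exit code."""
    return subprocess.run(build_download_command(repo_id, max_workers)).returncode
```

For example, `run_download("stabilityai/stable-diffusion-3.5-medium")` runs the same command as shown above.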

AlpinDale added the bug label on Sep 16, 2024
