[Bug] FileLock dependency incompatible with filesystem #329
Comments
Hi, can you give details on your environment (OS, package versions, etc.)?
Environment is Ubuntu 18.04, Python 3.7.5, nlp==0.3.0, filelock==3.0.12. The external volume is Amazon FSx for Lustre, which by default creates files with limited permissions. My working theory is that FileLock creates a lock file that isn't writable, so there's no way to acquire it by removing the .lock file. But Python is able to create new files and write to them outside of the FileLock package. When I attempt to use FileLock within a Docker container by writing to […]

```
$ echo "hello world" >> hello.txt
$ ls -l
-rw-rw-r-- 1 ubuntu ubuntu 10 Jun 30 19:52 hello.txt
```
Looks like the […] I added the […]
Awesome, thanks a lot for sharing your fix!
I'm wondering if this can be revisited. In some managed environments, the person using HF cannot change the filesystem mount flags (and the organization may be unwilling to change them because of other concerns), but can guarantee that there won't be concurrent writes, for example because HF is offline and the models/datasets were downloaded earlier. The real fix would be in FileLock itself, which does not seem very active and does not seem to handle failed system flock calls; handling them would be one way to fix this, as mentioned in the issue below, also raised by @jarednielsen.
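A minimal sketch of what "dealing with failed flock calls" could look like, assuming (as the comment above suggests) that the hang comes from a lock loop that treats every failed `flock()` as "someone else holds the lock" and retries forever. The hypothetical `acquire_lock` / `release_lock` helpers below are not FileLock's API: they keep retrying only on `EWOULDBLOCK`/`EAGAIN` (real contention) and raise immediately on any other errno, such as the `ENOSYS` a Lustre mount without flock support may return (an assumption about the mount, not something verified in this thread).

```python
import errno
import fcntl
import os
import time

# errno values that mean "the lock is held by someone else": keep waiting on these.
_BUSY = {errno.EAGAIN, errno.EWOULDBLOCK}


class FlockUnsupported(OSError):
    """Raised when flock() fails for a reason other than lock contention."""


def acquire_lock(path: str, timeout: float = -1, poll_interval: float = 0.05) -> int:
    """Take an exclusive flock() on `path` and return the open fd.

    Retries only while the lock is busy; a negative timeout waits forever.
    Any other flock() error is raised at once instead of being retried.
    """
    start = time.monotonic()
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd  # lock acquired; caller must call release_lock(fd)
        except OSError as exc:
            os.close(fd)
            if exc.errno not in _BUSY:
                # e.g. ENOSYS/EOPNOTSUPP on a mount without flock support (assumption)
                raise FlockUnsupported(exc.errno, f"flock() failed on {path}") from exc
        # Only reached when the lock is busy: give up after `timeout` seconds.
        if 0 <= timeout < time.monotonic() - start:
            raise TimeoutError(f"could not lock {path} within {timeout}s")
        time.sleep(poll_interval)


def release_lock(fd: int) -> None:
    """Release a lock taken by acquire_lock and close its fd."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

With this shape, an unsupported filesystem surfaces as an immediate `FlockUnsupported` error rather than an indefinite hang, and genuine contention can still be bounded with `timeout`.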
I am one of those users. Is there a workaround for this?
The machines I use have a shared FS that has the filelock problem, as well as a local one that does not. By setting a few environment variables (HF_HOME, which controls both models and datasets, and HF_DATASETS_OFFLINE) for both the transformers and datasets libraries, you can influence where these downloads happen and whether the locks get taken. I think some of the relevant documentation is here: https://huggingface.co/docs/transformers/installation#cache-setup. I do end up using different settings when I download the models and when I use them, and I have to rsync the models to the local file system using a separate script.
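The workflow above (copy a pre-downloaded cache to local disk, then run with the cache pointed there and downloads disabled) can be sketched as follows. This is an illustration under stated assumptions: `stage_cache` and `use_local_cache_offline` are hypothetical helpers, `shutil.copytree` stands in for the separate rsync script (`dirs_exist_ok` needs Python 3.8+), and only the two variables named above are set. Whether a copied cache is picked up unchanged may depend on the library version.

```python
import os
import shutil


def stage_cache(shared_cache: str, local_cache: str) -> str:
    """Mirror a pre-populated cache from shared storage onto local disk.

    Stand-in for an rsync step: copies everything under `shared_cache`
    into `local_cache`, overwriting files that already exist there.
    """
    shutil.copytree(shared_cache, local_cache, dirs_exist_ok=True)  # Python 3.8+
    return local_cache


def use_local_cache_offline(local_cache: str) -> None:
    """Point the Hugging Face libraries at `local_cache` and disable downloads.

    Call this before importing `datasets` or `transformers`, since these
    variables are typically read when the libraries are imported.
    """
    os.environ["HF_HOME"] = local_cache
    os.environ["HF_DATASETS_OFFLINE"] = "1"
```

For example, with hypothetical paths: `stage_cache("/fsx/hf", "/tmp/hf")`, then `use_local_cache_offline("/tmp/hf")`, and only then `from datasets import load_dataset`. Because nothing is downloaded in the offline run, the locks are taken on the local disk rather than on the shared FS.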
Thanks @orm011. These filesystems are such a pain. I'll dig around; it looks like setting […]
Note I […]
I am using a shared cluster with a Lustre filesystem that I can't change. I am unable to download or load datasets onto the filesystem because of the file lock. @thomwolf, can this issue be reopened?
Hi, I am having this issue as well. Has there been a solution for this? Thanks!
I'm downloading a dataset successfully with:

```python
from nlp import load_dataset  # nlp==0.3.0; the library was later renamed `datasets`

load_dataset("wikitext", "wikitext-2-raw-v1")
```

But when I attempt to cache it on an external volume, it hangs indefinitely:

```python
load_dataset("wikitext", "wikitext-2-raw-v1", cache_dir="/fsx")  # /fsx is an external volume mount
```
While it hangs, the filesystem looks like this: […]

It appears that on this filesystem, the FileLock object is stuck forever in its "acquire" stage. I have verified that the issue lies specifically with the `filelock` dependency: […]

Has anyone else run into this issue? I'd raise it directly on the FileLock repo, but that project appears abandoned, with the last update over a year ago. Alternatively, if there's a solution that would remove the FileLock dependency from the project, I would appreciate that.
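To check whether a given mount (such as `/fsx`) is the problem, independently of both FileLock and the datasets library, one can try a non-blocking exclusive `flock()` directly with the standard library. The hypothetical `probe_flock` helper below is a minimal sketch: it creates a fresh scratch file in the target directory, so no other process can be holding a lock on it, and any failure therefore points at the filesystem rejecting `flock()` rather than at contention. It assumes a Unix system, since `fcntl` is not available on Windows.

```python
import errno
import fcntl
import os
import tempfile
from typing import Tuple


def probe_flock(directory: str) -> Tuple[bool, str]:
    """Try a non-blocking exclusive flock() on a scratch file in `directory`.

    Returns (True, "ok") if the lock could be taken and released, otherwise
    (False, <errno name>) describing why flock() failed.
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".lock")
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # Nobody else knows about this fresh file, so a failure here means
            # the filesystem itself rejected flock(), not lock contention.
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True, "ok"
    finally:
        # Always clean up the scratch file, whether or not the lock worked.
        os.close(fd)
        os.remove(path)
```

Running `probe_flock("/fsx")` next to `probe_flock("/tmp")` should show whether only the external volume rejects the lock.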