Repeated timeouts in GitHub Actions fetching wheel for large packages #1912
Comments
Perhaps I just need to use
|
Thanks for the feedback, I've opened issues for your requests |
Thank you @zanieb ! I don't know the value of having this issue open, but I'll leave it to you to close if desired. |
In #1921 my co-worker noted that this might be a bug in the way we're specifying the timeout so I'll recategorize this one and leave it open. |
Looking at the actions runs, all the passing actions take ~30s, while the failing ones error after 5min, which is our default timeout, so this looks like a network failure (in either github actions or rust) |
I'm not seeing any timeouts anymore with the two most recent versions (https://github.com/konstin/vws-python-mock/actions). Could you check if this is now solved? |
I have not seen this issue since posting. Thank you for looking into this. |
I'll close it for now, please feel free to reopen should it reoccur |
@konstin I do not have permissions to re-open this issue. I can create a new one, but it is probably easier if you re-open this. This failure has reoccurred: |
I'm seeing a very similar error message for a non-PyTorch package that's also pretty large. It's a ~400 MB wheel and consistently gives me,
The package is a company-internal one, but I think the only notable thing about it is its very large size (it vendors Spark/Java stuff). Edit: PyTorch, weirdly, installs fine and pretty fast for me. |
I have changed the title of this to not reference torch. As another example, https://github.com/VWS-Python/vws-python-mock/actions/runs/8262236134 has 7 failures in one run. |
It can happen on Read the Docs as well, not only GHA https://beta.readthedocs.org/projects/kedro-datasets/builds/23790543/ |
Spotted it locally today inside a local Docker image running under QEMU
|
Reverts c59f0ca (#13) Too many CI test timeouts from installing torch/nvidia packages with uv: astral-sh/uv#1912
I encountered the problem when I used either uv or pip to download large wheels (for pip, the issues are pypa/pip#4796 and pypa/pip#11153), so I think the root cause is the network. However, I wonder whether uv could be smarter and retry automatically, like something in pypa/pip#11180. |
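For illustration, automatic retrying of this kind could be sketched as below. This is not uv's or pip's actual implementation (uv is written in Rust); the function name `retry_with_backoff` and its parameters are hypothetical, and the sketch only assumes that a stalled download surfaces as an `OSError` subclass such as `TimeoutError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on OSError with exponentially growing delays.

    Hypothetical helper for illustration only; `sleep` is injectable so the
    behaviour can be checked without actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # TimeoutError and ConnectionError are both subclasses of OSError.
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Note that a plain retry like this restarts the download from the beginning each time, so it helps with transient failures but not with a connection that reliably stalls partway through a very large wheel.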
Worth trying 0.1.35, which includes #3144 |
It seems likely that this is resolved by #3144 |
that would be a great feature. we have our dev environments behind TLS inspection and some packages often run into a timeout due to slow inspection. we can reproduce this with a browser and the download gets stuck until a timeout. in the browser we can just click resume and the browser reconnects and downloads the remaining part. with uv we don't have a retry with resume, so it starts from scratch and gets stuck again. +1 for retry with resume |
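As a rough sketch of what "retry with resume" means at the HTTP level: the client reconnects and asks for the remaining bytes with a `Range` header, the same way a browser's resume button does. The code below is an assumption-laden illustration, not how uv works; `http_chunks` and `download_with_resume` are hypothetical names, and it assumes the server supports range requests (answers `206 Partial Content`).

```python
import http.client
import urllib.request
from typing import BinaryIO, Callable, Iterable, Iterator


def http_chunks(url: str, offset: int, chunk_size: int = 1 << 16,
                timeout: float = 30.0) -> Iterator[bytes]:
    """Yield the body of `url` starting at byte `offset` via a Range request."""
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if offset and response.status != 206:
            # The server sent the whole file again instead of the remainder.
            raise RuntimeError("server ignored the Range header; cannot resume")
        while chunk := response.read(chunk_size):
            yield chunk


def download_with_resume(open_from: Callable[[int], Iterable[bytes]],
                         sink: BinaryIO, max_attempts: int = 5) -> int:
    """Write a download to `sink`, reconnecting at the current offset on failure.

    `open_from(offset)` must return an iterable of chunks starting at `offset`.
    Returns the total number of bytes written.
    """
    offset = 0
    for attempt in range(max_attempts):
        try:
            for chunk in open_from(offset):
                sink.write(chunk)
                offset += len(chunk)  # remember how far we got
            return offset
        except (OSError, http.client.HTTPException):
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Otherwise loop again and request only the bytes still missing.
    raise ValueError("max_attempts must be at least 1")
```

With a real URL this might be called as `download_with_resume(lambda off: http_chunks(url, off), open(path, "wb"))`. A production version would also verify the final size or hash, since a connection that closes early without an error would otherwise look like a complete download.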
Going to close for now, but we can re-open if this comes up again post-changing the timeout semantics. |
In the last few days since switching to uv, I have seen errors that I have not seen before with pip.
I see:
I see this on the CI for vws-python-mock, which requires installing 150 packages:
I do this in parallel across many jobs on GitHub Actions, mostly on ubuntu-latest.
This happened with torch 2.2.0 before the recent release of torch 2.2.1.
It has not happened with any other dependencies.
The wheels for torch are pretty huge: https://pypi.org/project/torch/#files.
uv is always at the latest version as I run curl -LsSf https://astral.sh/uv/install.sh | sh. In the most recent example, this is uv 0.1.9.
Failures: