ci: tests on Windows can sporadically fail #465
Comments
It feels to me like maybe the Windows platform, for whatever reasons, is simply less stable. Sometimes when GitHub Actions tries to launch an app, we'll get a
This translates to an NTSTATUS code. This is sporadic, so I could maybe have GitHub Actions retry a command if it returns this exit code, but I am not sure what I am working around here. Windows? The GitHub Actions Windows platform? Some issue with the binaries I am launching? I'll probably never know.
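The retry idea above could look roughly like the following sketch. The names `run_with_retry` and `retry_codes` are hypothetical, and the actual exit code to retry on is not filled in here; the caller supplies it.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, retry_codes, max_attempts=3, delay_s=1.0, run=subprocess.run):
    """Run cmd, retrying only when it exits with one of retry_codes.

    Returns (exit_code, attempts_used). `run` is injectable so the retry
    logic can be exercised without launching real processes.
    """
    for attempt in range(1, max_attempts + 1):
        code = run(cmd).returncode
        # Any code outside retry_codes (success or a genuine failure) is final,
        # and so is the result of the last permitted attempt.
        if code not in retry_codes or attempt == max_attempts:
            return code, attempt
        print(f"attempt {attempt}: exit code {code}, retrying", file=sys.stderr)
        time.sleep(delay_s)
```

Retrying only on an explicit set of codes keeps genuine test failures visible, which matters here because the root cause is still unknown.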
It is odd that the socket timeout exception seems to only occur on Windows. I have seen it when talking to:
A bit of a long shot, but maybe related?: Specifically this might help?
Might help with clj-commons#465
I'll report on any future Windows
While working on clj-http-lite I've noticed that the time from process launch to an available connection can sometimes be extremely slow on Windows. On Ubuntu ~5s is normal, but on Windows it is ~10s and sometimes a whopping 90s.
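One way to measure (and tolerate) that launch-to-connection delay is to poll the port with a deadline. This is a minimal sketch, not what clj-http-lite or this project does; `wait_for_port` and its defaults are assumptions, with the 90-second default taken from the worst case mentioned above.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=90.0, interval_s=0.25):
    """Poll until a TCP connection to host:port succeeds.

    Returns the seconds waited, or None if timeout_s elapsed first.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        try:
            # A successful connect means something is listening; close it at once.
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: the process is not accepting connections yet.
            if time.monotonic() >= deadline:
                return None
            time.sleep(interval_s)
```

Logging the returned wait time in CI would show whether the slow launches line up with the failing jobs.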
As of this writing, we are still getting many sporadic failures on GitHub Actions Windows. Currently our default Or maybe just letting the user bind it to whatever is good enough.
Note that a socket timeout exception can mean that the WebDriver process is hung or failing somehow. In these cases, it is valuable to see the WebDriver logs. By default, we do not capture WebDriver logs; they are sent to /dev/null. For our tests, it might be useful to capture WebDriver logs but dump them only for failing tests. It is probably also worth enabling trace/debug-level logging, although a downside is that this could affect WebDriver behaviour. I'm probably willing to take that risk. If we go down this road, we also don't want to pollute a system with a bunch of WebDriver log files, so we would need a strategy to either avoid creating them or clean them up easily.
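The capture-then-dump-on-failure strategy could be sketched as below. This is an illustration only, not this project's implementation: `captured_driver_log` is a hypothetical helper, and how to turn on trace-level logging is driver-specific and deliberately left out.

```python
import contextlib
import os
import subprocess
import sys
import tempfile


@contextlib.contextmanager
def captured_driver_log(cmd, dump_to=sys.stderr):
    """Run cmd with its output in a temp file; dump the file only on failure."""
    fd, path = tempfile.mkstemp(prefix="webdriver-", suffix=".log")
    proc = None
    failed = False
    try:
        # The child keeps its own handle to the log; ours is closed right away.
        with os.fdopen(fd, "wb") as log:
            proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        yield proc
    except BaseException:
        failed = True
        raise
    finally:
        if proc is not None:
            proc.terminate()
            proc.wait()
        if failed:
            with open(path, "r", errors="replace") as log:
                dump_to.write(log.read())
        # Always delete the file so test runs leave nothing behind.
        os.remove(path)
```

A test would wrap its body in `with captured_driver_log([...]) as proc:`; an exception inside the block triggers the dump, and the temp file is removed either way.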
As of 2024-05-16 Windows is still very flaky. It is normal to have to rerun failing Windows jobs to get a successful CI run.
Currently
It is not unusual for jobs on Windows to fail on GitHub Actions.
A re-run of any failed jobs usually succeeds.
Impact
It is a bit annoying to have to rerun these sporadic failed jobs.
Typical Symptom
The typical symptom is a SocketTimeoutException, for example:
Thoughts
This is different from #464. The default socket timeout is 60 seconds.
I've not done much verification to try to reproduce this one on my Windows VM yet.
My wild guess is that the WebDriver process has maybe crashed, but I don't know yet.
Maybe we are asking the WebDriver to create a session before it is ready to do so.
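If readiness is the problem, one option is to poll the driver's status endpoint before creating a session. The W3C WebDriver specification defines `GET /status`, whose response carries a `ready` flag under `value`. The sketch below assumes the driver follows that spec; `wait_until_ready` and `fetch_status` are hypothetical names, and this is not necessarily the idea from #464.

```python
import http.client
import json
import time
import urllib.request


def fetch_status(base_url, timeout_s=5.0):
    """GET <base_url>/status and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(base_url, timeout_s=60.0, interval_s=0.5, fetch=fetch_status):
    """Return True once the driver reports ready, False if timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(base_url).get("value", {}).get("ready") is True:
                return True
        except (OSError, ValueError, http.client.HTTPException):
            pass  # not listening yet, or a malformed reply; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Only after `wait_until_ready` returns True would the test go on to request a new session; a False result points at a driver that never came up rather than at a slow session request.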
We could try this idea from #464: