Segmentation fault with threaded Distributed #572
Labels: bug (Something isn't working)
Comments
jarbus changed the title from "Segmentation fault with Distributed and Flux models on NVIDIA GPU" to "Segmentation fault with multiprocessing" (Dec 12, 2024)
The following simpler code produces the same issue, this time without Flux or CUDA:

    using Distributed
    addprocs(8; exeflags="--threads=8")
    @everywhere begin
        using CondaPkg
        using PythonCall
        function initialize_car_racing_env(_)
            gym = pyimport("gymnasium")
            # do some multithreaded work
            Threads.@threads for i in 1:100
                # do some work
                if rand() < 0
                    println("some work")
                end
            end
            env = gym.make("CarRacing-v3")
            obs, info = env.reset()
            env.close()
            return 1
        end
    end
    for generation in 1:10_000
        if generation % 100 == 0
            println("Generation: $generation")
        end
        pmap(initialize_car_racing_env, 1:12)
    end
jarbus changed the title from "Segmentation fault with multiprocessing" to "Segmentation fault with threaded Distributed" (Dec 12, 2024)
Even simpler code:

    using Distributed
    addprocs(1; exeflags="--threads=16")
    @everywhere begin
        using PythonCall
        function initialize_car_racing_env(_)
            pyimport("gymnasium").make("CarRacing-v3").close()
            return 1
        end
    end
    pmap(initialize_car_racing_env, 1:2^14)
Notably, this doesn't occur for the
This appears to be related to the Box2D Python package segfaulting when an environment that uses it is launched with multiple threads.
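Since the crash seems tied to the worker's thread count, it can help to confirm that `exeflags` actually took effect on a worker before running a reproducer. The following is a minimal sketch using only the Distributed standard library. The thread count of 2 and the variable names `worker_ids` and `nthreads_on_worker` are illustrative assumptions, not part of the original report:

```julia
using Distributed

# Start one worker with an explicit thread count
# (2 is an arbitrary, illustrative choice).
worker_ids = addprocs(1; exeflags="--threads=2")

# Ask the worker how many threads it was started with.
nthreads_on_worker = remotecall_fetch(Threads.nthreads, first(worker_ids))
println("Worker $(first(worker_ids)) has $nthreads_on_worker threads")

# Remove the worker so it does not linger in the session.
rmprocs(worker_ids)
```

Repeating the reproducers with `--threads=1` on the workers would be one way to test whether the thread count is the deciding factor, though that has not been verified here.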
Another example which produces the same issue:

    using Distributed
    addprocs(32)
    @everywhere using PythonCall, CUDA
    @everywhere begin
        function initialize_car_racing_env(_)
            pyimport("gymnasium").make("CarRacing-v3").close()
            x = randn(2, 2)
            cu(x)
            cu(x)
            return 1
        end
    end
    pmap(initialize_car_racing_env, 1:2^12)
Affects: PythonCall
Describe the bug
This is a very quirky bug. I'm getting a segmentation fault when using Python's gymnasium package with multiple processes while a Flux model is loaded on the GPU.
Setup:
Run (the crash is non-deterministic; try running a few times on a machine with an NVIDIA GPU):
Stack trace:
Your system
Please provide detailed information about your system:
5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Additional context
I'm researching embodied AI and trying to use Julia's distributed capabilities to do so while still evaluating on Python environments.