This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
I am trying to run a simple Open MPI test using mpi4py and this Docker image (aalati/mpi_ex_mit). The container includes a Python script that uses mpi4py to check communication between the nodes, and a shell script that passes the Python script to mpiexec.
When I submit the job to the pool using shipyard, I get the following error:
Error response from daemon: Cannot kill container: simjob-aalati-mpi_ex_mit: No such container: simjob-aalati-mpi_ex_mit
Error: No such container: simjob-aalati-mpi_ex_mit
Warning: Permanently added '[10.0.0.5]:23' (ECDSA) to the list of known hosts.
Warning: Permanently added '[10.0.0.6]:23' (ECDSA) to the list of known hosts.
**********************************************************
Open MPI does not support recursive calls of mpiexec
**********************************************************
I am not sure whether the problem comes from the pool or job configuration, or from how the container is built (though the latter seems less likely, since the container works as expected when I run it locally). I have attached the Dockerfile and the jobs configuration below in case they are useful.
I would appreciate any advice on this issue. Thank you very much for your help.
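One possible reading of the error above is that mpiexec is being invoked from a process that Open MPI has already launched. As a rough diagnostic sketch, the snippet below lists any OMPI_-prefixed environment variables visible to the process that is about to call mpiexec. It rests on the assumption that Open MPI exports such variables (for example OMPI_COMM_WORLD_RANK) to the processes it starts; the helper name open_mpi_launch_vars is hypothetical and is not part of mpi4py or Batch Shipyard.

```python
import os


def open_mpi_launch_vars(env=None):
    """Return the sorted names of OMPI_-prefixed variables found in env.

    Assumption: Open MPI exports variables with this prefix to the
    processes it launches, so a non-empty result hints that the current
    process was itself started by mpiexec/mpirun.
    """
    env = os.environ if env is None else env
    return sorted(name for name in env if name.startswith("OMPI_"))


if __name__ == "__main__":
    found = open_mpi_launch_vars()
    if found:
        print("OMPI_* variables present:", ", ".join(found))
    else:
        print("No OMPI_* variables found in this environment.")
```

Running this at the top of the shell script's environment (before the mpiexec call) inside the container would show whether the task is already running under an Open MPI launcher when the script starts.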
Batch Shipyard Version
I am using the version on Azure CloudShell for now.
Redacted Configuration
jobs.yaml
Dockerfile
ssh_config