This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
Problem Description
If I deploy a pool with Standard_NC4as_T4_v3 without the gpu:nvidia_driver:source specification in pool.yaml, pool creation succeeds but the NVIDIA drivers are not installed.
If I specify gpu:nvidia_driver:source, I get an error: local variable 'gpu_driver' referenced before assignment
The same pool.yaml works fine with Standard_NC6s_v3.
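For readers unfamiliar with the colon notation, gpu:nvidia_driver:source refers to a nested key path in pool.yaml. The sketch below models that with a plain Python dict standing in for the parsed file. The `pool_specification` wrapper, the `vm_size` key, and the `lookup` helper are assumptions made for illustration and are not taken from the reporter's redacted configuration or from Batch Shipyard's code.

```python
# Hypothetical stand-in for a parsed pool.yaml; only the driver URL
# comes from this report, the surrounding keys are assumed.
pool_config = {
    "pool_specification": {
        "vm_size": "STANDARD_NC4AS_T4_V3",
        "gpu": {
            "nvidia_driver": {
                "source": (
                    "https://us.download.nvidia.com/tesla/460.73.01/"
                    "NVIDIA-Linux-x86_64-460.73.01.run"
                ),
            },
        },
    },
}


def lookup(config, path, default=None):
    """Walk a colon-separated key path through nested dicts.

    Illustrative helper only; returns `default` if any key is missing.
    """
    node = config
    for key in path.split(":"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


driver_source = lookup(
    pool_config["pool_specification"], "gpu:nvidia_driver:source"
)
```

With the key present, `driver_source` holds the custom driver URL; with the `gpu` section omitted, `lookup` returns `None`, which corresponds to the first scenario described above.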
Batch Shipyard Version
3.9.1
Steps to Reproduce
Try to deploy a pool with Standard_NC4as_T4_v3
Expected Results
Pool is deployed
Actual Results
Error is returned when gpu:nvidia_driver:source specification is provided in pool.yaml:
2021-09-21 09:02:21.573 INFO - uploading file /tmp/_MEIRpaARG/scripts/shipyard_docker_exec_task_runner.sh as 'shipyard_docker_exec_task_runner.sh'
Traceback (most recent call last):
File "shipyard.py", line 3136, in <module>
File "site-packages/click/core.py", line 764, in __call__
File "site-packages/click/core.py", line 717, in main
File "site-packages/click/core.py", line 1137, in invoke
File "site-packages/click/core.py", line 1137, in invoke
File "site-packages/click/core.py", line 956, in invoke
File "site-packages/click/core.py", line 555, in invoke
File "site-packages/click/decorators.py", line 64, in new_func
File "site-packages/click/core.py", line 555, in invoke
File "shipyard.py", line 1546, in pool_add
File "convoy/fleet.py", line 3451, in action_pool_add
File "convoy/fleet.py", line 1849, in _add_pool
File "convoy/fleet.py", line 1555, in _construct_pool_object
UnboundLocalError: local variable 'gpu_driver' referenced before assignment
[9269] Failed to execute script shipyard
I also tried with source: https://us.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run, which deploys without issues on other NC series sizes (e.g., NC6s v3), and got the same error.
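An UnboundLocalError like the one in the traceback typically appears when a local variable is assigned only inside conditional branches and the input matches none of them. The following is a minimal sketch of that general pattern, not the actual code in convoy/fleet.py: the function names, the `KNOWN_GPU_SIZES` set, and the `"tesla"` value are all hypothetical.

```python
from typing import Optional

# Hypothetical list of VM sizes the tool recognises as GPU-capable.
# NOT taken from convoy/fleet.py.
KNOWN_GPU_SIZES = {"standard_nc6s_v3", "standard_nc12s_v3"}


def pick_driver_buggy(vm_size: str) -> str:
    """gpu_driver is bound only when the size is recognised."""
    if vm_size.lower() in KNOWN_GPU_SIZES:
        gpu_driver = "tesla"
    # Raises UnboundLocalError when the branch above did not run.
    return gpu_driver


def pick_driver_safe(vm_size: str, custom_source: Optional[str] = None) -> str:
    """Initialise the variable first and fail with an explicit message."""
    gpu_driver = None
    if vm_size.lower() in KNOWN_GPU_SIZES:
        gpu_driver = "tesla"
    if custom_source is not None:
        # A user-supplied driver source takes precedence in this sketch.
        return custom_source
    if gpu_driver is None:
        raise ValueError(f"no GPU driver known for VM size {vm_size!r}")
    return gpu_driver
```

Under these assumptions, `pick_driver_buggy("Standard_NC4as_T4_v3")` fails the same way the report describes, while `pick_driver_buggy("Standard_NC6s_v3")` succeeds, which would be consistent with the same pool.yaml working on one size and not the other.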
- Move transitional SR-IOV RDMA instances to normal
- Add Standard_Ncas_T4_v3 and Standard_NdasrA100_v4 support
- Fix CentOS 7.8+ support along with LIS
- Resolves #370