I am running SwitchML's allreduce_benchmark on a cluster in which some nodes have MLX5 NICs and others have Intel 82599 ES 10G NICs, so I'm using DPDK as the communication backend. I need to share the NIC on each host with other traffic, so I virtualize it by creating a virtual function of the PCI device: the original device carries general-purpose traffic, and the virtual device runs the SwitchML app. However, when I try to run SwitchML with the virtual device, I get the following error:
Submitting 5 warmup jobs.
EAL: Detected 20 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:10.1 on NUMA socket 0
EAL: probe driver: 8086:10ed net_ixgbe_vf
EAL: using IOMMU type 1 (Type 1)
E1116 19:08:30.807562 74629 dpdk_master_thread_utils.inc:277] Flow isolated mode failed: 1 Function not implemented
F1116 19:08:31.361418 74629 dpdk_master_thread_utils.inc:154] Flow rule can't be added: 1Function not implemented
*** Check failure stack trace: ***
@ 0x7f2291e280cd google::LogMessage::Fail()
@ 0x7f2291e29f33 google::LogMessage::SendToLog()
@ 0x7f2291e27c28 google::LogMessage::Flush()
@ 0x7f2291e2a999 google::LogMessageFatal::~LogMessageFatal()
@ 0x564504203fb2 switchml::InsertFlowRule()
@ 0x5645042048ca switchml::InitPort()
@ 0x564504205ae6 switchml::DpdkMasterThread::operator()()
@ 0x7f2291ae34c0 (unknown)
@ 0x7f22915766db start_thread
@ 0x7f229015761f clone
This seems to be caused by the following check in the InsertFlowRule function:

    struct rte_flow_error error;
    LOG_IF(FATAL, rte_flow_validate(port_id, &attr, pattern, action, &error) != 0)
        << "Flow rule can't be added: " << error.type
        << (error.message ? error.message : "(no stated reason)");

Any ideas on why this is happening and whether I can work around it? Any help is much appreciated.
Thank you.
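
To make the failure mode concrete, here is a minimal sketch of how the check could separate "the driver does not implement the flow API at all" from "this particular rule was rejected". It does not link against DPDK: `FlowError` and `FakeFlowValidate` are hypothetical stand-ins for `rte_flow_error` and `rte_flow_validate`, and `Classify` and `Describe` are names made up for illustration. The stand-in simply reproduces the type value 1 and the ENOSYS text seen in the log above. Whether SwitchML can run correctly without the flow rule is an open question that this sketch does not answer.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-in for DPDK's rte_flow_error (only the fields used here).
struct FlowError {
  int type;             // the log above prints 1 for this field
  const char* message;  // may be null
};

// Hypothetical stand-in for rte_flow_validate() on a port whose driver has
// no flow API support: returns a negative errno and fills in the error.
int FakeFlowValidate(FlowError* error) {
  error->type = 1;
  error->message = std::strerror(ENOSYS);
  return -ENOSYS;
}

enum class FlowSupport { kOk, kUnsupported, kRejected };

// Separate "flow API not implemented by this driver" from other failures,
// so the caller can choose a fallback instead of aborting unconditionally.
FlowSupport Classify(int rc) {
  if (rc == 0) return FlowSupport::kOk;
  if (rc == -ENOSYS || rc == -ENOTSUP) return FlowSupport::kUnsupported;
  return FlowSupport::kRejected;
}

// Format the error with a separator between type and message; the original
// log line runs them together ("1Function not implemented").
std::string Describe(const FlowError& e) {
  return std::to_string(e.type) + " " +
         (e.message ? e.message : "(no stated reason)");
}
```

A caller would run `Classify(FakeFlowValidate(&err))` and, on `kUnsupported`, log `Describe(err)` as a warning and decide whether to continue without flow isolation, keeping the fatal path for `kRejected`.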