spark rdma error #11
Comments
Can you please provide the command you use to submit the Spark terasort? Do you use YARN or the standalone cluster deployment mode? Can you please check the logs for that blockManager ID?
Hi, I used standalone mode.
OK, can you please check for errors in the Spark log directory:
Hi, the attached zip file contains the error messages. The dataset size I used is 1 GB. spark-root-org.apache.spark.deploy.worker.Worker-1-rdma21.zip
Sorry, the logs for executors are in
Hi,
In
Could you please also try generating a bigger dataset? You are running 15 executors for 1 GB of input data (<100 MB per executor). Alternatively, try running with a smaller number of executors to make sure everything is working.
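The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the function name `per_executor_mb` is made up here, and it assumes the input is split evenly across executors with 1 GB counted as 1024 MB.

```python
def per_executor_mb(total_input_gb: float, num_executors: int) -> float:
    """Average input per executor in MB, assuming an even split and 1 GB = 1024 MB."""
    if num_executors <= 0:
        raise ValueError("num_executors must be positive")
    return total_input_gb * 1024 / num_executors


# Figures from this thread: 1 GB of input data spread over 15 executors.
print(f"{per_executor_mb(1, 15):.1f} MB per executor")
```

With the numbers above, each executor gets well under 100 MB, which is why a larger dataset or fewer executors makes a more meaningful test.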
Hi,
Do you use InfiniBand or RoCE? Is your monitoring system configured to monitor RDMA traffic? You can check how to monitor RDMA traffic here: https://community.mellanox.com/docs/DOC-2416
Hi, what command do you use to run the SparkRDMA terasort, please? I don't know whether running the SparkRDMA terasort program requires a different command.
Here's how I run the ehiggs terasort version with SparkRDMA:
Run terasort:
You can find how to run the HiBench terasort version here, but the approach is the same. Basically you need to set
Hi,
DiSNI is just a wrapper over the verbs API. If you set up PFC so that RDMA traffic goes to the 5th queue, it will go there. We've updated our wiki documentation; you can check Advanced forms of flowcontrol.
Our problem is that libdisni can never be found,
@RummySugar You need to install the libdisni .so on each server, or upload it with
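One way to confirm that the library is actually present on a given server is a small check like the one below. It is a rough sketch under assumptions not stated in this thread: the file name `libdisni.so`, the fallback directories `/usr/lib` and `/usr/local/lib`, and the helper names `find_library` and `ld_library_dirs` are all illustrative, and the directories the JVM really searches depend on `java.library.path` in your deployment.

```python
import os
from typing import List, Optional


def find_library(filename: str, search_dirs: List[str]) -> Optional[str]:
    """Return the first path in search_dirs that contains filename, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def ld_library_dirs() -> List[str]:
    """Directories listed in LD_LIBRARY_PATH (an empty list if it is unset)."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    return [d for d in raw.split(os.pathsep) if d]


if __name__ == "__main__":
    # Assumed file name and fallback directories; adjust to your installation.
    dirs = ld_library_dirs() + ["/usr/lib", "/usr/local/lib"]
    hit = find_library("libdisni.so", dirs)
    print(hit if hit else "libdisni.so not found in: " + ", ".join(dirs))
```

Running it on every worker node should show quickly which servers are missing the library.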
Hi,
I was trying to run the SparkRDMA Terasort code. The regular Spark Terasort finishes successfully, but the SparkRDMA Terasort fails with errors. The errors are shown below:
I used Spark 2.1.0