
I'm using Spark 1.5.1 with the standalone cluster manager. Spark's default spark-assembly-1.5.1-hadoop2.6.0.jar includes Avro 1.7.7. I want to use my custom Avro library (call it Avro 1.7.8) for all my Spark jobs. This works perfectly in dev mode (master=local[*]). However, when I submit my app to the cluster in client mode, the executors still use the Avro 1.7.7 library. To check which jar the class is actually loaded from, I print the resource URL of GenericData.class:

// Where does the current classloader resolve Avro's GenericData from?
URL url = getClass().getClassLoader().getResource(GenericData.class.getName().replace('.', '/') + ".class");

In the executor's log, this prints:

/opt/spark/lib/spark-assembly-1.5.1-hadoop2.6.0.jar/org/apache/avro/generic/GenericData.class
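
For context, here is a minimal sketch of running the same lookup inside an executor task, so the printed URL reflects the executors' classloader rather than the driver's (the class name and job setup are illustrative, not from the original app):

import java.net.URL;
import java.util.Arrays;

import org.apache.avro.generic.GenericData;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class AvroClasspathCheck {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("avro-classpath-check"));

        // Run the lookup on the executors; the URL shows up in each executor's stdout log.
        sc.parallelize(Arrays.asList(1, 2, 3, 4), 4).foreach(new VoidFunction<Integer>() {
            @Override
            public void call(Integer i) {
                URL url = GenericData.class.getClassLoader()
                        .getResource(GenericData.class.getName().replace('.', '/') + ".class");
                System.out.println("GenericData loaded from: " + url);
            }
        });

        sc.stop();
    }
}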

Here is the relevant part of my spark-env.sh on the worker nodes:

export SPARK_WORKER_OPTS="-Dspark.executor.extraClassPath=/home/ansible/avro-1.7.8.jar -Dspark.executor.userClassPathFirst=true"
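
Note that spark.executor.* properties are normally application-level settings rather than worker daemon options. For comparison, a sketch of the equivalent configuration in spark-defaults.conf on the submitting machine (same jar path assumed):

# conf/spark-defaults.conf on the machine you submit from (a sketch)
spark.executor.extraClassPath      /home/ansible/avro-1.7.8.jar
spark.executor.userClassPathFirst  true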

Here is my worker process on the worker node (ps aux | grep worker):

spark 955 1.8 1.9 4161448 243600 ? Sl 13:29 0:09 /usr/java/jdk1.7.0_79/jre/bin/java -cp /home/ansible/avro-1.7.8.jar:/etc/spark-worker/:/opt/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/opt/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Dspark.executor.extraClassPath=/home/ansible/avro-1.7.8.jar -Dspark.executor.userClassPathFirst=true -Xms512m -Xmx512m -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://spark-a-01:7077

Naturally, I put this jar (/home/ansible/avro-1.7.8.jar) on all my worker nodes.

Does anyone know how to force the executors to use my jar instead of the one bundled in the Spark assembly?


1 Answer


Try using the --packages option to spark-submit:

spark-submit --packages org.apache.avro:avro:1.7.8 ....

Something like that. If you're not using spark-submit, use it; this is exactly what it is for.
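
For instance, a fuller invocation might look like the sketch below. The main class and application jar are placeholders; the master URL is taken from the worker command shown in the question:

# The main class and application jar below are placeholders.
spark-submit \
  --master spark://spark-a-01:7077 \
  --deploy-mode client \
  --packages org.apache.avro:avro:1.7.8 \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.MyApp \
  my-app.jar

The --packages option resolves the artifact from Maven Central and ships it to the driver and executors, which avoids having to copy the jar onto every worker node by hand.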