0
votes

I'm learning Spark and I'd like to use an Avro data file. Since Avro support is external to Spark, I've downloaded the jar. My problem is how to copy it into that specific place, the 'jars dir', inside my container. I've read a related post here, but I do not understand it.

I've also seen the command below on the main Spark website, but I think I need the jar file copied before running it.

./bin/spark-shell --packages org.apache.spark:spark-avro_2.XX:X.X.X ...

What I tried is

docker cp /Users/username/Downloads/spark-avro_2.11-2.4.5.jar docker-spark_master_1:/jars

but it's not working. Thanks in advance.

NB: I'm running a Spark 2.4 container setup with a worker and a master.

1
The jars directory is under /usr/spark-2.4.1. Your destination for cp should be docker-spark_master_1:/usr/spark-2.4.1/jars/. Still, the command you tried should have created a file called jars under /. That did not happen? - franklinsijo
Yes, I see jars in the root (/). - abdoulsn
spark_master_1 should be the name of master - abdoulsn
docker-spark_master_1 is the name of the container. The syntax is container_name:destination - franklinsijo
Will update it as the answer. - franklinsijo

1 Answer

1
votes

Quoting the docker cp documentation:

docker cp SRC_PATH CONTAINER:DEST_PATH

If SRC_PATH specifies a file and DEST_PATH does not exist then the file is saved to a file created at DEST_PATH

From the command you tried,

The destination path /jars does not exist in the container; the actual jars directory is /usr/spark-2.4.1/jars/. Because /jars did not exist, the jar was copied into the container as a file named jars under the root (/) directory.
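
To see the rule in action without touching the container, here is a minimal local sketch. It is only an analogy: plain cp stands in for docker cp (it treats a missing destination the same way), a temporary directory stands in for the container's filesystem, and spark-avro.jar is an empty placeholder file rather than the real jar.

```shell
#!/bin/sh
# Local analogy only: plain cp stands in for docker cp, and a temporary
# directory stands in for the container's filesystem.
root=$(mktemp -d)
mkdir -p "$root/usr/spark-2.4.1/jars"
touch "$root/spark-avro.jar"   # empty placeholder for the real jar

# Destination does not exist: the jar is saved as a FILE named "jars".
cp "$root/spark-avro.jar" "$root/jars"

# Destination is an existing directory: the jar is copied INTO it.
cp "$root/spark-avro.jar" "$root/usr/spark-2.4.1/jars/"

# Show both results side by side.
ls -l "$root/jars" "$root/usr/spark-2.4.1/jars/"
```

The first copy leaves a regular file called jars at the top of the temporary directory, which matches what showed up under / in the container.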

Try this command instead to add the jar to the Spark jars directory:

docker cp /Users/username/Downloads/spark-avro_2.11-2.4.5.jar docker-spark_master_1:/usr/spark-2.4.1/jars/
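
To confirm that the jar landed in the right place, and to clean up the stray file left by the first attempt, something along these lines may help. It is a sketch under assumptions: the container name, jars path, and jar file name are taken from the question, the image is assumed to ship ls and rm, and the default user in the container is assumed to have permission to delete /jars.

```shell
#!/bin/sh
# Names below are taken from the question; adjust them to your setup.
container=docker-spark_master_1
jars_dir=/usr/spark-2.4.1/jars
jar_name=spark-avro_2.11-2.4.5.jar

if command -v docker >/dev/null 2>&1 && docker inspect "$container" >/dev/null 2>&1; then
    # List the copied jar; ls exits non-zero if the file is missing.
    docker exec "$container" ls -l "$jars_dir/$jar_name"
    # Remove the stray file that the earlier attempt created at /jars.
    docker exec "$container" rm -f /jars
else
    echo "container $container is not available; nothing checked"
fi
```

If the worker container also needs the jar, the same docker cp command can be repeated with the worker's container name.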