1 vote

I'm submitting a Spark job to a remote Spark cluster on YARN and shipping a file with the spark-submit --files option. I want to read the submitted file as a DataFrame, but I'm confused about how to go about this without having to put the file in HDFS:

spark-submit \
--class com.Employee \
--master yarn \
--files /User/employee.csv \
--jars SomeJar.jar

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().getOrCreate() // create the Spark session
val df = spark.read.csv("/User/employee.csv")
Possible duplicate of how can dataframereader read http? – zero323
@DevEx.. was my answer useful? – vikrant rana

2 Answers

0 votes
spark.sparkContext.addFile("file:///path/to/your/local/file")

Add the file using addFile so that it is available on your worker nodes, since you want to read a local file in cluster mode.

You may need to make slight changes depending on the Scala and Spark versions you are using.
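
For reference, a minimal sketch of how this could look end to end, assuming Spark 2.x and the file path from the question; the header option is an illustrative assumption, not part of the original answer:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Ship the local file to every node in the cluster.
spark.sparkContext.addFile("file:///User/employee.csv")

// SparkFiles.get returns the local path the file was copied to on each node.
val df = spark.read
  .option("header", "true")
  .csv("file://" + SparkFiles.get("employee.csv"))

df.show()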

-1 votes

employee.csv is in the working directory of the executor, so just read it as follows:

val df = spark.read.csv("employee.csv")
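
For completeness, a sketch of this approach with the Spark session included, assuming YARN cluster mode where --files stages employee.csv under its base name in each container's working directory (as this answer asserts):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// --files /User/employee.csv stages the file in the container's working
// directory under its base name, so a relative path refers to that copy.
val df = spark.read.csv("employee.csv")
df.show()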