1 vote

I'm submitting a Spark job to a remote Spark cluster on YARN and shipping a file with the spark-submit --files option. I want to read the submitted file as a DataFrame, but I'm confused about how to go about this without having to put the file in HDFS:

spark-submit \
--class com.Employee \
--master yarn \
--files /User/employee.csv \
--jars SomeJar.jar

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().getOrCreate() // create the Spark session
val df = spark.read.csv("/User/employee.csv")
Possible duplicate of how can dataframereader read http? – zero323
@DevEx.. was my answer useful? – vikrant rana

2 Answers

0 votes
spark.sparkContext.addFile("file:///path/to/your/local/file")

Add the file using addFile so that it is available on your worker nodes, since you want to read a local file in cluster mode.

You may need to make slight changes depending on the Scala and Spark versions you are using.
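
For reference, a minimal sketch of how this could look end to end, assuming Spark 2.x and the file path from the question; the header option is an illustrative assumption, not part of the original answer:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Ship the local file to every node in the cluster.
spark.sparkContext.addFile("file:///User/employee.csv")

// SparkFiles.get returns the local path the file was copied to on each node.
val df = spark.read
  .option("header", "true")
  .csv("file://" + SparkFiles.get("employee.csv"))

df.show()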

-1 votes

employee.csv is in the working directory of the executor, so just read it as follows:

val df = spark.read.csv("employee.csv")
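
For completeness, a sketch of this approach with the Spark session included, assuming YARN cluster mode where --files stages employee.csv under its base name in each container's working directory (as this answer asserts):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// --files /User/employee.csv stages the file in the container's working
// directory under its base name, so a relative path refers to that copy.
val df = spark.read.csv("employee.csv")
df.show()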