I'm a beginner in Spark. I have a scenario where multiple sources of data arrive at different points in time for an analysis. Can two Spark jobs use a single HDFS/S3 storage at the same time? One job would write the latest data to S3/HDFS, and the other would read that data along with input data from another source for analysis.
In order to use both file systems, you need to include the URI scheme in each path, e.g. spark.read.load("s3a://bucket/file") and/or df.write.save("hdfs:///tmp/data").
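For a concrete picture, here is a minimal PySpark sketch of that pattern. The bucket name my-bucket and the paths are placeholders, and it assumes the hadoop-aws connector and your AWS credentials are already configured:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cross-fs-example").getOrCreate()

# Read the latest data from S3; the s3a:// scheme selects the S3A file system.
latest = spark.read.parquet("s3a://my-bucket/latest-data/")

# Write results to HDFS; the hdfs:// scheme selects the cluster file system.
latest.write.mode("overwrite").parquet("hdfs:///tmp/data/")

Because both jobs just address the shared storage by URI, a second Spark job can read hdfs:///tmp/data/ (or the S3 path) concurrently; coordinating which files are complete is up to you.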
Alternatively, you can use S3 directly in place of HDFS by setting fs.defaultFS to your bucket, so that paths without a scheme resolve to S3.
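A minimal sketch of that approach, again with my-bucket as a placeholder (spark.hadoop.* properties are forwarded to the underlying Hadoop configuration):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-as-default-fs")
    .config("spark.hadoop.fs.defaultFS", "s3a://my-bucket")
    .getOrCreate()
)

# No scheme needed now; this resolves to s3a://my-bucket/latest-data/
df = spark.read.parquet("/latest-data/")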