When using Spark Streaming with the built-in HDFS support, I have run into the following inconvenience:
dStream.saveAsTextFiles writes each batch to its own subdirectory in HDFS, and rdd.saveAsTextFile likewise creates a subdirectory for each set of part files.
I am looking for a method that puts all of the parts in the same path:
myHdfsPath/Prefix_time-part0XXX
instead of
myHdfsPath/Prefix_time/part0XXX
That way I can later iterate over these files by scanning a single flat HDFS directory.
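One workaround I have considered (not a built-in Spark option, as far as I know) is to let saveAsTextFiles write its per-batch subdirectories as usual and then flatten them with a rename pass afterwards. The sketch below shows only the renaming logic, on the local filesystem with Python's standard library; on real HDFS the equivalent moves would go through the Hadoop FileSystem API (`fs.rename`). The function name `flatten_output` and the `Prefix_1000` directory are hypothetical names for illustration.

```python
import os
import shutil
import tempfile

def flatten_output(base_dir):
    """Turn base_dir/Prefix_time/part-0XXXX into base_dir/Prefix_time-part-0XXXX.

    Local-filesystem sketch only: on HDFS the same logic would use
    FileSystem.listStatus() and FileSystem.rename() instead.
    """
    for sub in os.listdir(base_dir):
        sub_path = os.path.join(base_dir, sub)
        if not os.path.isdir(sub_path):
            continue
        for part in os.listdir(sub_path):
            if part.startswith("part-"):
                shutil.move(os.path.join(sub_path, part),
                            os.path.join(base_dir, f"{sub}-{part}"))
        # drop the subdirectory (and any leftover markers such as _SUCCESS)
        shutil.rmtree(sub_path)

# demo on a temporary local directory standing in for myHdfsPath
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "Prefix_1000"))
with open(os.path.join(base, "Prefix_1000", "part-00000"), "w") as f:
    f.write("data\n")
flatten_output(base)
print(sorted(os.listdir(base)))  # → ['Prefix_1000-part-00000']
```

The downside is the extra pass after every batch, so I would still prefer a way to make Spark write the flat layout directly.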