I have configured Spark Streaming to receive data from Kafka, following the Kafka Integration Guide.
I set the batch interval to 20 seconds, and every 20 seconds I try to save the received messages to HDFS using the DStream method saveAsTextFiles.
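A minimal sketch of my setup (the ZooKeeper quorum, consumer group, topic name, and HDFS path below are placeholders; I am assuming the receiver-based KafkaUtils.createStream API from the spark-streaming-kafka module):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    // 20-second batch interval
    val ssc = new StreamingContext(conf, Seconds(20))

    val zkQuorum = "zkhost:2181"           // placeholder ZooKeeper quorum
    val group    = "my-consumer-group"     // placeholder consumer group
    val topics   = Map("mytopic" -> 1)     // placeholder topic, 1 receiver thread

    // Receiver-based Kafka stream; each record is a (key, message) pair,
    // so keep only the message payload
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topics)
      .map(_._2)

    // Writes one output directory per 20-second batch,
    // named with the given prefix plus the batch timestamp
    messages.saveAsTextFiles("hdfs://namenode:8020/user/me/out")

    ssc.start()
    ssc.awaitTermination()
  }
}
```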
The application runs successfully: it receives data from Kafka and writes a batch to HDFS every 20 seconds. But I am confused by the output layout. Every 20 seconds a new directory is created, named with the prefix passed to saveAsTextFiles and containing output files with the prefix "part-", such as "part-00001".
However, each output file contains only one message. It seems the Kafka DStream saves each received message to its own output file in HDFS. How can I save multiple messages to a single output file?
By the way, I am using a Spark Standalone deployment with only one worker.