We have a Java application built on Spark Streaming 1.4 that polls a directory for new files every 20 seconds, while a separate script creates new files (not copies) in that directory every 5 seconds.
The issue: the Spark log shows that the new files are being picked up
2015-07-07 05:13:00,390 - [INFO ] org.apache.spark.streaming.dstream.FileInputDStream:59 New files at time 1436226180000 ms:
file:/home/mata/Downloads/in/365379649921050.txt
file:/home/mata/Downloads/in/365364610737285.txt
file:/home/mata/Downloads/in/365374642289893.txt
file:/home/mata/Downloads/in/365369640106263.txt
2015-07-07 05:13:00,918 - [INFO ] org.apache.spark.storage.MemoryStore:59 ensureFreeSpace(231040) called with curMem=0, maxMem=280248975
but the RDD processing (joins, aggregations) never happens. I added log statements to the processing code, but they are printed only once, at start-up. Has anyone experienced this issue?
// Create the context; the batch interval is passed in as args[3]
// (20 seconds in our setup)
SparkConf sparkConf = new SparkConf()
        .setAppName("Data - Streaming App");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf,
        Durations.seconds(Long.parseLong(args[3])));
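For context, the rest of the driver follows the standard `textFileStream` / `foreachRDD` pattern. This is a minimal sketch, not our exact code: the directory path is taken from the log above, the processing is reduced to a placeholder `count()`, and the Spark 1.4-era `Function<JavaRDD<String>, Void>` callback is assumed. The key detail is that anything outside the `foreachRDD` closure runs only once, when the DStream graph is built, which would explain log statements appearing only at start-up.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingSketch {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("Data - Streaming App");
        JavaStreamingContext ssc = new JavaStreamingContext(sparkConf,
                Durations.seconds(Long.parseLong(args[3])));

        // Monitor the input directory; FileInputDStream only picks up files
        // whose modification time falls inside the current batch window.
        JavaDStream<String> lines =
                ssc.textFileStream("file:///home/mata/Downloads/in");

        // Per-batch logic (joins, aggregations, logging) must live inside
        // foreachRDD/transform; statements outside these closures execute
        // exactly once, at start-up. Spark 1.4's foreachRDD takes a
        // Function<R, Void>, hence the explicit "return null".
        lines.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) {
                // This runs once per 20-second batch.
                System.out.println("Batch record count: " + rdd.count());
                return null;
            }
        });

        ssc.start();
        ssc.awaitTermination();
    }
}
```

If our real log statements sit outside a `foreachRDD`/`transform` closure like this one, seeing them only once at start-up would be expected behavior rather than a bug.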