I've just started playing with Apache Spark and am trying to get the Kafka word count example to work in Python. I decided to use Python because it's a language I'll be able to use with other big data tech, and also because Databricks are offering their courses through Spark.
My question: I'm running the basic word count example from here: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py It seems to start up and connect to Kafka, but I can't see it actually produce a word count. I then added the lines below to write the counts to text files, and it just produces a bunch of empty text files. It is connecting to the Kafka topic and there is data in the topic, so how can I see what it's actually doing with the data, if anything? Could it be a timing issue? Cheers.
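Code for processing the Kafka data:
# Keep counts as a DStream so it can still be printed or reused afterwards
counts = lines.flatMap(lambda line: line.split("|")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
# Writes one output directory per batch, named sparkfiles-<batch time in ms>
counts.saveAsTextFiles("sparkfiles")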
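To check whether any batch actually wrote data, I put together a small helper that scans the output. This is only a sketch: it assumes the usual layout where each batch gets its own directory named sparkfiles-<timestamp> containing part-* files, and the function name non_empty_batches is my own, not part of Spark.

```python
import glob
import os


def non_empty_batches(prefix="sparkfiles", base_dir="."):
    """Return {batch_dir: [lines]} for batch outputs that contain data."""
    results = {}
    # Assumed layout: one directory per batch, named "<prefix>-<timestamp>"
    for batch_dir in sorted(glob.glob(os.path.join(base_dir, prefix + "-*"))):
        lines = []
        # Each batch directory holds part-* files (plus an empty _SUCCESS marker)
        for part in sorted(glob.glob(os.path.join(batch_dir, "part-*"))):
            with open(part) as fh:
                lines.extend(line.rstrip("\n") for line in fh if line.strip())
        if lines:
            results[batch_dir] = lines
    return results


if __name__ == "__main__":
    # Print only the batches that contain at least one record
    for batch, records in non_empty_batches().items():
        print(batch, records)
```

If this prints nothing, every batch so far was empty, which would suggest that no records arrived during those batch intervals rather than that the counting itself is wrong.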
Data in the Kafka topic:
16|16|Mr|Joe|T|Bloggs
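For reference, this is what I would expect the pipeline to compute for that record. It is a plain-Python stand-in for the flatMap / map / reduceByKey steps, not Spark code, and count_fields is just a name I made up for illustration.

```python
from collections import Counter


def count_fields(lines, sep="|"):
    """Plain-Python stand-in for flatMap -> map -> reduceByKey."""
    counts = Counter()
    for line in lines:
        # flatMap: split each record into its fields
        for word in line.split(sep):
            # map + reduceByKey: add 1 for every occurrence of a field
            counts[word] += 1
    return dict(counts)


sample = ["16|16|Mr|Joe|T|Bloggs"]
print(count_fields(sample))
```

So for the sample record, "16" should be counted twice and every other field once. If a non-empty batch shows something different, the problem is in the processing; if every batch is empty, the records are probably not reaching the stream.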