I am running Spark 1.6.1 with Python 2.7 on Windows 7. The root scratch directory /tmp/hive on HDFS is writable, and its current permissions are rwxrwxrwx (using the winutils tool).
I want to stream files from a directory. According to the documentation, textFileStream(directory) does the following:
Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files. Files must be written to the monitored directory by “moving” them from another location within the same file system. File names starting with . are ignored.
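To make the "moving" requirement concrete, here is a minimal sketch of how a file could be placed into the monitored directory, assuming the staging directory is on the same file system as the monitored one. The helper name move_into_monitored_dir and both directory arguments are hypothetical and not part of Spark.

```python
import os


def move_into_monitored_dir(text, staging_dir, monitored_dir, file_name):
    """Write `text` to a staging file, then move it into `monitored_dir`.

    Hypothetical helper for illustration only; it is not part of Spark.
    Assumption: staging_dir and monitored_dir are on the same file system,
    so os.rename moves the finished file instead of copying it.
    """
    # The documentation says file names starting with "." are ignored.
    if file_name.startswith("."):
        raise ValueError("file names starting with '.' are ignored")

    staging_path = os.path.join(staging_dir, file_name)
    final_path = os.path.join(monitored_dir, file_name)

    # Write the complete file outside the monitored directory first.
    with open(staging_path, "w") as handle:
        handle.write(text)

    # Move the finished file into the monitored directory in one step.
    os.rename(staging_path, final_path)
    return final_path
```

For example, move_into_monitored_dir("a b a\n", r"C:/tmp/staging", r"C:/tmp/hive", "words.txt") would do this for my setup, where C:/tmp/staging is a hypothetical staging directory that must exist beforehand.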
When I run the following Spark Streaming code:
lines = ssc.textFileStream(r"C:/tmp/hive/")
counts = lines.flatMap(lambda line: line.split(" "))\
              .map(lambda x: (x, 1))\
              .reduceByKey(lambda a, b: a + b)
counts.pprint()
ssc.start()
and then create the files to stream in my directory, nothing happens.
I also tried this:
lines = ssc.textFileStream("/tmp/hive/")
and
lines = ssc.textFileStream("hdfs://tmp/hive/")
which uses an HDFS path, but again nothing happens.
Am I doing something wrong?