I have a mapper method:
    def mapper(value):
        ...
        for key, value in some_list:
            yield key, value
What I need is not really far from the ordinary word-count example, actually. I already have a working script, but only if the mapper method looks like this:
    def mapper(value):
        ...
        return key, value
This is how the call looks:

    sc.textFile(sys.argv[2], 1).map(mapper).reduceByKey(reducer).collect()
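To make the setup concrete, here is a plain-Python sketch (no Spark) of the single-pair pipeline above, with a hypothetical word-count-style mapper and reducer; the names and data are just illustrative assumptions:

```python
from functools import reduce
from collections import defaultdict

# Hypothetical single-pair mapper: returns exactly one (key, value) per input.
def mapper(value):
    return (value.lower(), 1)

# Hypothetical reducer with the shape reduceByKey expects:
# combines two values for the same key.
def reducer(a, b):
    return a + b

data = ["spark", "Spark", "hadoop"]

# Group values by key, as reduceByKey would.
grouped = defaultdict(list)
for key, value in map(mapper, data):
    grouped[key].append(value)

# Reduce each key's values pairwise.
result = {k: reduce(reducer, vs) for k, vs in grouped.items()}
# result == {"spark": 2, "hadoop": 1}
```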
I spent 2 hours trying to write code that would support generators in the mapper, but couldn't. I would even settle for just returning a list:
    def mapper(value):
        ...
        result_list = []
        for key, value in some_list:
            result_list.append((key, value))
        return result_list
Here: https://groups.google.com/forum/#!searchin/spark-users/flatmap$20multiple/spark-users/1WqVhRBaJsU/-D5QRbenlUgJ I found that I should use flatMap, but it didn't do the trick: my reducer then started getting inputs like (key1, value1, key2, value2, value3, ...), when it should be [(key1, value1), (key2, value2, value3), ...]. In other words, the reducer started receiving single items, with no way to tell whether an item is a key or a value, and, if it is a value, which key it belongs to.
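To illustrate the difference, here is a plain-Python sketch (no Spark, hypothetical mapper and data) of the flattening I need versus the flattening I seem to be getting:

```python
# Hypothetical mapper that yields several (key, value) pairs per input line.
def mapper(line):
    for word in line.split():
        yield (word, 1)

lines = ["a b a"]

# What I need: one flat list of (key, value) tuples across all inputs,
# i.e. flattening by one level only.
pairs = [pair for line in lines for pair in mapper(line)]
# pairs == [("a", 1), ("b", 1), ("a", 1)]

# What I seem to be getting: keys and values flattened into single items,
# i.e. flattening one level too deep.
flat = [item for line in lines for pair in mapper(line) for item in pair]
# flat == ["a", 1, "b", 1, "a", 1]
```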
So how do I use mappers that return iterators or lists?
Thanks!