I'm going through some existing Spark code as part of my learning process, and I came across the following:
enPages.map(lambda x: x.split(" "))\
.map(lambda x: (x[2], int(x[3])))\
.reduceByKey(lambda x, y: x + y, 40)\
.collect()
I have a basic understanding of how reduceByKey and lambda expressions work, but in the code above I do not see the purpose of the '40' passed to reduceByKey. Any clarification would be appreciated.
Thanks, Sasi.
PS: I tried running the code with the '40' removed and compared the outputs, only to find that just the order of the key/value pairs changed; for a given key, the summed value in the final output is the same.