Is there any way to deal with RDDs that contain only a single element (this can sometimes happen with my data)? In that case reduce stops working, because the operation requires two inputs.

I am working with key-value pairs such as:

(key1, 10),
(key2, 20),

And I want to aggregate their values, so the result should be:

30

But there are cases where the RDD contains only a single key-value pair, so reduce does not work. For example:

(key1, 10)

This will return nothing.

1 Answer

If you call .values() before reduce, so that only the values are reduced and not the (key, value) pairs, it should work even if there is only one element in the RDD:

from operator import add

# sc is the SparkContext (predefined in the pyspark shell)
rdd = sc.parallelize([('key1', 10)])

rdd.values().reduce(add)
# 10
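
To see why dropping the keys helps, here is a minimal local sketch in plain Python, with no Spark involved. It assumes that functools.reduce is a fair stand-in for the RDD reduce in the single-element case. The helper name total is hypothetical and only for illustration. Seeding the reduction with 0 is an extra choice that goes beyond the answer above, so that empty input yields 0 instead of raising an error.

```python
from functools import reduce
from operator import add


def total(pairs):
    """Sum the values of (key, value) pairs (hypothetical helper)."""
    # Drop the keys first, mirroring rdd.values()
    values = [value for _key, value in pairs]
    # Seeding with 0 means empty input yields 0 instead of raising
    return reduce(add, values, 0)


print(total([("key1", 10), ("key2", 20)]))  # 30
print(total([("key1", 10)]))                # 10
print(total([]))                            # 0
```

On an actual RDD, the closest equivalents would probably be rdd.values().fold(0, add) or rdd.values().sum(), both of which should also return 0 for an empty RDD, whereas reduce raises an error there.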