Does Spark have any analog of the Scala scan operation that works on RDD collections? (For details, please see "Reduce, fold or scan (Left/Right)?")
For example:
val abc = List("A", "B", "C")

def add(res: String, x: String) = {
  println(s"op: $res + $x = ${res + x}")
  res + x
}
So to get:
abc.scanLeft("z")(add)
// op: z + A = zA      // same operations as foldLeft above...
// op: zA + B = zAB
// op: zAB + C = zABC
// res: List[String] = List(z, zA, zAB, zABC) // maps intermediate results
Any other means to achieve the same result?
What is the "Spark" way to solve, for example, the following problem:
Compute elements of the vector as (in pseudocode):
x(i) = SomeFun(for k from 0 to i-1)(y(k))
Should I collect the RDD for this? Is there no other way?
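To pin down what the pseudocode means, here is a minimal sketch on a plain local Scala collection (not an RDD). The name prefixApply and the sample values are made up for illustration, and someFun stands in for SomeFun. Note that x(0) is computed from an empty prefix, since k runs from 0 to i-1:

```scala
// Local (non-Spark) sketch; `prefixApply` is a hypothetical helper name.
// x(i) is someFun applied to y(0) .. y(i-1), so x(0) sees an empty prefix.
def prefixApply[A, B](y: Seq[A])(someFun: Seq[A] => B): Seq[B] =
  y.indices.map(i => someFun(y.take(i)))

// Example with someFun = sum over the prefix.
val x = prefixApply(Seq(1, 2, 3))(_.sum)
// x contains 0, 1, 3
```

This recomputes every prefix from scratch, so it only illustrates the intended result, not an efficient or distributed way to get it.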
Update 2
Ok, I understand the general problem. Yet maybe you could advise me on the particular case I have to deal with.
I have a list of ints as an input RDD and I have to build an output RDD, where the following should hold:
1) input.length == output.length // output list is of the same length as input
2) output(i) = sum( range (0..i), input(i)) / q^i // i-th element of output list equals sum of input elements from 0 to i divided by i-th power of some constant q
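To make the two conditions concrete, here is a minimal sketch of the expected result on a plain local Scala collection (not an RDD). The name scaledPrefixSums, the sample input, and q = 2.0 are assumptions chosen only for illustration:

```scala
// Local (non-Spark) sketch of the requirement; `scaledPrefixSums` is a hypothetical name.
def scaledPrefixSums(input: Seq[Int], q: Double): Seq[Double] = {
  // scanLeft yields the running sums with a leading 0, so drop it with `tail`.
  val prefixSums = input.scanLeft(0)(_ + _).tail
  // Divide the i-th running sum by q^i.
  prefixSums.zipWithIndex.map { case (sum, i) => sum / math.pow(q, i) }
}

val output = scaledPrefixSums(List(1, 2, 3, 4), 2.0)
// running sums: 1, 3, 6, 10
// output:       1.0, 1.5, 1.5, 1.25
```

The output has the same length as the input, and each element is the sum of the first i+1 inputs divided by q^i. What I am missing is how to get the same thing on an RDD.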
In fact, I need a combination of the map and fold functions to solve this.
Another idea is to write a recursive fold on diminishing tails of the input list. But this is super inefficient, and AFAIK Spark does not have tail or init functions for RDDs.
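For reference, the recursive idea looks roughly like this on a local List (not an RDD). The name naivePrefixSums is made up, and the sketch covers only the running sums, not the division by q^i. Each step re-sums the whole remaining prefix, and init copies the list, so the work grows quadratically with the input length:

```scala
// Local (non-Spark) sketch of the inefficient recursive approach.
// `naivePrefixSums` is a hypothetical name; suitable for small lists only,
// since the recursion is not tail-recursive.
def naivePrefixSums(xs: List[Int]): List[Int] =
  if (xs.isEmpty) Nil
  else naivePrefixSums(xs.init) :+ xs.sum // fold the full prefix again at every step

val sums = naivePrefixSums(List(1, 2, 3, 4))
// sums contains 1, 3, 6, 10
```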
How would you solve this problem in Spark?