I have a large sparse matrix from scipy (300k x 100k, all binary values, mostly zeros). I would like to turn the rows of this matrix into an RDD and then do some computations on those rows: evaluate a function on each row, evaluate functions on pairs of rows, etc.
The key thing is that the matrix is quite sparse and I don't want to overwhelm the cluster by storing it densely. Can I convert the rows to SparseVectors? Or perhaps convert the whole thing to a SparseMatrix?
Can you give an example where you read in a sparse array, set up its rows as an RDD, and compute something from the cartesian product of those rows?
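To make the question concrete, here is a minimal local sketch of the computation I have in mind, without Spark. It assumes the matrix is available in CSR form (the same layout as scipy's `csr_matrix.indptr` and `csr_matrix.indices`). The toy 3 x 5 matrix and the helper names `csr_rows` and `overlap` are made up for illustration, and `itertools.product` only stands in for the cartesian product of an RDD with itself.

```python
from itertools import product


def csr_rows(indptr, indices, n_cols):
    """Yield (row_id, n_cols, sorted column indices) for each CSR row.

    Only the positions of the nonzeros are kept, so no row is ever densified.
    """
    for i in range(len(indptr) - 1):
        cols = sorted(indices[indptr[i]:indptr[i + 1]])
        yield i, n_cols, cols


def overlap(cols_a, cols_b):
    """Number of columns in which both binary rows contain a 1."""
    return len(set(cols_a) & set(cols_b))


# Hypothetical 3 x 5 binary matrix in CSR form:
#   row 0: columns 0, 3
#   row 1: column 3
#   row 2: columns 0, 3, 4
indptr = [0, 2, 3, 6]
indices = [0, 3, 3, 0, 3, 4]
n_cols = 5

rows = list(csr_rows(indptr, indices, n_cols))

# Local stand-in for the cartesian product of the rows with themselves;
# keeping only i < j evaluates each unordered pair once.
pair_overlaps = {
    (i, j): overlap(cols_i, cols_j)
    for (i, _, cols_i), (j, _, cols_j) in product(rows, rows)
    if i < j
}
```

My assumption is that in PySpark each tuple would become `Vectors.sparse(n_cols, cols, [1.0] * len(cols))` from `pyspark.mllib.linalg`, the list would be distributed with `sc.parallelize(...)`, and the pairing would be `rdd.cartesian(rdd)`. I have not verified that this is the right approach at 300k rows, where the full product is roughly 9 x 10^10 pairs.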