If I create two RDDs like these:
a = sc.parallelize([[1 for j in range(3)] for i in xrange(10**9)])
b = sc.parallelize([[1 for j in xrange(10**9)] for i in range(3)])
Partitioning the first one is intuitive: the billion rows are distributed across the workers. The second one, however, has only 3 rows, and each row holds a billion items.
My question is: for the second RDD, if I have 2 workers, does one row go to one worker and the other two rows go to the other worker?
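To make the scenario concrete, here is a minimal pure-Python sketch of the split being asked about. It assumes that `parallelize` cuts a local list into contiguous index ranges, with slice `i` covering `[i * n // k, (i + 1) * n // k)`. That formula is an assumption about Spark's behavior, not something confirmed here, and the helper names `slice_positions` and `split_rows` are hypothetical. The rows are shortened from 10**9 items to 5 so that it runs locally.

```python
def slice_positions(length, num_slices):
    """Return (start, end) pairs splitting `length` items into contiguous ranges.

    Assumed formula: slice i covers [i * length // num_slices,
    (i + 1) * length // num_slices).
    """
    return [
        ((i * length) // num_slices, ((i + 1) * length) // num_slices)
        for i in range(num_slices)
    ]


def split_rows(rows, num_slices):
    """Split whole rows into `num_slices` groups; a row is never cut in two."""
    return [rows[start:end] for start, end in slice_positions(len(rows), num_slices)]


# 3 rows, each shortened from 10**9 items to 5 for illustration.
rows = [[1] * 5 for _ in range(3)]
parts = split_rows(rows, 2)

# Number of rows that end up in each of the 2 slices.
print([len(p) for p in parts])
```

Under this assumption each row stays whole, so the split is by row count and not by the number of items inside a row. Note also that the number of partitions is set by the number of slices (the second argument to `parallelize`, or the default parallelism), not directly by the number of workers. On a real cluster, something like `b.glom().map(len).collect()` should show how many rows each partition actually received.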