I have data in the form of RDD[(List[Double], List[Double])], for example:
val sampleData = sc.parallelize(Seq(
  (List(1.1, 1.2, 1.3), List(1.1, 1.5, 1.2)),
  (List(3.0, 3.3, 3.3), List(3.1, 3.2, 3.6))
))
For each element of the RDD, I would like to call Statistics.corr(a, b), where a is the first List[Double] and b is the second List[Double].
The result I want is two correlation values from corr(): one for (1.1, 1.2, 1.3) vs. (1.1, 1.5, 1.2), and one for (3.0, 3.3, 3.3) vs. (3.1, 3.2, 3.6).
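Assuming the default Pearson method of Statistics.corr, each desired value is the correlation of one pair of lists a and b of length n. Worked by hand for the sample data, the two values come out at roughly 0.240 and 0.655:

```latex
r_{ab} = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
              {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}

% first pair: \bar{a} = 1.2, \bar{b} = 3.8/3, n = 3
r_1 = \frac{0.01}{\sqrt{0.02 \cdot 0.26/3}} = \sqrt{3/52} \approx 0.240

% second pair: \bar{a} = 3.2, \bar{b} = 3.3, n = 3
r_2 = \frac{0.06}{\sqrt{0.06 \cdot 0.14}} = \sqrt{3/7} \approx 0.655
```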
My attempted solution is:
Statistics.corr(sampleData.flatMap(_._1), sampleData.flatMap(_._2))
This gives me a single correlation computed over (1.1, 1.2, 1.3, 3.0, 3.3, 3.3) and (1.1, 1.5, 1.2, 3.1, 3.2, 3.6), which is not what I want.
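The single value comes from flatMap concatenating every first list into one long sample (and every second list into another). Statistics.corr itself takes whole RDDs, and Spark does not allow using an RDD inside another RDD's transformation, so it cannot be called per element inside a map. A minimal sketch, assuming Pearson correlation is what is wanted: compute it locally for each pair with plain Scala collections. The object name PerPairCorrelation and the helper pearson are hypothetical, not part of MLlib; the local List stands in for the RDD.

```scala
// Sketch: one Pearson correlation per (List[Double], List[Double]) pair,
// using only plain Scala collections so it can run inside an RDD.map.
object PerPairCorrelation {

  // Hypothetical helper (not an MLlib API): Pearson correlation of two
  // equally sized samples. Returns NaN if either sample is constant.
  def pearson(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.size == b.size && a.size > 1, "samples must have equal length >= 2")
    val n = a.size
    val meanA = a.sum / n
    val meanB = b.sum / n
    // Sums of cross-products and squares; the 1/(n-1) factors cancel in the ratio.
    val sab = a.zip(b).map { case (x, y) => (x - meanA) * (y - meanB) }.sum
    val saa = a.map(x => (x - meanA) * (x - meanA)).sum
    val sbb = b.map(y => (y - meanB) * (y - meanB)).sum
    sab / math.sqrt(saa * sbb)
  }

  // Local stand-in for sampleData from the question.
  val sample: List[(List[Double], List[Double])] = List(
    (List(1.1, 1.2, 1.3), List(1.1, 1.5, 1.2)),
    (List(3.0, 3.3, 3.3), List(3.1, 3.2, 3.6))
  )

  // What the attempt does: all first lists merged into one sample.
  val flattenedA: List[Double] = sample.flatMap(_._1)

  // What is wanted: one correlation per pair.
  val results: List[Double] = sample.map { case (a, b) => pearson(a, b) }

  def main(args: Array[String]): Unit = {
    println(flattenedA)
    results.foreach(println)
  }
}
```

On the sample data, results holds approximately 0.240 and 0.655. Against the real RDD, the same helper would be applied as sampleData.map { case (a, b) => PerPairCorrelation.pearson(a, b) }, which yields an RDD[Double] with one correlation per element; this reimplements Pearson rather than calling Statistics.corr, so verify it against MLlib on a few rows if exact agreement matters.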