I am trying to implement Local Outlier Factor (LOF) on Spark. I have a set of points that I read from a file, and for each point I find its N nearest neighbors. Each point is assigned an index using the zipWithIndex() method.
So now I have two RDDs. First:
RDD[(Index:Long, Array[(NeighborIndex:Long, Distance:Double)])]
Here the Long represents the point's index, and the Array consists of its N nearest neighbors, where each NeighborIndex is the index of a neighbor and each Double is that neighbor's distance from the given point.
Second:
RDD[(Index:Long,LocalReachabilityDensity:Double)]
Here, the Long again represents the index of a given point, and the Double represents its Local Reachability Density.
What I want is an RDD that contains all the points, each paired with an array of its N nearest neighbors and their Local Reachability Densities:
RDD[(Index:Long, Array[(NeighborIndex:Long,LocalReachabilityDensity:Double)])]
So basically, here the Long would represent the index of a point, and the array would hold its N nearest neighbors, each with its index value and Local Reachability Density.
As I understand it, I need to map over the first RDD and then join the indexes inside its arrays with the second RDD, which contains the Local Reachability Densities, to look up the density for each of the N neighbors' indexes. But I am not sure how to achieve this. If anyone can help me out, that would be great.
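One possible approach (a sketch, not a definitive answer): since a join needs the neighbor index as the key, you can flatMap the first RDD so each neighbor index becomes its own key, join with the density RDD, and then group the results back by the original point index. The sample data below is made up purely for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NeighborDensityJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lof-neighbor-join").setMaster("local[*]"))

    // Hypothetical stand-ins for the two RDDs described above.
    // RDD[(Index, Array[(NeighborIndex, Distance)])]
    val neighbors = sc.parallelize(Seq(
      (0L, Array((1L, 0.5), (2L, 1.2))),
      (1L, Array((0L, 0.5), (2L, 0.9)))
    ))
    // RDD[(Index, LocalReachabilityDensity)]
    val lrd = sc.parallelize(Seq((0L, 2.0), (1L, 1.5), (2L, 3.0)))

    // Step 1: flatten so the neighbor index becomes the join key:
    // (NeighborIndex, PointIndex). The distance is dropped here since the
    // target RDD does not need it.
    val byNeighbor = neighbors.flatMap { case (idx, nbrs) =>
      nbrs.map { case (nIdx, _) => (nIdx, idx) }
    }

    // Step 2: join on the neighbor index to attach each neighbor's density:
    // (NeighborIndex, (PointIndex, LocalReachabilityDensity))
    val joined = byNeighbor.join(lrd)

    // Step 3: re-key by the original point index and group the neighbors
    // back into an array:
    // RDD[(Index, Array[(NeighborIndex, LocalReachabilityDensity)])]
    val result = joined
      .map { case (nIdx, (idx, density)) => (idx, (nIdx, density)) }
      .groupByKey()
      .mapValues(_.toArray)

    result.collect().foreach { case (idx, arr) =>
      println(s"$idx -> ${arr.sortBy(_._1).mkString(", ")}")
    }
    sc.stop()
  }
}
```

Note that groupByKey shuffles all values for a key to one executor; if N is small and bounded (as it is for a fixed nearest-neighbor count) this is usually acceptable, but aggregateByKey is an alternative if you want to build the arrays during the shuffle.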