I am fairly new to Spark/Scala, so please don't mind if this is a beginner question.
I have a directory test which contains two files, input1.txt and input2.txt. Now, let's say I create an RDD called inputRDD using
val inputRDD = sc.wholeTextFiles("/home/hduser/test")
which loads both files into the pair RDD inputRDD.
Based on my understanding, inputRDD contains the file name as the key and the file contents as the value, something like this:
(input1.txt,contents of input1.txt)
(input2.txt,contents of input2.txt)
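For example, collecting and printing the pairs should show something like the above (although I believe the keys actually come back as the full file paths, e.g. file:/home/hduser/test/input1.txt, rather than just the bare file names):

// quick check of what the pair RDD actually holds
inputRDD.collect().foreach { case (file, contents) =>
  println(s"$file ->\n$contents")
}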
Now, let's say I have to join the two files (which are in the same RDD) based on the first column. The contents of the files are:
contents of input1.txt
----------------------
1 a
1 b
2 c
2 d

contents of input2.txt
----------------------
1 e
2 f
3 g
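What I would like to end up with is something like this (assuming an inner join on the first column, so key 3 from input2.txt, which has no match, is dropped):

(1,(a,e))
(1,(b,e))
(2,(c,f))
(2,(d,f))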
How can I do that?
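The only direction I could come up with is the rough sketch below (splitting each file's contents into lines, turning them into (firstColumn, secondColumn) pairs, and then joining the two resulting RDDs), but I am not sure whether this is correct or idiomatic:

// rough attempt, not sure this is the right way:
// split a file's contents into (firstColumn, secondColumn) pairs
def toPairs(contents: String) =
  contents.split("\n").map { line =>
    val cols = line.trim.split("\\s+")
    (cols(0), cols(1))
  }

// separate the two files and flatten their contents into pair RDDs
val rdd1 = inputRDD.filter { case (file, _) => file.endsWith("input1.txt") }
                   .flatMap { case (_, contents) => toPairs(contents) }
val rdd2 = inputRDD.filter { case (file, _) => file.endsWith("input2.txt") }
                   .flatMap { case (_, contents) => toPairs(contents) }

// join on the first column
val joined = rdd1.join(rdd2)   // e.g. (1,(a,e)), (2,(c,f)), ...
joined.collect().foreach(println)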