Say I have two text files called 1.txt and 2.txt. 1.txt contains
1,9,5
2,7,4
3,8,3
and 2.txt contains
1,g,h
2,i,j
3,k,l
So, I loaded both files and joined them by their keys (the first column):
val one = sc.textFile("1.txt").map { line =>
  val parts = line.split(",", -1)
  (parts(0), (parts(1), parts(2)))  // key on the first column
}
val two = sc.textFile("2.txt").map { line =>
  val parts = line.split(",", -1)
  (parts(0), (parts(1), parts(2)))
}
val joined = one.join(two)
Now, if I understand this correctly, I'm getting
(1, ( (9,5), (g,h) ))
(2, ( (7,4), (i,j) ))
(3, ( (8,3), (k,l) ))
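To check that, I assume I could just print the joined RDD (here I'm assuming the second RDD is named two and the join is val joined = one.join(two)):

joined.collect().foreach(println)

I'm not sure about the exact ordering of the output, since join shuffles the data.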
Now, say that I need to sum up all the values of the second column of 1.txt.
How do I do this?
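My guess is something like the following, working on the un-joined one RDD and pattern-matching out the second column, but I'm not sure it's idiomatic:

// Sum the second column of 1.txt: 9 + 7 + 8 = 24
val total = one.map { case (_, (second, _)) => second.toInt }.reduce(_ + _)

Is reduce(_ + _) the right way, or is there a built-in sum for this?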
And how do I refer to the second column of 2.txt (i.e. g, i, k) in the joined RDD?
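For that part I assume I'd pattern-match on the nested tuples (again assuming the second RDD is named two and val joined = one.join(two)):

// For each joined row (key, ((col2of1, col3of1), (col2of2, col3of2))),
// keep only the second column of 2.txt
val secondOfTwo = joined.map { case (_, (_, (c, _))) => c }

Is destructuring the tuples like this the normal way, or is there a cleaner accessor?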
Is there any good tutorial for working with RDDs? I'm a Spark (and Scala) newbie.