I am fairly new to Scala and Spark RDD programming. The dataset I am working with is a CSV file containing list of movies (one row per movie) and their associated user ratings (comma delimited list of ratings). Each column in the CSV represents a distinct user and what rating he/she gave the movie. Thus, user 1's ratings for each movie are represented in the 2nd column from the left:
Sample Input:
Spiderman,1,2,,3,3
Dr.Sleep, 4,4,,,1
I am getting the following error:
Task4.scala:18: error: not enough arguments for method count: (p: ((Int, Int)) => Boolean)Int.
Unspecified value parameter p.
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count()
when I execute the few lines below. For the program below, the second line of code splits all values delimited by "," and produces this:
( Spiderman, [[1,0],[2,1],[-1,2],[3,3],[3,4]] ) ( Dr.Sleep, [[4,0],[4,1],[-1,2],[-1,3],[1,4]] )
On the third line, taking the count() throws an error. For each movie (row), I am trying to get the number of common elements. In the above example, [-1, 2] is clearly a common element shared by both Spiderman and Dr.Sleep.
val textFile = sc.textFile(args(0))
var movieRatings = textFile.map(line => line.split(","))
.map(movingRatingList => (movingRatingList(0), movingRatingList.drop(1)
.map(ranking => if (ranking.isEmpty) -1 else ranking.toInt).zipWithIndex));
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count() )).saveAsTextFile(args(1));
My target output of line 3 is as follows:
( Spiderman, Dr.Sleep, 1 ) --> Between these 2 movies, there is 1 common entry.
Can somebody please advise ?