
I am fairly new to Scala and Spark RDD programming. The dataset I am working with is a CSV file containing a list of movies (one row per movie) and their associated user ratings (a comma-delimited list of ratings). Each column after the first represents a distinct user and the rating he/she gave the movie. Thus, user 1's ratings for each movie are in the 2nd column from the left:

Sample input:

    Spiderman,1,2,,3,3
    Dr.Sleep, 4,4,,,1

I am getting the following compile error:

    Task4.scala:18: error: not enough arguments for method count: (p: ((Int, Int)) => Boolean)Int.
    Unspecified value parameter p.
        var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count()

when I compile the lines below. In the program below, the second statement splits each row on "," and produces this:

    ( Spiderman, [[1,0],[2,1],[-1,2],[3,3],[3,4]] )
    ( Dr.Sleep, [[4,0],[4,1],[-1,2],[-1,3],[1,4]] )
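To check the parsing step outside Spark, here is a minimal plain-Scala sketch of what the second statement does to a single row. The names ParseSketch and parseLine are hypothetical. Two choices differ from the code in the question and are assumptions of this sketch: each cell is trimmed, so a value such as " 4" in the sample input parses instead of throwing NumberFormatException, and split is called with a limit of -1, because split(",") drops trailing empty strings and a row ending in a missing rating would otherwise lose its last column.

```scala
object ParseSketch {
  // Hypothetical helper: turns one CSV row into (title, Array((rating, userIndex), ...)).
  def parseLine(line: String): (String, Array[(Int, Int)]) = {
    // A limit of -1 keeps trailing empty fields, which split(",") alone would drop.
    val cells = line.split(",", -1)
    val ratings = cells
      .drop(1) // skip the title column
      .map(cell => if (cell.trim.isEmpty) -1 else cell.trim.toInt) // -1 marks a missing rating
      .zipWithIndex // pair each rating with its user (column) index, starting at 0
    (cells(0), ratings)
  }

  def main(args: Array[String]): Unit = {
    val (title, ratings) = parseLine("Spiderman,1,2,,3,3")
    // prints: Spiderman (1,0),(2,1),(-1,2),(3,3),(3,4)
    println(title + " " + ratings.mkString(","))
  }
}
```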

In the third statement, the call to count() does not compile. For each pair of movies (rows), I am trying to get the number of common elements. In the above example, [-1, 2] is clearly a common element shared by both Spiderman and Dr.Sleep.

    val textFile = sc.textFile(args(0))
    
    var movieRatings = textFile.map(line => line.split(","))
                                .map(movingRatingList => (movingRatingList(0), movingRatingList.drop(1)
                                .map(ranking => if (ranking.isEmpty) -1 else ranking.toInt).zipWithIndex));
                                

    
    var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count() )).saveAsTextFile(args(1));

The output I want from the third statement is:

    ( Spiderman, Dr.Sleep, 1 )

that is, these 2 movies have 1 entry in common.

Can somebody please advise?


2 Answers


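To get the number of elements in a collection, use length or size. count(p) returns the number of elements that satisfy the predicate p, so it cannot be called with an empty argument list. (Spark's zero-argument RDD.count() does not apply here: movieRating1._2 is a plain Scala array, not an RDD.)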
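A small sketch of the difference, using the two sample rows as local arrays (the object name SizeVersusCount is hypothetical): size on the intersection needs no argument, while count needs a predicate.

```scala
object SizeVersusCount {
  // The two sample rows as (rating, user index) pairs; -1 marks a missing rating.
  val spiderman: Array[(Int, Int)] = Array((1, 0), (2, 1), (-1, 2), (3, 3), (3, 4))
  val drSleep: Array[(Int, Int)] = Array((4, 0), (4, 1), (-1, 2), (-1, 3), (1, 4))

  // Number of elements in the intersection: size (or length) takes no argument.
  def commonBySize: Int = spiderman.intersect(drSleep).size

  // count takes a predicate; here it counts Dr.Sleep's missing ratings.
  def missingCount: Int = drSleep.count(pair => pair._1 == -1)

  def main(args: Array[String]): Unit = {
    println(commonBySize) // prints 1
    println(missingCount) // prints 2
  }
}
```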

Alternatively, you can avoid building the complete intersection by using count to count the elements of the first collection that the second also contains:

    movieRating1._2.count(movieRating2._2.contains(_))
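The reduce call has a second problem beyond count: RDD.reduce takes a function of type (T, T) => T, so it cannot return a (movie, movie, count) triple, and it returns a single value to the driver rather than an RDD, so saveAsTextFile is not available on its result. A minimal local sketch of the target output, assuming the rows fit in an ordinary Scala collection, pairs the movies with combinations(2) and counts shared entries with the count-and-contains approach above. The names PairwiseCommonCounts and commonCounts are hypothetical. On an RDD, one option would be RDD.cartesian followed by a filter that keeps each unordered pair once (for example, title1 < title2); that variant is not shown here.

```scala
object PairwiseCommonCounts {
  // (title, ratings) rows in the shape produced by the question's second statement.
  val movies: Seq[(String, Array[(Int, Int)])] = Seq(
    ("Spiderman", Array((1, 0), (2, 1), (-1, 2), (3, 3), (3, 4))),
    ("Dr.Sleep", Array((4, 0), (4, 1), (-1, 2), (-1, 3), (1, 4)))
  )

  // For every unordered pair of rows, count the first row's entries that the second row also has.
  def commonCounts(rows: Seq[(String, Array[(Int, Int)])]): Seq[(String, String, Int)] =
    rows.combinations(2).map { pair =>
      val (title1, ratings1) = pair(0)
      val (title2, ratings2) = pair(1)
      val other = ratings2.toSet // Set lookups instead of a linear scan per element
      (title1, title2, ratings1.count(other.contains))
    }.toList

  def main(args: Array[String]): Unit =
    commonCounts(movies).foreach(println) // prints (Spiderman,Dr.Sleep,1)
}
```

Converting the second row to a Set makes each membership test roughly constant time; for rows as short as these, movieRating2._2.contains works just as well.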

The error message is fairly clear: count takes one argument, a predicate of type ((Int, Int)) => Boolean, but your call passes an empty argument list, i.e. zero arguments. You need to pass one argument to count.
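To make the required argument concrete, the sketch below (hypothetical names) declares a predicate with exactly the type the compiler asked for, ((Int, Int)) => Boolean, and passes it to count.

```scala
object ExplicitPredicate {
  val spiderman: Array[(Int, Int)] = Array((1, 0), (2, 1), (-1, 2), (3, 3), (3, 4))
  val drSleep: Array[(Int, Int)] = Array((4, 0), (4, 1), (-1, 2), (-1, 3), (1, 4))

  // The single argument count expects: a function from one (rating, user) pair to Boolean.
  val alsoInDrSleep: ((Int, Int)) => Boolean = pair => drSleep.contains(pair)

  // Counts Spiderman's pairs that satisfy the predicate, i.e. that Dr.Sleep also has.
  def common: Int = spiderman.count(alsoInDrSleep)

  def main(args: Array[String]): Unit = println(common) // prints 1
}
```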