1
votes

I'm using PySpark MLlib with the built-in ALS method for collaborative filtering. Does Spark provide other similarity measures for collaborative filtering, such as Pearson correlation or cosine similarity? Can they be computed within Spark?

Many thanks!

1 Answer

1
votes

Yes, Spark has an implementation of cosine similarity (RowMatrix.columnSimilarities), which is used in this official example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala

Example in Scala:

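    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Load and parse the data file: one row per line, space-separated doubles.
    // `sc` is the SparkContext and `params.inputFile` the input path, as in the linked example.
    val rows = sc.textFile(params.inputFile).map { line =>
      val values = line.split(' ').map(_.toDouble)
      Vectors.dense(values)
    }.cache()
    val mat = new RowMatrix(rows)

    // Exact cosine similarities between the columns of `mat` (upper-triangular result).
    val exact = mat.columnSimilarities()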
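
Note that columnSimilarities compares the columns of the matrix, not its rows, so the things you want to compare (for example items) need to be laid out as columns. To make clear what the numbers mean, here is a minimal plain-Python sketch of the same computation on a tiny in-memory matrix. It is not Spark code, and the function names (`cosine_similarity`, `pearson_correlation`, `column_similarities`) and the sample data are made up for illustration. It also shows that Pearson correlation is just cosine similarity applied to mean-centered vectors. For real data, MLlib's Statistics.corr is probably the more direct route to Pearson correlations.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch, not taken from Spark
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)


def column_similarities(rows):
    """Cosine similarity for every column pair (i, j) with i < j."""
    cols = list(zip(*rows))  # transpose: rows -> columns
    return {
        (i, j): cosine_similarity(cols[i], cols[j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    }


# Hypothetical sample data: 3 rows (e.g. users) x 3 columns (e.g. items).
rows = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
]
sims = column_similarities(rows)
for (i, j), value in sorted(sims.items()):
    print(f"columns {i} and {j}: {value:.4f}")
```

Only pairs with i < j are returned, which mirrors the upper-triangular layout of the Spark result. Columns 0 and 1 are proportional in the sample data, so their similarity comes out at (approximately) 1.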