Does the Spark MLlib implementation of alternating least squares (http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html) require that all zero entries for the training set (user-product combinations where the user has no history of interacting with the product) are manually created with a rating of 0, or will the algorithm automatically imply that all missing combinations have a zero rating?
1 Answer
3 votes
The training set can be sparse, and in fact it should be -- otherwise you'll pay a (possibly severe) performance penalty by materializing every missing user-product pair as an explicit zero. The algorithm only needs the observed ratings. See this discussion on the Spark users mailing list for more information.
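For illustration, here is a minimal Scala sketch using the RDD-based `spark.mllib` API, with placeholder values for `rank`, `iterations`, and `lambda` (these are assumptions, not recommendations). It trains ALS on only the observed ratings; no zero entries are created for missing user-product combinations:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object SparseALSExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SparseALSExample").setMaster("local[*]"))

    // Only the user-product pairs the users actually interacted with.
    // Missing combinations (e.g. user 2 / product 30) are simply absent.
    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 4.0),
      Rating(1, 20, 5.0),
      Rating(2, 10, 3.0),
      Rating(3, 30, 1.0)
    ))

    // Train directly on the sparse RDD: rank = 10, iterations = 10, lambda = 0.01.
    val model = ALS.train(ratings, 10, 10, 0.01)

    // Predict a pair that never appeared in the training data.
    val prediction = model.predict(2, 30)
    println(s"Predicted rating for user 2, product 30: $prediction")

    sc.stop()
  }
}
```

If your data is implicit feedback (views, clicks) rather than explicit ratings, MLlib also provides `ALS.trainImplicit`, which likewise takes only the observed interactions.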