2
votes

I have a dataset of purchase history like this:

+---+-----------+---------+
|usn|    page_id|    click|
+---+-----------+---------+
| 11| 9000001012|       10|
|169| 2010008901|      100|
|169| 9000001007|        4|
|169| 2010788901|        1|
|169| 8750001007|        4|
|169| 9003601012|       10|
|169| 9000001007|        4|
|613| 9000050601|        8|
|613| 9000011875|        3|
|613| 2010010401|        6|
|613| 9000001007|        4|
|613| 2010008801|        1|
|836| 9000050601|       20|
|916| 9000050601|       10|
|916| 9000562601|       30|
|916| 9000001007|        4|
|916| 9000001012|       10|
+---+-----------+---------+

I have read the Spark docs (http://spark.apache.org/docs/latest/ml-collaborative-filtering.html), but I don't know how to use collaborative filtering with implicit preferences for this problem.

Now I want to apply ALS with implicit preferences to this dataset. How do I do it? Can I also treat this dataset as explicit data?

Please help me use it, and give me a Python code example for implicit preferences if you have one.

1
Where do you get stuck? - mtoto
Should I treat click as the rating? Can I feed this dataset to the ALS model? Is that right? I need an example for this: als = ALS(maxIter=5, regParam=0.01, implicitPrefs=True, userCol="userId", itemCol="movieId", ratingCol="rating") with implicitPrefs=True - Phong Nguyen
Yes, they act as a rating. Just replace the args with the correct column names of your dataset. If the results are way off, you can try normalizing the clicks. - mtoto
So, can you give me a Python code example for implicit feedback? - Phong Nguyen
Why do you need an example? You already have the code, just run it and see if it works. - mtoto

1 Answer

1
votes

My answer is a little late, but the main thing is to scale the values of 'click'. In my case this worked:

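from pyspark.sql import Window
from pyspark.sql.functions import col, max, min  # these shadow Python's built-in max/min
from pyspark.sql.types import DecimalType

# Min-max scale each user's clicks to roughly 0-10; the 0.00001 offset keeps every score positive.
# Note: the denominator is 0 for a user whose clicks are all equal, so handle that case separately.
ww = Window.partitionBy("usn")
scaled_score = (
    0.00001 + 10*(col("click") - min("click").over(ww)) / (max("click").over(ww) - min("click").over(ww))
).cast(DecimalType(7, 5))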
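
To make it easier to see what that expression computes, here is a plain-Python sketch of the same per-user min-max scaling on a few rows from the question. The `scale_clicks` helper and its `floor`/`top` parameters are hypothetical names for illustration. Mapping a user with only one distinct click value to the floor score is my assumption, not something the Spark expression above does.

```python
from collections import defaultdict


def scale_clicks(rows, floor=0.00001, top=10.0):
    """Min-max scale clicks per user; rows are (usn, page_id, click) tuples."""
    # Collect every click value seen for each user.
    by_user = defaultdict(list)
    for usn, _page_id, click in rows:
        by_user[usn].append(click)

    scaled = []
    for usn, page_id, click in rows:
        lo, hi = min(by_user[usn]), max(by_user[usn])
        if hi == lo:
            # Assumption: a zero range (one distinct click value) maps to the floor.
            score = floor
        else:
            score = floor + top * (click - lo) / (hi - lo)
        scaled.append((usn, page_id, round(score, 5)))
    return scaled


# A few rows taken from the question's table.
rows = [
    (169, 2010008901, 100),
    (169, 9000001007, 4),
    (169, 2010788901, 1),
    (836, 9000050601, 20),
]
result = scale_clicks(rows)
for row in result:
    print(row)
```

The most-clicked page for each user ends up near 10 and the least-clicked near 0, so one very active user does not dominate the model.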

After creating a strategy for the most visited page_id values, remember that the values being modeled should reflect the client's tastes.