The awesome R package recommenderlab, written by Prof. Michael Hahsler, provides a recommender model based on association rules mined with his other R package, arules.

The minimal example code, adapted from the recommenderlab documentation, can be found in another post here.
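For reference, a minimal sketch along those lines might look like the following (this assumes the Groceries transactions from arules can be coerced to a binaryRatingMatrix, and the support/confidence thresholds are arbitrary illustration values):

library(recommenderlab)
library(arules)

data(Groceries)                               # grocery transactions from arules
dat <- as(Groceries, "binaryRatingMatrix")    # coerce for use with recommenderlab

# train the association-rule based recommender ("AR" method)
rec <- Recommender(dat, method = "AR",
                   parameter = list(support = 0.001, confidence = 0.05))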

The learned AR recommender model can then be used to make predictions/recommendations for given users.

pred <- predict(rec, dat[1:5,])
as(pred, "list")
   [[1]]
   [1] "whole milk"     "rolls/buns"     "tropical fruit"

   [[2]]
   [1] "whole milk"

   [[3]]
   character(0)

   [[4]]
   [1] "yogurt"        "whole milk"    "cream cheese " "soda"         

   [[5]]
   [1] "whole milk"

My understanding is that prediction basically works by first finding all rules in the set of rules (R) mined from the training data whose LHS matches the new user's transaction, and then recommending the N unique RHS items of the matching rules with the highest support/confidence/lift score.

So my question is: how do you get the rules with a matching LHS for prediction?

From the source code we can see:

m <- is.subset(lhs(model$rule_base), newdata@data)
for(i in 1:nrow(newdata)) {
  recom <- head(unique(unlist(
    LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)),
      decode=FALSE))), n)

  reclist[[i]] <- if(!is.null(recom)) recom else integer(0)
}

I managed to access the rule_base from the trained model via

rule_base <- getModel(rec)$rule_base
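
With the rule base in hand, the prediction for a single user can be reproduced by hand along the lines of the source above (a sketch, assuming dat and rec from the example earlier and confidence as the sort measure):

# inspect the strongest rules in the extracted rule base
inspect(sort(rule_base, by = "confidence")[1:3])

# find all rules whose LHS is contained in the first user's transaction ...
newdata <- dat[1, ]
m <- is.subset(lhs(rule_base), newdata@data)
matching <- rule_base[m[, 1]]

# ... and take the unique RHS items of the highest-confidence matching rules
head(unique(unlist(LIST(rhs(sort(matching, by = "confidence")), decode = TRUE))), 3)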

But then here comes another concern: why does the prediction use head(unique(unlist(LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)), decode=FALSE))), n) instead of first grouping by the RHS and aggregating the sort_measure (and the LHS) before sorting?
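
To make the concern concrete, a purely hypothetical aggregation along those lines (summing the confidence per RHS item over the matching rules from the reproduction above, then sorting) could look like this:

# hypothetical alternative: aggregate the measure per RHS item over all matching rules
df <- data.frame(
  rhs  = unlist(LIST(rhs(matching), decode = TRUE)),
  conf = quality(matching)$confidence
)
agg <- aggregate(conf ~ rhs, data = df, FUN = sum)   # sum instead of first match
head(agg[order(-agg$conf), "rhs"], 3)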

1 Answer

head(unique(unlist(LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)), decode=FALSE))), n) takes all rules with matching LHS, sorts them by the measure, and then returns the n unique RHS items with the highest measure.

I guess you are thinking about aggregating the measure when there are several matching rules with the same RHS in the rule base. I thought about this as well, but then decided to use the first-match strategy. The main reason was the way association rules/frequent itemsets are created: for each longer rule you will find many shorter rules with the same RHS, so aggregating the measure by addition did not make too much sense to me.
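
For illustration, one quick way to see this effect (assuming the rule_base object extracted in the question) is to count how many rules end in each RHS item:

# count how many rules in the rule base share each RHS item
rhs_items <- unlist(LIST(rhs(rule_base), decode = TRUE))
head(sort(table(rhs_items), decreasing = TRUE))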