I am mining patterns in a dataset of 1000 transactions over 14 commodities. Each transaction has a 0 or 1 in each product column, depending on whether that product was purchased. Most of the values are 0.

When I run the apriori algorithm on this dataset, the top rules are for products that were not purchased, like: {var1=0, var2=0, var3=0} => {var4=0}. However, I am more interested in which products are purchased together.

dataset:

Trans var1 var2 var3 var4
1     1    0    1    1
2     0    0    0    1
3     0    0    1    0
4     0    0    0    1
5     1    0    1    0
6     1    0    0    0

rules <- apriori(dataset,
 parameter = list(minlen=3, supp=0.002, conf=0.2),
 appearance = list(rhs=c("var1=1","var2=1","var3=1"),
 lhs=c("var1=1","var2=1","var3=1"),
 default="none"),
 control = list(verbose=F))

First, RStudio crashes when I try to run this. Second, what I would really like to run is something like:

rules <- apriori(dataset,
 parameter = list(minlen=3, supp=0.002, conf=0.2),
 appearance = list(rhs!=c("var1=0","var2=0","var3=0"),
 lhs!=c("var1=0","var2=0","var3=0"),
 default="none"),
 control = list(verbose=F))

This fails with an error.

The difference is the use of != with 0 instead of 1, so that I get patterns only on items that were purchased, not on items that were not purchased.

Thanks in advance!!

1 Answer

I was able to find a workaround for this problem:

I changed the data frame into a matrix, and I no longer get patterns on items that were not purchased. Maybe this is the way the algorithm works, or maybe (hopefully) there is some mistake in my approach.
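
A likely explanation, sketched here on the six sample rows from the question: when arules coerces a data frame of factors to transactions, every column/level pair becomes its own item (var1=0, var1=1, ...), whereas a 0/1 or logical matrix gives one item per column, present only where the value is 1/TRUE. That the original data frame stored the values as factors is an assumption; the names items_from_df and items_from_matrix are just for illustration.

```r
library(arules)

# Six sample rows from the question, without the transaction id column.
m <- cbind(
  var1 = c(1, 0, 0, 0, 1, 1),
  var2 = c(0, 0, 0, 0, 0, 0),
  var3 = c(1, 0, 1, 0, 1, 0),
  var4 = c(1, 1, 0, 1, 0, 0)
)

# Assumption: the original data frame held the 0/1 values as factors.
df <- as.data.frame(lapply(as.data.frame(m), factor))

# Items created from the data frame: one per column/level pair.
items_from_df <- itemLabels(as(df, "transactions"))

# Items created from the logical matrix: one per column.
items_from_matrix <- itemLabels(as(m == 1, "transactions"))

print(items_from_df)
print(items_from_matrix)
```

With the matrix form there is simply no "not purchased" item for apriori to build rules from, which would explain why the var=0 rules disappear.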

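library(arules)
library(arulesViz) # provides plot() for rules

m <- as.matrix(dataset[,-1]) # remove the transaction id column
rownames(m) <- paste("Transaction", rownames(dataset)) # label the rows (names() would label cells)
rules.all <- apriori(as(m,"transactions"),parameter = 
                           list(support = 0.1, confidence = 0.8))
inspect(rules.all)
rules.sorted <- sort(rules.all, by="lift")
inspect(rules.sorted)
# flag rules that contain an earlier (higher-lift) rule as a subset
# sparse = FALSE asks for an ordinary matrix; recent arules versions return a sparse one by default
subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse = FALSE)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)
plot(rules.all)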
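
If keeping the data frame is preferable, the != idea from the question can probably be expressed with the none entry of appearance, which lists items that must not appear in any rule. This is a minimal sketch on the six sample rows, again assuming factor columns; the "=0$" pattern, the thresholds, and the names df, trans and zero_items are illustrative choices, not part of the original code.

```r
library(arules)

# Sample rows from the question as factors (assumed typing of the original data).
df <- data.frame(
  var1 = factor(c(1, 0, 0, 0, 1, 1)),
  var2 = factor(c(0, 0, 0, 0, 0, 0)),
  var3 = factor(c(1, 0, 1, 0, 1, 0)),
  var4 = factor(c(1, 1, 0, 1, 0, 0))
)
trans <- as(df, "transactions")

# Every item whose label ends in "=0" stands for "not purchased".
zero_items <- grep("=0$", itemLabels(trans), value = TRUE)

# Exclude those items from both sides of every rule.
rules <- apriori(trans,
  parameter  = list(minlen = 2, supp = 0.1, conf = 0.5),
  appearance = list(none = zero_items, default = "both"),
  control    = list(verbose = FALSE))

inspect(rules)
```

The remaining rules only involve var=1 items, so they describe products bought together.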

Thanks!!