I am having real trouble getting my data to produce any rules with the arules package. I have 100,000 rows of transaction data; SAS produces rules from it, but I cannot get it to work in R.

[5]      {19,29,40,119,134}   
[6]      {24,40,45,67,141}    
[7]      {17,18,57,74,412}    
[8]      {16,79,90,150,498}   
[9]      {18,57,111,161,267}  
[10]     {11,75,131,427,429}  
[11]     {57,99,111,143,236} 

The transaction data looks like the above; it originally came from a table where each number was in a separate column.

arules <- read.transactions('tid.csv', format = c("basket", "single"), sep = ",")
rules <- apriori(arules, parameter = list(supp = 0.1, conf = 0.1, target = "rules"))
summary(rules)

For reference, the support and confidence settings make no difference. Sometimes I get this when I inspect the rules:

         lhs    rhs                   support      confidence   lift count
[1]      {}  => {8,11,96,112,432}     9.710623e-06 9.710623e-06 1    1    
[2]      {}  => {62,134,222,254,412}  9.710623e-06 9.710623e-06 1    1 

Any idea why apriori can't separate the items in each transaction? Does the data need to be recast into long format, and if so, how would I do that from this data frame?

V2  V3  V4  V5  V6
8   11  96  112 432
10  35  39  76  119
18  38  68  141 267
29  36  57  61  63
19  29  40  119 134
24  40  45  67  141
17  18  57  74  412
It is hard to say without having the csv file you are reading from. Please post the file. (Michael Hahsler)

1 Answer

If I understood you correctly, try the following and let us know whether it helps.

library(arules)
library(arulesViz)

#sample data
df <- read.table(text="V2  V3  V4  V5  V6
                 8   11  96  112 432
                 10  35  39  76  119
                 18  38  68  141 267
                 29  36  57  61  63
                 19  29  40  119 134
                 24  40  45  67  141
                 17  18  57  74  412", header = TRUE)
write.csv(df, "apriori_demo.csv", row.names = FALSE)

#read the CSV as basket-format transactions for the apriori algorithm (skip=1 drops the header line)
trx <- read.transactions("apriori_demo.csv", format="basket", sep=",", skip=1)

#apriori rules
#supp and conf are copied from the question; you will need better thresholds for your real data
apriori_rule <- apriori(trx, parameter = list(supp = 0.1, conf = 0.1))
inspect(apriori_rule)
plot(apriori_rule, method="graph")
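
On the question of recasting the data frame: as a minimal sketch, assuming the data frame holds one transaction per row and one item per column (as in the sample above), the rows can be turned into a list of item vectors and coerced to transactions directly, without going through a CSV file. The names `baskets` and `trx_direct` are only illustrative.

```r
library(arules)

# same sample data as above (one transaction per row, one item per column)
df <- read.table(text = "V2  V3  V4  V5  V6
                 8   11  96  112 432
                 10  35  39  76  119
                 18  38  68  141 267
                 29  36  57  61  63
                 19  29  40  119 134
                 24  40  45  67  141
                 17  18  57  74  412", header = TRUE)

# build one character vector of items per row (hypothetical helper name: baskets)
baskets <- lapply(split(df, seq_len(nrow(df))),
                  function(row) as.character(unlist(row)))

# coerce the list of item vectors to a transactions object
trx_direct <- as(baskets, "transactions")

# sanity check: every transaction should contain 5 separate items,
# not one long item such as "8,11,96,112,432"
size(trx_direct)
inspect(head(trx_direct, 3))
```

If `size()` reports 1 for every transaction in your own data, the items are still being read as a single string, which would explain rules like `{} => {8,11,96,112,432}`.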