R: Apriori Algorithm does not find any association rules

Question

I generated a dataset with two columns: an ID column identifying each customer and a second column listing that customer's active products:

head(df_itemList)

      ID      PRD_LISTE
1     1       A,B,C
3     2       C,D
4     3       A,B
5     4       A,B,C,D,E
7     5       B,A,D
8     6       A,C,D

I selected only customers who own more than one product. In total there are 589,454 rows and 16 distinct products.

Next, I wrote the data.frame to a CSV file like this:

df_itemList$ID <- NULL
colnames(df_itemList) <- c("itemList")
write.csv(df_itemList, "Basket_List_13-08-2020.csv", row.names = TRUE)

Then I read the CSV file in basket format so that I could apply the apriori algorithm as implemented in the arules package.

library(arules)  
txn <- read.transactions(file="Basket_List_13-08-2020.csv", 
                         rm.duplicates= TRUE, format="basket",sep=",",cols=1)
txn@itemInfo$labels <- gsub("\"","",txn@itemInfo$labels)

The summary function yields the following output:

summary(txn)
transactions as itemMatrix in sparse format with
 589455 rows (elements/itemsets/transactions) and
 1737 columns (items) and a density of 0.0005757052 

most frequent items:
                   A,C                    A,B                     C,F                     C,D
                  57894                   32150                   31367                   29434 
                  A,B,C                 (Other) 
                  29035                  409575 

element (itemset/transaction) length distribution:
sizes
     1 
589455 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

includes extended item information - examples:
                                                                             labels
1 G,H,I,A,B,C,D,F,J
2 G,H,I,A,B,C,F
3 G,H,I,A,B,K,D

includes extended transaction information - examples:
  transactionID
1              
2             1
3             3

Next, I tried to run the apriori algorithm:

basket_rules <- apriori(txn, parameter = list(sup = 1e-15, 
                                              conf = 1e-15, minlen = 2, target="rules"))

This is the output:

   Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
       0.01    0.1    1 none FALSE            TRUE       5   1e-15      2     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 0 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[1737 item(s), 589455 transaction(s)] done [0.20s].
sorting and recoding items ... [1737 item(s)] done [0.00s].
creating transaction tree ... done [0.16s].
checking subsets of size 1 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.04s].

Even with a ridiculously low support and confidence, no rules are generated...

summary(basket_rules)
set of 0 rules

Is this really because of my dataset? Or was there a mistake in my code?

Michael Hahsler · Accepted Answer · 2020-08-14T18:15:45

Your summary shows that the data is not read in correctly:

most frequent items:
                   A,C                    A,B                     C,F                     C,D
                  57894                   32150                   31367                   29434 
                  A,B,C                 (Other) 
                  29035                  409575

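It looks like "A,C" is read as a single item, but it should be two items, "A" and "C". The separator is not being applied. I assume that is because of quotation marks in the file: write.csv quotes character fields by default, so the commas inside a quoted field are not treated as separators. Open Basket_List_13-08-2020.csv and make sure each line is a plain comma-separated list of items. Also, you need to skip the first line (the header) using skip = 1 when you read the transactions; right now it is counted as an extra transaction, which is why the summary reports 589455 rows.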
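One way to get a clean file is sketched below. This is a minimal example under assumptions, not a drop-in fix: the small df_itemList is made-up stand-in data shaped like the head() output in the question, the temporary file name is arbitrary, and the support and confidence thresholds are chosen only so that the toy data yields some rules. It writes one basket per line with writeLines, which adds no quotes, row names, or header, so skip and cols are not needed when reading.

```r
library(arules)

# Hypothetical stand-in for df_itemList after the ID column was dropped
df_itemList <- data.frame(
  itemList = c("A,B,C", "C,D", "A,B", "A,B,C,D,E", "B,A,D", "A,C,D"),
  stringsAsFactors = FALSE
)

# Write one basket per line: plain text, no quotes, no row names, no header
basket_file <- tempfile(fileext = ".csv")
writeLines(df_itemList$itemList, basket_file)

# Each line is now split on "," into individual items
txn <- read.transactions(basket_file, format = "basket", sep = ",",
                         rm.duplicates = TRUE)

# Items should be single products (A, B, ...) and sizes larger than 1
summary(txn)

# Thresholds are illustrative only; tune them for the real data
basket_rules <- apriori(txn, parameter = list(support = 0.3,
                                              confidence = 0.5,
                                              minlen = 2,
                                              target = "rules"))
inspect(head(basket_rules))
```

If you would rather keep write.csv, the header line and the row-name column are what skip = 1 and cols = 1 account for, but the quoting around the item list would still have to go for the separator to take effect.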
