I am currently using the arules package to perform a market basket analysis. My data that I read in looks like this (but with many more rows):

>data
  transaction_id  item
1              1  beer
2              1  beer
3              1  soda
4              2  beer
5              3  beer
6              3  fries
7              3  candy
8              4  soda
9              4  fries

I then transform it using dcast and remove the transaction id column:

> Trans_Table <- dcast(data, transaction_id ~ item)
> Trans_Table$transaction_id <- NULL

and it looks like this:

  beer candy fries soda
1    2     0     0    1
2    1     0     0    0
3    1     1     1    0
4    0     0     1    1

But when I convert it to the "transactions" class so that I can use the apriori function, the 2 under beer is converted to a 1:

> Transactions <-  as(as.matrix(Trans_Table), "transactions")
Warning message:
In asMethod(object) :
  matrix contains values other than 0 and 1! Setting all entries != 0 to 1.

Is there any way to perform the market basket analysis and keep that 2? In other words, I would like to see rules for {beer} => {beer}, {beer, beer} => {soda}, and {beer, soda} => {beer}, but beer is currently counted only once per transaction even if it was purchased twice.

Can anyone help out with this?

1 Answer

Market basket analysis looks at distinct items purchased together, not at how often a given item appears in a transaction. But if you really want to treat repeated purchases of the same item as distinct items, you can generate new item names with the following approach.

Using the dplyr library, you can mutate the item name by appending a counter of how many times it has occurred within the transaction, and then use the new name in your arules processing:

library(dplyr)
# `data` is the data frame from the question (columns: transaction_id, item)
df <- data %>%
        group_by(transaction_id, item) %>%
        mutate(newitem = paste(item, row_number(), sep = '')) %>%
        ungroup()
as.matrix(table(df$transaction_id, df$newitem))

Output is:

    beer1 beer2 candy1 fries1 soda1
  1     1     1      0      0     1
  2     1     0      0      0     0
  3     1     0      1      1     0
  4     0     0      0      1     1

From here, the output can be tweaked further to fit whatever format your arules step expects.
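
As a rough sketch of how the renamed items might be passed on to arules: the snippet below rebuilds the sample data from the question, numbers repeated items with base R's ave() (a stand-in for the dplyr step above, not a replacement for it), and splits the result into one basket per transaction. The arules part only runs if the package is installed, and the supp and conf values are arbitrary placeholders rather than recommendations.

```r
# Sketch only: sample data copied from the question.
data <- data.frame(
  transaction_id = c(1, 1, 1, 2, 3, 3, 3, 4, 4),
  item = c("beer", "beer", "soda", "beer", "beer",
           "fries", "candy", "soda", "fries"),
  stringsAsFactors = FALSE
)

# Number each item within its transaction (1 for the first beer, 2 for the second, ...).
occurrence <- ave(seq_along(data$item), data$transaction_id, data$item,
                  FUN = seq_along)
data$newitem <- paste0(data$item, occurrence)

# One character vector of renamed items per transaction.
baskets <- split(data$newitem, data$transaction_id)

# Optional: mine rules if arules is available (thresholds are assumptions).
if (requireNamespace("arules", quietly = TRUE)) {
  trans <- methods::as(baskets, "transactions")
  rules <- arules::apriori(trans, parameter = list(supp = 0.25, conf = 0.5))
  arules::inspect(rules)
}
```

With this encoding, a second beer in the same transaction shows up as the item beer2, so a rule such as {beer1, soda1} => {beer2} plays the role of {beer, soda} => {beer}. Keep in mind that beer2 can only occur together with beer1, so rules of the form {beer2} => {beer1} will always have confidence 1 and carry no information.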