
I have what I thought was a well-prepared dataset. I wanted to use the Apriori algorithm in R to look for associations and come up with some rules. I have about 16,000 rows (unique customers) and 179 columns that represent various items/categories. The data looks like this:

     Cat1  Cat2  Cat3  Cat4  Cat5 ... Cat179
     1,     0,    0,    0,    1,  ...  0
     0,     0,    0,    0,    0,  ...  1
     0,     1,    1,    0,    0,  ...  0
     ...

I thought a comma-separated file with binary values (1/0) for each customer and category would do the trick, but after I read in the data using:

data5 = read.csv("Z:/CUST_DM/data_test.txt",header = TRUE,sep=",")

and then run this command:

rules = apriori(data5, parameter = list(supp = .001,conf = 0.8))

I get the following error:

Error in asMethod(object):
column(s) 1, 2, 3, ...178 not logical or a factor. Discretize the columns first.  

I understand discretization in general, but apparently not in this context: everything is already a 1 or a 0. I've even changed the data from INT to CHAR and received the same error. I also had the customer ID (unique) as column 1, but I understand that isn't necessary when the data is in this form (flat file). I'm sure there is something obvious I'm missing; I'm new to R.

What am I missing? Thanks for your input.
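A minimal reproducible sketch of the situation, using a small hypothetical inline CSV in place of `data_test.txt` (the values are invented for illustration). It shows that `read.csv()` turns 0/1 columns into integer columns, which are neither logical nor factor, the two column types the error message asks for:

```r
# Hypothetical 0/1 data standing in for data_test.txt (3 customers, 4 categories)
csv_text <- "Cat1,Cat2,Cat3,Cat4
1,0,0,1
0,0,0,1
0,1,1,0"
data5 <- read.csv(text = csv_text, header = TRUE)

# Class of each column as read in: all four come back as "integer"
col_classes <- sapply(data5, class)
print(col_classes)

# The error message lists the columns that are neither logical nor factor;
# this reproduces that check in base R
ok_for_arules <- vapply(data5, function(x) is.logical(x) || is.factor(x), logical(1))
print(which(!ok_for_arules))
```

Changing the columns from integer to character would not help either, since character columns are also neither logical nor factor.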

It's really not possible to help you without a reproducible example. It sounds like there's a problem with your data, but without being able to reproduce the problem, we can't say for sure what's wrong. (MrFlick)
Fair enough. Can you tell me this: is a comma-separated file of 1s and 0s an acceptable format for apriori? And do I need a unique ID column? My understanding is that I do not once the data is in flat-file format. The answers to those two questions will eliminate a few potential problems, I think. Thanks. (CalData)
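Both questions can be checked directly in code. The sketch below assumes a hypothetical ID column named `CUST_ID` (not the real column name). In a binary incidence matrix each row is already one transaction, so an ID column is not needed for mining and, if left in, would be treated as just another data column. The sketch also confirms that every remaining cell really is 0 or 1, since a stray `NA`, blank, or other code would break the binary interpretation:

```r
# Hypothetical layout: a unique customer ID followed by 0/1 category columns
csv_text <- "CUST_ID,Cat1,Cat2,Cat3
1001,1,0,1
1002,0,0,1
1003,0,1,0"
data5 <- read.csv(text = csv_text, header = TRUE)

# Each row is one customer/transaction already, so drop the ID column
# (CUST_ID is an assumed name for illustration)
data5$CUST_ID <- NULL

# Confirm every remaining cell is exactly 0 or 1; NA %in% c(0, 1) is FALSE,
# so missing values also make this check fail
is_binary <- all(vapply(data5, function(x) all(x %in% c(0, 1)), logical(1)))
print(is_binary)
```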

2 Answers


I solved the problem this way: after reading the data into R, I used lapply() to convert each column to a factor. Because lapply() returns a list, I then turned the result back into a data frame with data.frame(). After that, apriori() ran successfully.
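A sketch of that approach on a small hypothetical 0/1 data frame. `lapply()` applies `factor()` to each column and returns a list, which `data.frame()` turns back into a data frame. Note that arules then creates one item per factor level, so each category yields two items such as `Cat1=0` and `Cat1=1`; in sparse customer data the "absent" items are likely to be very frequent and may dominate the resulting rules. The arules part only runs if the package is installed:

```r
# Hypothetical 0/1 data standing in for data_test.txt
csv_text <- "Cat1,Cat2,Cat3
1,0,1
0,0,1
0,1,0"
data5 <- read.csv(text = csv_text, header = TRUE)

# lapply() applies factor() to every column and returns a list;
# data.frame() turns that list back into a data frame
data5_f <- data.frame(lapply(data5, factor))
print(sapply(data5_f, class))

if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  trans <- as(data5_f, "transactions")
  # One item per column and level, labelled like "Cat1=0" and "Cat1=1"
  print(itemLabels(trans))
}
```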


Your data is actually already in (dense) matrix format, but read.csv() always returns a data.frame. Just coerce the data to a matrix first:

library(arules)
dat <- as.matrix(data5)
rules <- apriori(dat, parameter = list(supp = 0.001, conf = 0.8))

1s in the data will be interpreted as the presence of the item and 0s as its absence. More information about how to create transactions can be found in the manual page ?transactions.
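A slightly more explicit version of this answer, sketched with hypothetical inline data. It builds a logical matrix (the `== 1` comparison is this sketch's choice, not something arules requires), converts it to a `transactions` object so it can be checked with `summary()` before mining, and then prints up to five rules ranked by lift. `control = list(verbose = FALSE)` only silences apriori's progress output:

```r
# Hypothetical 0/1 data standing in for data_test.txt
csv_text <- "Cat1,Cat2,Cat3,Cat4
1,0,0,1
1,0,0,1
0,1,1,0
1,0,1,1"
data5 <- read.csv(text = csv_text, header = TRUE)

# Logical incidence matrix: TRUE = category present for that customer, FALSE = absent
dat <- as.matrix(data5) == 1

if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))

  # Each row of the matrix becomes one transaction; each column becomes one item
  trans <- as(dat, "transactions")
  print(summary(trans))

  rules <- apriori(trans,
                   parameter = list(supp = 0.001, conf = 0.8),
                   control = list(verbose = FALSE))

  # Show up to five rules with the highest lift
  top <- sort(rules, by = "lift")
  if (length(top) > 0) inspect(top[seq_len(min(5, length(top)))])
}
```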