I have an Excel file with 7 columns: the first three are numerical and columns 4-7 are categorical. I saved it as a txt file and loaded it into R (in RStudio, by clicking the "Import Dataset" button in the Environment pane), which ran the following command:

data <- read.table("~/csectiondata.txt", quote="\"", comment.char="")

Now I have been trying to run apriori(data), and this is the error I get:

Error in asMethod(object) : column(s) 1, 2, 3 not logical or a factor. Use as.factor, as.logical or categorize first.

I read that using sapply and as.factor would help, so I did this:

data <- sapply(data, as.factor)

but now I'm getting this error:

Error in t(as(from, "ngCMatrix")) : error in evaluating the argument 'x' in selecting a method for function 't': Error in asMethod(object) : cannot coerce 'NA's to "nsparseMatrix"

I've also tried transaction_data <- as(data, "transactions") and I get the same kind of error:

"Error in asMethod(object)"

I'm totally lost. Can someone help me out?

Try adding some of your data to the question, e.g. the output of dput(head(data)). It is much easier to help if we can track down the problem ourselves. - jeremycg

1 Answer

You need to prepare your data first. Association rule mining can only use items and does not work with continuous variables.

For example, an item describing a person (the person being the object under consideration, called a transaction) could be tall. The fact that the person is tall is encoded by the transaction containing the item tall, which in a transaction-by-items matrix is typically a TRUE value. This is why the coercion as(data, "transactions") can deal with logical columns: it assumes each such column stands for one item. The coercion can also convert columns with nominal values (i.e., factors) into a series of binary items, one for each level. So if you have nominal variables, you need to make sure they are factors (and not characters or numbers), using something like data[,"a_nominal_var"] <- factor(data[,"a_nominal_var"]).
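As a minimal sketch of the factor step, the snippet below uses a made-up toy data frame (the column names delivery and smoker are hypothetical, not taken from the question's file). It converts every column with lapply(), which keeps the data frame intact; sapply() simplifies its result to a character matrix, which is one plausible reason the attempt in the question failed. The arules part only runs if the package is installed.

```r
# Hypothetical toy data; column names and values are invented for illustration.
df <- data.frame(
  delivery = c("csection", "vaginal", "csection"),
  smoker   = c("yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor while keeping the data frame structure.
# (sapply(df, as.factor) would return a character matrix instead.)
df[] <- lapply(df, factor)
str(df)

# If arules is available, coerce to transactions: each factor level
# becomes one binary item.
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  trans <- as(df, "transactions")
  print(itemLabels(trans))
}
```

The item labels should come out in the form variable=level, one per factor level.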

Continuous variables need to be discretized first. An item resulting from discretization might be age>18, in which case the column contains only TRUE or FALSE. Alternatively, the column can be a factor with the levels age<=18, 18<age<=50 and age>50. These will be automatically converted into 3 items, one for each level. Have a look at the function discretize() in arules.
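Both encodings can be sketched with base R alone. The ages below are made up, and cut() is used here as a stand-in for discretize(), so treat this as an illustration of the idea rather than the arules way of doing it.

```r
# Made-up ages for illustration.
age <- c(12, 18, 19, 35, 50, 51, 70)

# Option 1: a single logical column, i.e. one item "age>18".
is_adult <- age > 18

# Option 2: a factor with three levels. cut() uses right-closed
# intervals (a, b] by default, so 18 falls into "age<=18" and
# 50 falls into "18<age<=50".
age_group <- cut(
  age,
  breaks = c(-Inf, 18, 50, Inf),
  labels = c("age<=18", "18<age<=50", "age>50")
)

# Either column (or both) can go into the data frame that is later
# coerced to transactions.
prepared <- data.frame(is_adult = is_adult, age_group = age_group)
print(prepared)
```

For the data in the question, the same treatment would apply to columns 1-3 before calling apriori(), while columns 4-7 only need to be factors.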