What I am trying to do is convert a transaction list stored in a CSV file into a format that the "arules" package in R can use. I also want to keep it as a data frame so I can export it to a different CSV file. So I started with a simple data set:
```
Fruit Milk Eggs
yes   yes  no
no    no   yes
no    yes  yes
yes   yes  yes
```
It needs to look like this:
```
Fruit Milk
Eggs
Milk Eggs
Fruit Milk Eggs
```
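In R terms, the target layout is one character vector of item names per row: the names of the columns that contain "yes". A minimal sketch, assuming the table is typed in directly (instead of read from basket_test.csv) and stored as character columns rather than factors:

```r
# Hypothetical inline copy of basket_test.csv, stored as character columns
df1 <- data.frame(
  Fruit = c("yes", "no", "no", "yes"),
  Milk  = c("yes", "no", "yes", "yes"),
  Eggs  = c("no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# One character vector of item names per row (transaction)
baskets <- lapply(seq_len(nrow(df1)), function(i) {
  row <- unlist(df1[i, ])
  names(df1)[row == "yes"]
})

# Print each basket on its own line, matching the target layout
writeLines(vapply(baskets, paste, character(1), collapse = " "))
```

A list of item vectors like `baskets` is also one of the inputs arules can coerce to a transactions object with `as(baskets, "transactions")`.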
So, I read in the CSV and get the column names:
```r
df1 <- read.csv('basket_test.csv')
l <- c()
# create a vector with the item names (the column headers)
# (the for loop advances i by itself, so no manual i = i + 1 is needed)
for (i in 1:3) {
  l <- append(l, names(df1)[i])
}
```
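The problem below starts at the read step: in R versions before 4.0.0, `read.csv()` defaults to `stringsAsFactors = TRUE`, so the yes/no columns become factors. A minimal sketch of reading them as plain character columns instead; the inline `text =` argument stands in for basket_test.csv (an assumption, so the example runs without the file), and `names()` returns the same item names as the column-name loop:

```r
# Stand-in for basket_test.csv so the example is self-contained
csv_text <- "Fruit,Milk,Eggs
yes,yes,no
no,no,yes
no,yes,yes
yes,yes,yes"

# stringsAsFactors = FALSE keeps "yes"/"no" as character, not factor levels
df1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Item names straight from the column headers, no loop required
l <- names(df1)

str(df1)
```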
Here's where I run into a problem: R reads the columns as categorical data (factors), and it complains when I try to change the values:
```r
# replace "yes" with the item name
for (x in 1:3) {
  for (y in 1:4) {
    if (df1[y, x] == "yes") {
      df1[y, x] <- l[x]
    }
  }
}
```
It gave me this warning eight times, once for each "yes" cell:
```
invalid factor level, NA generated
```
And the data frame now looks like this:
```
  Fruit Milk Eggs
1 <NA> <NA>   no
2   no   no <NA>
3   no <NA> <NA>
4 <NA> <NA> <NA>
```
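The message above comes from how factors work: a factor can only hold values from its set of levels, which here are "no" and "yes". Assigning any other string, such as "Fruit", stores NA and emits "invalid factor level, NA generated". A small sketch of the same behaviour on a single vector:

```r
# A factor like the columns read.csv() created (levels "no" and "yes")
f <- factor(c("yes", "no", "no", "yes"))
print(levels(f))

# "Fruit" is not one of the levels, so R stores NA and warns
f[1] <- "Fruit"
print(f)

# A character vector has no levels, so the same assignment works
ch <- as.character(factor(c("yes", "no", "no", "yes")))
ch[1] <- "Fruit"
print(ch)
```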
I tried calling as.character on each cell of the data frame and then running the routine again, but that did not work. So what do I need to do to my data frame to be able to change the values in it?
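Converting one cell at a time does not change a column's type: a data frame column holds a single type, so writing a character value back into a factor column leaves it a factor. Converting whole columns does work. A minimal sketch, assuming `df1` has factor columns as in the question (recreated here with `stringsAsFactors = TRUE`), after which the original loop runs without NAs:

```r
# Recreate the question's situation: yes/no columns stored as factors
df1 <- data.frame(
  Fruit = c("yes", "no", "no", "yes"),
  Milk  = c("yes", "no", "yes", "yes"),
  Eggs  = c("no", "yes", "yes", "yes"),
  stringsAsFactors = TRUE
)
l <- names(df1)

# Convert every column from factor to character; df1[] keeps the data frame shape
df1[] <- lapply(df1, as.character)

# The original loop now works, because character columns accept any string
for (x in seq_along(l)) {
  for (y in seq_len(nrow(df1))) {
    if (df1[y, x] == "yes") {
      df1[y, x] <- l[x]
    }
  }
}
print(df1)
```

The remaining "no" cells could then be replaced with "" or NA so that each row lists only the items present.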
Thanks
Edit: I did find this:
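```r
library(arules)
# convert every column to a factor, then coerce the data frame to transactions
df_fact <- data.frame(lapply(df1, as.factor))
df_trans <- as(df_fact, 'transactions')
```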
from this post: "R-convert transaction format dataset to basket format for Market Basket Analysis"
But I want to do the conversion myself, and this method doesn't produce something I can save as a CSV file.
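For the export step, one option is to write each basket as one comma-separated line, which arules can read back with `read.transactions(format = "basket")`. A sketch under these assumptions: the data is typed in rather than read from basket_test.csv, the output goes to a temporary file instead of a real path, and the arules part runs only if the package is installed. It also shows coercing a logical item matrix with `as(..., "transactions")`, which names items after the columns (Fruit) instead of the column=value labels (such as Fruit=yes) that the factor-based approach produces.

```r
df1 <- data.frame(
  Fruit = c("yes", "no", "no", "yes"),
  Milk  = c("yes", "no", "yes", "yes"),
  Eggs  = c("no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# TRUE where an item is present in a row
is_yes <- as.matrix(df1) == "yes"

# One comma-separated basket per row, e.g. "Fruit,Milk"
baskets <- apply(is_yes, 1, function(row) paste(colnames(is_yes)[row], collapse = ","))

# Write the baskets as plain lines (temporary file stands in for a real path)
out_file <- tempfile(fileext = ".csv")
writeLines(baskets, out_file)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # Read the basket file back as transactions
  trans_file <- read.transactions(out_file, format = "basket", sep = ",")
  # Alternative: coerce the logical matrix directly; items are named after the columns
  trans_matrix <- as(is_yes, "transactions")
  inspect(trans_file)
}
```

Writing plain lines rather than using `write.csv()` avoids padding short baskets with empty cells, since each transaction has a different number of items.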