
What I am trying to do is convert a transactions list stored as a CSV file into something the "arules" package in R can use. I also want to keep it as a data frame so that I can export it to a different CSV file. I started with a simple data set:

Fruit   Milk    Eggs
yes   yes     no
no    no      yes
no    yes     yes
yes   yes     yes

It needs to look like this:

Fruit   Milk    
                Eggs
        Milk    Eggs
Fruit   Milk    Eggs

So, I read in the CSV and get the column names:

df1 <- read.csv('basket_test.csv')
l <- c()
# create a vector with the item names (the column names)
for(i in 1:3){
  l <- append(l,names(df1)[i])
}
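The loop only copies the column names one at a time. Base R's `names()` returns the same character vector in one call. Here is a minimal sketch that uses a hypothetical inline `df1` in place of `read.csv('basket_test.csv')`:

```r
# Hypothetical stand-in for read.csv('basket_test.csv')
df1 <- data.frame(
  Fruit = c("yes", "no", "no", "yes"),
  Milk  = c("yes", "no", "yes", "yes"),
  Eggs  = c("no", "yes", "yes", "yes")
)

# Loop version: append each column name to a growing vector
l <- c()
for (i in 1:3) {
  l <- append(l, names(df1)[i])
}

# names() already returns the column names as a character vector
l2 <- names(df1)
print(identical(l, l2))
```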

Here's where I run into a problem: R treats the columns as categorical data (factors), and it complains when I try to change them:

#replace "yes" with item name
for(x in 1:3){
  for(y in 1:4){
    if(df1[y,x]=="yes"){
      df1[y,x] <- l[x]
    }
  }
}

It gave me these warnings:

invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated
invalid factor level, NA generated

And the data frame now looks like this:

  Fruit Milk Eggs
1  <NA> <NA>   no
2    no   no <NA>
3    no <NA> <NA>
4  <NA> <NA> <NA>
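A single factor reproduces the NAs. A factor can only hold values from its fixed set of levels, so assigning a string that is not already a level (such as "Fruit") triggers the warning and stores NA. This is most likely also why calling as.character on individual cells did not help: the converted value is written back into a column that is still a factor. The sketch below uses a hypothetical three-element factor and shows that converting to character first avoids the problem:

```r
# A factor only accepts values that are already among its levels
f <- factor(c("yes", "no", "yes"))
print(levels(f))                 # "no" "yes"

# Assigning a new value gives "invalid factor level, NA generated"
f_bad <- f
f_bad[1] <- "Fruit"
print(f_bad)                     # first element is now NA

# Converting to character first lets any string be stored
f_chr <- as.character(f)
f_chr[f_chr == "yes"] <- "Fruit"
print(f_chr)                     # "Fruit" "no" "Fruit"
```

The same applies to a whole data frame. Before R 4.0.0, `read.csv()` turned string columns into factors by default. Passing `stringsAsFactors = FALSE` keeps them as character from the start, and that has been the default since R 4.0.0.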

I tried calling as.character on each cell of the data frame and then running the loop again, but that did not work. What do I need to do to my data frame so that I can change the values in it?

Thanks

Edit: I did find this:

df_fact <- data.frame(lapply(df1,as.factor))
df_trans <- as(df_fact, 'transactions')

from this post: R-convert transaction format dataset to basket format for Market Basket Analysis

But I am trying to do it myself, and this method doesn't produce something I can store as a CSV.


1 Answer


You can use mapply in combination with as.data.frame():

df <- read.table(text = "Fruit   Milk    Eggs
yes   yes     no
                 no    no      yes
                 no    yes     yes
                 yes   yes     yes", header = TRUE)

  Fruit Milk Eggs
1   yes  yes   no
2    no   no  yes
3    no  yes  yes
4   yes  yes  yes

df1 <- as.data.frame(mapply(function(x, y){
  ifelse(x == 'yes', y, "")
}, df, names(df)))


  Fruit Milk Eggs
1 Fruit Milk     
2            Eggs
3       Milk Eggs
4 Fruit Milk Eggs
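If `mapply` feels opaque, you can write the same transformation as a plain loop over the columns. This minimal sketch rebuilds the example data with `stringsAsFactors = FALSE`. That choice is an assumption made here so that the columns stay character instead of factor:

```r
df <- read.table(text = "Fruit Milk Eggs
yes yes no
no  no  yes
no  yes yes
yes yes yes", header = TRUE, stringsAsFactors = FALSE)

df1 <- df
# For each item column, replace "yes" with the column name and anything else with ""
for (col in names(df1)) {
  df1[[col]] <- ifelse(df1[[col]] == "yes", col, "")
}
print(df1)
```

Because the columns start out as character, the result needs no extra `as.character` step.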

Beware that in R versions before 4.0.0, `as.data.frame()` converts character columns to factors by default, so all three columns will be of class factor. You may want to convert them with `as.character`:

df1[] <- lapply(df1, as.character)
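To produce a CSV that `arules` can read back, one option is a basket-style file with one transaction per line that lists only the items present. This minimal sketch assumes the `df1` layout shown above (character columns, empty strings for missing items). The output file name `basket_out.csv` is hypothetical:

```r
df1 <- data.frame(
  Fruit = c("Fruit", "", "", "Fruit"),
  Milk  = c("Milk", "", "Milk", "Milk"),
  Eggs  = c("", "Eggs", "Eggs", "Eggs"),
  stringsAsFactors = FALSE
)

# One line per row: keep the non-empty item names and join them with commas
baskets <- apply(df1, 1, function(row) paste(row[row != ""], collapse = ","))
print(baskets)

# Hypothetical output file; no header row, since a header would be read as a transaction
out_file <- file.path(tempdir(), "basket_out.csv")
writeLines(baskets, out_file)
```

A file in this shape should be readable with `arules::read.transactions(out_file, format = "basket", sep = ",")`. The data frame itself can still be saved separately with `write.csv(df1, ..., row.names = FALSE)`.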