
I'm having trouble converting a data frame into a transactions object. I build a data frame grouped by invoice number, with the list of products joined by ',' (so the data frame has two columns). Up to this point everything works:

df = read.csv('Orders.csv', sep = ';', stringsAsFactors = TRUE)
df$Document.Date = as.Date(df$Document.Date, format = '%d/%m/%Y')

library(tidyverse)
library(plyr)

grouping_for_AA =
    data.frame(
        df %>%
        group_by(Sales.Document, Material) %>%
        dplyr::select(Sales.Document, Material, Document.Date)
    )


#Create transaction data building a list of material for each sales doc
#separated by a ,
transactionData = ddply(grouping_for_AA, c('Sales.Document'),
                        function(df) paste(df$Material,
                        collapse = ',')
                        )

But when I call as(data, 'transactions'), R tells me to discretize the input. So I apply as.factor to the product list column, but then each whole transaction becomes a single factor level and (clearly) no rules can be mined.

# Drop the Sales.Document column from transactionData
transactionData$Sales.Document <- NULL
# Rename the column holding the lists of materials
colnames(transactionData) = 'Material'

# Convert to factor
transactionData = data.frame(lapply(transactionData, factor))


# Create a transactions object: errors may come from the package that provides 'as'
trObj <- as(transactionData, "transactions")

I have already tried data frames in both single and basket format, but could not get it to work.

Any idea how to convert a data frame into transactions format without exporting and reloading the data?

Yes, but it's much harder without some of your data. Fake data is fine if you cannot publish yours. Can you post some? - s__

1 Answer


You can try the following to convert your data.frame into a transactions dataset. I've added a fake date, but I think it's unnecessary, since you are not using it in your processing:

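# Fake data, reconstructed from the printout below
data <- data.frame(Sales.Document = c(1, 1, 1, 2, 2, 3),
                   Material = c("A", "B", "C", "A", "C", "A"))
data$Document.Date <- Sys.Date()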
data
  Sales.Document Material Document.Date
1              1        A    2018-11-21
2              1        B    2018-11-21
3              1        C    2018-11-21
4              2        A    2018-11-21
5              2        C    2018-11-21
6              3        A    2018-11-21

Next, build the grouped dataset exactly as in your code. Note that data.frame() can go at the end of the dplyr chain:

library(tidyverse)
library(plyr)
grouping_for_AA <- data %>%
                   group_by(Sales.Document,  Material) %>%
                   dplyr::select(Sales.Document, Material, Document.Date) %>%
                   data.frame()

Now you can convert it into a transactions object:

library(arules)
trans <- as(split(grouping_for_AA[,"Material"], grouping_for_AA[,"Sales.Document"]), "transactions")

inspect(trans)
    items   transactionID
[1] {A,B,C} 1            
[2] {A,C}   2            
[3] {A}     3    
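
One thing to watch for with real order data: as far as I know, the list coercion in arules refuses baskets that contain the same item twice, which can happen when a material appears on several lines of one invoice. Below is a minimal sketch of the split() step with a deduplication added. The `orders` data frame and the repeated line on invoice 3 are made up for illustration, and the arules part only runs if the package is installed:

```r
# Toy data mirroring the example above, plus one repeated line
# (material A appears twice on invoice 3); this is an assumption for illustration.
orders <- data.frame(
  Sales.Document = c(1, 1, 1, 2, 2, 3, 3),
  Material       = c("A", "B", "C", "A", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# split() returns one character vector of materials per invoice,
# named by the invoice number
baskets <- split(orders$Material, orders$Sales.Document)

# Drop repeated materials within each invoice
baskets <- lapply(baskets, unique)

print(baskets)

# Coerce to a transactions object only if arules is available
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

The key point is that as(..., "transactions") wants a list with one vector of items per transaction, not a single comma-pasted string per invoice, which is why converting the pasted column to a factor gave one level per basket.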

Lastly, you can apply the apriori() function:

rules <- apriori(trans, parameter = list(supp = 0.3, conf = 0.3, target="rules", minlen=2)) 
inspect(rules)
    lhs      rhs support   confidence lift count
[1] {B}   => {C} 0.3333333 1.0000000  1.5  1    
[2] {C}   => {B} 0.3333333 0.5000000  1.5  1    
[3] {B}   => {A} 0.3333333 1.0000000  1.0  1    
[4] {A}   => {B} 0.3333333 0.3333333  1.0  1    
[5] {C}   => {A} 0.6666667 1.0000000  1.0  2    
[6] {A}   => {C} 0.6666667 0.6666667  1.0  2    
[7] {B,C} => {A} 0.3333333 1.0000000  1.0  1    
[8] {A,B} => {C} 0.3333333 1.0000000  1.5  1    
[9] {A,C} => {B} 0.3333333 0.5000000  1.5  1
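
If you want to sanity-check the numbers, the measures for a single rule can be recomputed by hand in base R. This is only a sketch for rule [1], {B} => {C}, on the three toy baskets above; the helper name `support` is mine, not part of arules:

```r
# The three toy baskets from the example above
baskets <- list(c("A", "B", "C"), c("A", "C"), "A")

# Fraction of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule {B} => {C}
supp_rule  <- support(c("B", "C"))        # baskets containing both B and C
confidence <- supp_rule / support("B")    # of the baskets with B, share that also have C
lift       <- confidence / support("C")   # confidence relative to how common C is overall

print(c(support = supp_rule, confidence = confidence, lift = lift))
```

This should agree with the first row of the inspect(rules) output: support 1/3, confidence 1 and lift 1.5.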