
I'm having trouble converting a data frame into a transactions object. I build a data frame grouped by invoice number, with the list of products for each invoice separated by ',' (so the data frame has two columns). Up to this point everything works:

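library(plyr)    # ddply(); load before dplyr
library(dplyr)   # %>%, group_by(), select()
library(arules)  # transactions class

df = read.csv('Orders.csv', sep = ';', stringsAsFactors = T)
df$Document.Date = as.Date(df$Document.Date, format = '%d/%m/%Y')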


    grouping_for_AA =
            df %>%
            group_by(Sales.Document,  Material) %>%
            dplyr::select(Sales.Document, Material, Document.Date)

#Create transaction data building a list of material for each sales doc
#separated by a ,
transactionData = ddply(grouping_for_AA, c('Sales.Document'),
                        function(df) paste(df$Material,
                        collapse = ','))

But when I call as(data, 'transactions'), R tells me to discretize the input. So I convert the product list column with as.factor, but then each whole transaction becomes a single factor level and, clearly, no rules can be mined.

#Drop the Sales.Document column of the transactionData data frame
transactionData$Sales.Document <- NULL
#Change name of lists of Materials
colnames(transactionData) = 'Material'

#transform to factor
transactionData = data.frame(lapply(transactionData, factor))

#Create a transaction object: errors can be due to the package containing 'as'
trObj <- as(transactionData, "transactions")

I have already tried data frames in both single and basket format, but I could not solve it.

Any idea how to convert a data frame to transaction format without exporting and reloading the data?

Yes, but it's much harder without some of your data. Fake data is fine if you cannot publish yours. Can you post some?

1 Answer


You can try the following to convert your data.frame into a transactions dataset. I've added a fake date, but I think it's unnecessary, since you are not using it in your processing:

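# fake data, reconstructed from the output printed below
data <- data.frame(Sales.Document = c(1, 1, 1, 2, 2, 3),
                   Material = c('A', 'B', 'C', 'A', 'C', 'A'))
data$Document.Date <- Sys.Date()
data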
  Sales.Document Material Document.Date
1              1        A    2018-11-21
2              1        B    2018-11-21
3              1        C    2018-11-21
4              2        A    2018-11-21
5              2        C    2018-11-21
6              3        A    2018-11-21

Now, using exactly your code on this dataset, you can add data.frame() at the end of the dplyr chain:

grouping_for_AA <- data %>%
                   group_by(Sales.Document,  Material) %>%
                   dplyr::select(Sales.Document, Material, Document.Date) %>%
                   data.frame()

Now you can convert it into a transactions object:

trans <- as(split(grouping_for_AA[,"Material"], grouping_for_AA[,"Sales.Document"]), "transactions")
inspect(trans)

    items   transactionID
[1] {A,B,C} 1            
[2] {A,C}   2            
[3] {A}     3    
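
To see why this works, it helps to look at what split() returns: a named list with one vector of materials per sales document, which is a form that as(..., "transactions") can coerce directly. The following is only a minimal sketch, assuming the fake data from above; `baskets` is an illustrative name, and the arules step is skipped if the package is not installed.

```r
# Fake data matching the example above (assumed values, not real orders)
data <- data.frame(
  Sales.Document = c(1, 1, 1, 2, 2, 3),
  Material       = c("A", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# One character vector of materials per sales document;
# as.character() guards against Material having been read in as a factor
baskets <- split(as.character(data$Material), data$Sales.Document)
str(baskets)

# Coerce the list to a transactions object (needs the arules package)
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

Compare this with the pasted-string approach in the question: there each sales document holds a single string such as "A,B,C", so the whole basket is treated as one item rather than three.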

Lastly, you can apply the apriori() function:

rules <- apriori(trans, parameter = list(supp = 0.3, conf = 0.3, target="rules", minlen=2))
inspect(rules)
    lhs      rhs support   confidence lift count
[1] {B}   => {C} 0.3333333 1.0000000  1.5  1    
[2] {C}   => {B} 0.3333333 0.5000000  1.5  1    
[3] {B}   => {A} 0.3333333 1.0000000  1.0  1    
[4] {A}   => {B} 0.3333333 0.3333333  1.0  1    
[5] {C}   => {A} 0.6666667 1.0000000  1.0  2    
[6] {A}   => {C} 0.6666667 0.6666667  1.0  2    
[7] {B,C} => {A} 0.3333333 1.0000000  1.0  1    
[8] {A,B} => {C} 0.3333333 1.0000000  1.5  1    
[9] {A,C} => {B} 0.3333333 0.5000000  1.5  1
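
As a sanity check on the table above, the measures for a single rule can be recomputed by hand. This is a small base R sketch using the usual definitions of support, confidence, and lift; the helper `support()` and the variable names are made up for illustration and are not part of arules.

```r
# The three transactions from the example above
baskets <- list(c("A", "B", "C"), c("A", "C"), "A")

# Fraction of transactions that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Measures for the rule {B} => {C}
supp_rule <- support(c("B", "C"))        # lhs and rhs together
conf_rule <- supp_rule / support("B")    # support(lhs and rhs) / support(lhs)
lift_rule <- conf_rule / support("C")    # confidence / support(rhs)

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

The values agree with row [1] of the output: support 1/3, confidence 1, lift 1.5. With only three transactions, every rule with count 1 sits just above the supp = 0.3 threshold, so on real data you will probably want to tune these parameters.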