I'm trying to use the arulesSequences package in R. I'm running into a problem I've seen a lot of people encounter but have found no good answers for: going from a data frame or matrix to the transactions data type.
I've done this, as the documentation clearly states, for arules:
a_df3 <- data.frame(TID = c(1,1,2,2,2,3), item=c("a","b","a","b","c", "b"))
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")
Works okay. But if I try to do the same for a 3 column dataframe, everything goes haywire:
a_df4<-data.frame(SEQUENCEID=c("1","1","1","2","2","3","3"),
EVENTID=c("1","2","3","1","2","1","2"),
ITEM=c("a","b","a","c","a","a","b"))
a_df4
  SEQUENCEID EVENTID ITEM
1          1       1    a
2          1       2    b
3          1       3    a
4          2       1    c
5          2       2    a
6          3       1    a
7          3       2    b
Yes, items repeat within a sequence, but that is exactly the point, isn't it (to find frequent sequences)?
So now I coerce it like this:
seqt <- as(split(a_df4[,"ITEM"], a_df4[,"SEQUENCEID"], a_df4[,"EVENTID"]), "transactions")
And I get:
Error in asMethod(object) :
can not coerce list with transactions with duplicated items
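One likely cause, sketched in base R only: the third positional argument of split() is drop, not a second grouping variable, so the call above appears to group by SEQUENCEID alone, and sequence 1 then contains "a" twice. The names by_seq, has_dup, and by_event are illustrative, and grouping by the (SEQUENCEID, EVENTID) pair is an assumption about what was intended.

```r
a_df4 <- data.frame(
  SEQUENCEID = c("1", "1", "1", "2", "2", "3", "3"),
  EVENTID    = c("1", "2", "3", "1", "2", "1", "2"),
  ITEM       = c("a", "b", "a", "c", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Grouping by SEQUENCEID only: one basket per sequence
by_seq <- split(a_df4$ITEM, a_df4$SEQUENCEID)

# Flag baskets that contain the same item more than once
has_dup <- vapply(by_seq, function(x) anyDuplicated(x) > 0, logical(1))
print(has_dup)

# Assumed intent: one basket per (SEQUENCEID, EVENTID) pair.
# Passing both columns as a list makes split() group by their combination;
# drop = TRUE discards combinations that never occur in the data.
by_event <- split(a_df4$ITEM,
                  list(a_df4$SEQUENCEID, a_df4$EVENTID),
                  drop = TRUE)
print(lengths(by_event))
```

With this toy data every (sequence, event) basket holds a single item, so no basket repeats an item.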
I've been all over the place trying to get through this simple hurdle:
- Changing the order of splits
- Changing everything into factors
- Changing everything into a matrix
- Feeding the data frame directly into the arules function
- Exporting into a .txt, importing as read.transactions
- Exporting into a .txt, importing as "basket"
- Trying "solutions": here, here, and here (read_baskets is a function?)
Every attempt either gives the error described above or, when there is no error, a transactions object with two columns, which OF COURSE cannot be read by arulesSequences because it needs three columns: SEQUENCE-ID, EVENT-ID, and ITEMS.
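A minimal sketch of one way the three pieces of information might be carried, assuming (from my reading of the arulesSequences documentation) that the sequence and event IDs are expected as sequenceID and eventID columns in transactionInfo() rather than as item columns. The names df, key, info, and trans are illustrative, and converting the IDs to integers is an assumption.

```r
library(arules)

a_df4 <- data.frame(
  SEQUENCEID = c("1", "1", "1", "2", "2", "3", "3"),
  EVENTID    = c("1", "2", "3", "1", "2", "1", "2"),
  ITEM       = c("a", "b", "a", "c", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Sort by sequence, then event, so transactions come out in temporal order
df <- a_df4[order(as.integer(a_df4$SEQUENCEID), as.integer(a_df4$EVENTID)), ]

# One basket per (sequence, event) pair, keeping the sorted order
key <- paste(df$SEQUENCEID, df$EVENTID, sep = ".")
key <- factor(key, levels = unique(key))
trans <- as(split(df$ITEM, key), "transactions")

# Attach the IDs as transaction-level information (one row per basket)
info <- df[!duplicated(key), ]
transactionInfo(trans)$sequenceID <- as.integer(info$SEQUENCEID)
transactionInfo(trans)$eventID    <- as.integer(info$EVENTID)

print(transactionInfo(trans))
```

If that assumption holds, an object built this way is what would be handed to cspade() from arulesSequences.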
I don't think my data structure could be any clearer. The sequence ID is the customer number, the event ID is the purchase number, and the items are, well, items.
Any help is appreciated, including a description of the structure as() expects so that the coercion works correctly.