2
votes

I'm trying to use the arulesSequences package in R. I'm running into a problem I've seen a lot of people encounter, but with no good answers: going from a data frame or matrix to the transactions data type.

I've done this, as the documentation clearly states, for arules:

a_df3 <- data.frame(TID = c(1,1,2,2,2,3), item=c("a","b","a","b","c", "b"))
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")

Works okay. But if I try to do the same for a 3 column dataframe, everything goes haywire:

a_df4<-data.frame(SEQUENCEID=c("1","1","1","2","2","3","3"),
                  EVENTID=c("1","2","3","1","2","1","2"),
                  ITEM=c("a","b","a","c","a","a","b"))
a_df4
   SEQUENCEID EVENTID ITEM
1    1         1      a
2    1         2      b
3    1         3      a
4    2         1      c
5    2         2      a
6    3         1      a
7    3         2      b

Yes, there are duplicates, but that is exactly the point, isn't it (to find frequent sets of sequences)?

So now I coerce it like this:

seqt <- as(split(a_df4[,"ITEM"], a_df4[,"SEQUENCEID"], a_df4[,"EVENTID"]), "transactions")

And I get:

Error in asMethod(object) : 
   can not coerce list with transactions with duplicated items

I've been all over the place trying to get through this simple hurdle:

  1. Changing the order of splits
  2. Changing everything into factors
  3. Changing everything into matrix
  4. Feeding the data frame directly like such into the arules function
  5. Exporting into a .txt, importing as read.transactions
  6. Exporting into a .txt, importing as "basket"
  7. Trying "solutions": here, here, and here (read_baskets is a function?)

Every attempt either fails with the error described above or, when there is no error, produces a transactions object with two columns, which of course cannot be read by arulesSequences because it needs three: SEQUENCE-ID, EVENT-ID, and ITEMS.

I don't think my data structure could be any clearer. The sequence is the customer number, the event ID is the purchase number, and the items are, well, items.

Any help is appreciated, including the structure that as() expects to see so that the coercion works correctly.

3 Answers

2
votes

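Try this:

# a_df4 is the three-column data frame from the question
trans4 <- as(a_df4[,"ITEM"], "transactions")
# attach the sequence and event IDs to the transaction info
trans4@itemsetInfo$sequenceID = a_df4$SEQUENCEID
trans4@itemsetInfo$eventID = a_df4$EVENTID

transSeq = as(trans4, "timedsequences")
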
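A fuller sketch of the same idea, assuming the arules and arulesSequences packages are installed: build one basket per (SEQUENCEID, EVENTID) pair, attach the IDs through transactionInfo(), and hand the result to cspade(). The support threshold of 0.5 is an arbitrary value picked for illustration, and the IDs are converted to integers and sorted on the assumption that cspade() expects data ordered by sequence and event. The names key, baskets and ids are made up for this example.

```r
a_df4 <- data.frame(SEQUENCEID = c("1","1","1","2","2","3","3"),
                    EVENTID    = c("1","2","3","1","2","1","2"),
                    ITEM       = c("a","b","a","c","a","a","b"),
                    stringsAsFactors = FALSE)

# Use integer IDs and order rows by sequence, then by event
a_df4$SEQUENCEID <- as.integer(a_df4$SEQUENCEID)
a_df4$EVENTID    <- as.integer(a_df4$EVENTID)
a_df4 <- a_df4[order(a_df4$SEQUENCEID, a_df4$EVENTID), ]

# One basket per (sequence, event) pair; factor levels keep the sorted order
key <- paste(a_df4$SEQUENCEID, a_df4$EVENTID, sep = "_")
key <- factor(key, levels = unique(key))
baskets <- lapply(split(a_df4$ITEM, key), unique)

# IDs for each basket, in the same order as the baskets
ids <- a_df4[!duplicated(key), c("SEQUENCEID", "EVENTID")]

# The mining step only runs if the packages are available
if (requireNamespace("arulesSequences", quietly = TRUE)) {
  library(arules)
  library(arulesSequences)
  trans <- as(baskets, "transactions")
  transactionInfo(trans)$sequenceID <- ids$SEQUENCEID
  transactionInfo(trans)$eventID    <- ids$EVENTID
  freq <- cspade(trans, parameter = list(support = 0.5))
  print(as(freq, "data.frame"))
}
```

Because each basket holds the items of a single event, item "a" appearing in events 1 and 3 of sequence 1 no longer counts as a duplicate inside one transaction.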
0
votes

arules treats transactions as sets not as sequences.

It can detect frequent itemsets but probably not sequences.

Checking for duplicates is a safeguard against using it incorrectly: it ignores multiplicity and sequence, so having more than one item of the same kind is lost information.
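To see where the duplicates come from, a base R check is enough. The sketch below uses the three-column data frame from the question. Note that split() has the signature split(x, f, drop = FALSE, ...), so a third positional argument is taken as drop, not as a second grouping variable. The names by_seq, dup and as_sets are only illustrative.

```r
a_df4 <- data.frame(SEQUENCEID = c("1","1","1","2","2","3","3"),
                    EVENTID    = c("1","2","3","1","2","1","2"),
                    ITEM       = c("a","b","a","c","a","a","b"),
                    stringsAsFactors = FALSE)

# Grouping by SEQUENCEID alone puts every event of a sequence in one basket
by_seq <- split(a_df4$ITEM, a_df4$SEQUENCEID)

# Flag baskets that contain the same item more than once
dup <- vapply(by_seq, function(x) anyDuplicated(x) > 0, logical(1))
print(dup)

# Treating each basket as a set removes the duplicates,
# but also discards the order and the repetition of items
as_sets <- lapply(by_seq, unique)
print(as_sets)
```

Sequence 1 contains "a" twice, which is what triggers the coercion error. Collapsing to sets makes the coercion possible but loses exactly the sequence information the question is after.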

The transactions class represents transaction data used for mining itemsets or rules. It is a direct extension of class itemMatrix to store a binary incidence matrix, item labels, and optionally transaction IDs and user IDs.

(from the documentation, emphasis added)

0
votes

It's been a while since this question was asked, but I'll try to answer it anyway. The error seems to occur because there are identical records of the following type:

  SEQUENCEID EVENTID ITEM
1    1         1      a
3    1         1      a
4    2         1      c 

Checking for distinct records before splitting and converting to transactions might solve the problem.
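A minimal base R sketch of that check, using a small hypothetical data frame with one repeated record (the names df, distinct_df and baskets are made up for illustration). In the a_df4 data from the question no two rows are fully identical, so there the grouping by both IDs is what matters, not the deduplication.

```r
# Hypothetical data: rows 1 and 2 are identical records
df <- data.frame(SEQUENCEID = c(1, 1, 2),
                 EVENTID    = c(1, 1, 1),
                 ITEM       = c("a", "a", "c"),
                 stringsAsFactors = FALSE)

# Keep only distinct records
distinct_df <- df[!duplicated(df), ]

# Group items by both IDs; passing a list to split() groups by the combination,
# and drop = TRUE omits combinations that never occur
baskets <- split(distinct_df$ITEM,
                 list(distinct_df$SEQUENCEID, distinct_df$EVENTID),
                 drop = TRUE)
print(baskets)
```

The resulting list has one element per (SEQUENCEID, EVENTID) pair and no repeated items inside any element, so it should be acceptable input for as(baskets, "transactions").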