Maybe I am just stupid but even then I would appreciate some helpful comments.
I have a dataset containing IDs, articles and article attributes (i.e. groupings). Using RODBC, I fetch the table from a SQL Server database (with sqlFetch). Because the data arrives as a data frame rather than a file, I couldn't find a way to use read.transactions from the arules package. Each row is one ID-to-article+grouping combination. The table has 22 million rows with roughly 11 million unique IDs. The article base is around 500,000 articles. For illustration purposes, I created an example data frame like this:
# Example data: 5 ids with 3 rows each; articles and groups are sampled at random
dat <- data.frame(id = rep(paste0("id", 1:5), each = 3),
                  article = paste0("A", sample(1051:1069, 15, TRUE)),
                  groupA = paste0(sample(LETTERS[1:3], 15, TRUE), sample(1:3, 15, TRUE)))
dat$groupB[dat$groupA == "A1"|dat$groupA == "A2"|dat$groupA =="A3"] <- 1
dat$groupB[dat$groupA == "B1"|dat$groupA == "B2"|dat$groupA =="B3"] <- 2
dat$groupB[dat$groupA == "C1"|dat$groupA == "C2"|dat$groupA =="C3"] <- 3
resulting in a data.frame similar to this
    id article groupA groupB
1  id1   A1053     B3      2
2  id1   A1051     B1      2
3  id1   A1067     B2      2
4  id2   A1054     B1      2
5  id2   A1069     B1      2
6  id2   A1068     A3      1
7  id3   A1053     C3      3
8  id3   A1066     B3      2
9  id3   A1057     A2      1
10 id4   A1056     B1      2
11 id4   A1065     C1      3
12 id4   A1069     A3      1
13 id5   A1066     A3      1
14 id5   A1062     A3      1
15 id5   A1059     B2      2
I want to use the arules package for a market basket analysis. However, if I convert the data.frame to a transactions object with txn <- as(dat, "transactions"), every row is treated as one basket. Is there a way to point arules to the id variable as the basket identifier? Basically, I am looking for the equivalent of the format = "single" option of the read.transactions function in the as() coercion.
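To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming that coercing a named list of item vectors to "transactions" treats each list element as one basket. The small data frame is a hand-typed excerpt of the table above, and the name baskets is only illustrative. I have not checked whether split() on 22 million rows is practical.

```r
library(arules)

# Deterministic excerpt of the example table (rows for id1 to id3).
dat <- data.frame(
  id      = rep(c("id1", "id2", "id3"), each = 3),
  article = c("A1053", "A1051", "A1067",
              "A1054", "A1069", "A1068",
              "A1053", "A1066", "A1057"),
  stringsAsFactors = FALSE
)

# One character vector of articles per id; unique() guards against
# the same article appearing twice within one basket.
baskets <- lapply(split(dat$article, dat$id), unique)

# Coerce the named list to transactions: each list element is one basket.
txn <- as(baskets, "transactions")

length(txn)  # number of baskets
size(txn)    # number of items in each basket
```

With this, the number of baskets should equal the number of distinct ids instead of the number of rows.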
A second, related question would be:
Can I use attributes with the arules package? I thought it would be neat to run the apriori algorithm not only at the article level but also at the grouping levels. E.g., if I buy article A1053, I would get a rule recommending groupA B3. Is this even possible?
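One idea I can think of, sketched here under assumptions rather than as a known solution: add the group labels to each basket as extra items (prefixed with "groupA=" so they cannot collide with article labels) and use the appearance argument of apriori to allow only those group items on the right-hand side. The data is again a hand-typed excerpt of the table above, and the support and confidence thresholds are arbitrary values for a toy example.

```r
library(arules)

# Deterministic excerpt of the example table, now including groupA.
dat <- data.frame(
  id      = rep(c("id1", "id2", "id3"), each = 3),
  article = c("A1053", "A1051", "A1067",
              "A1054", "A1069", "A1068",
              "A1053", "A1066", "A1057"),
  groupA  = c("B3", "B1", "B2",
              "B1", "B1", "A3",
              "C3", "B3", "A2"),
  stringsAsFactors = FALSE
)

# Prefix the group labels so they are distinguishable from articles.
group_items <- paste0("groupA=", dat$groupA)

# Each basket holds the articles of an id plus the groups they belong to.
baskets <- lapply(split(c(dat$article, group_items), c(dat$id, dat$id)), unique)
txn <- as(baskets, "transactions")

# Group items may only appear on the rhs; all other items only on the lhs.
rules <- apriori(
  txn,
  parameter  = list(supp = 0.3, conf = 0.8, minlen = 2),
  appearance = list(rhs = unique(group_items), default = "lhs"),
  control    = list(verbose = FALSE)
)

inspect(rules)
```

A caveat with this approach: if every article always belongs to the same group, a rule from an article to its own group holds trivially, so such rules would have to be filtered out afterwards.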