2
votes

I would like to use Apriori to carry out affinity analysis on transaction data. I have a table with a list of orders and their information. I mainly need the OrderID and ProductID attributes, which are in the following format:

OrderID ProductID
1 A
1 B
1 C
2 A
2 C
3 A

Weka requires you to create a nominal attribute for every product ID and to specify whether the item is present in the order using a true or false value, like this:

1, TRUE, TRUE, TRUE
2, TRUE, FALSE, TRUE
3, TRUE, FALSE, FALSE

My dataset contains about 10k records and about 3k different products. Can anyone suggest a way to create the dataset in this format (other than doing it by hand, which would be very time-consuming)?

3 Answers

0
votes

How about writing a script to convert it?

Should be less than 10 lines in a good scripting language such as Python.
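As a rough illustration, here is a minimal sketch of such a script. It assumes the data is already available as (OrderID, ProductID) pairs; the names `pairs` and `to_rows` are made up for this example and do not come from the question.

```python
from collections import defaultdict

def to_rows(pairs):
    """Turn (order_id, product_id) pairs into one TRUE/FALSE row per order."""
    baskets = defaultdict(set)
    for order_id, product_id in pairs:
        baskets[order_id].add(product_id)
    # Fixed, sorted column order so every row lines up with the same products.
    products = sorted({p for items in baskets.values() for p in items})
    rows = [
        [order_id] + ["TRUE" if p in items else "FALSE" for p in products]
        for order_id, items in sorted(baskets.items())
    ]
    return products, rows

# Sample data taken from the question.
pairs = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "C"), (3, "A")]
products, rows = to_rows(pairs)
for row in rows:
    print(", ".join(map(str, row)))
```

For the sample data this prints the three rows shown in the question (`1, TRUE, TRUE, TRUE` and so on).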

Or you may look into options of pivoting the relation as desired.
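One way to pivot, sketched here with Python's built-in `sqlite3` module, is a conditional aggregate per product. The table name `orders` and the function `pivot_orders` are assumptions for this example. Note that a database's limits on result columns and bound parameters may be lower than the roughly 3k products in the question (SQLite's defaults may well be), so check them before relying on this for the full dataset.

```python
import sqlite3

def pivot_orders(pairs):
    """Pivot (order_id, product_id) pairs into one 0/1 column per product via SQL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (OrderID INTEGER, ProductID TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", pairs)
    products = [r[0] for r in con.execute(
        "SELECT DISTINCT ProductID FROM orders ORDER BY ProductID")]
    # One conditional aggregate per product; product IDs are bound as parameters.
    columns = ", ".join(
        "MAX(CASE WHEN ProductID = ? THEN 1 ELSE 0 END)" for _ in products)
    query = (f"SELECT OrderID, {columns} FROM orders "
             "GROUP BY OrderID ORDER BY OrderID")
    rows = con.execute(query, products).fetchall()
    con.close()
    return products, rows
```

Each returned row is the order ID followed by a 1 or 0 per product, which can then be mapped to TRUE/FALSE when exporting.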

Either way, it is a straightforward programming task, so I don't see your question here.

0
votes

You obviously need to convert your data. The easiest way: write a program, in the programming language you are most familiar with, that reads the file and then writes it out in the appropriate format. Since these are text files, it should not be too complicated.
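For example, a file-to-file conversion might look like the sketch below. It assumes a comma-separated input with a header row naming the columns `OrderID` and `ProductID`, and it writes a dense ARFF file with one `{TRUE,FALSE}` attribute per product. The function name `csv_to_arff` and the relation name are hypothetical. The order ID is left out of the data rows on the assumption that it is not wanted as an attribute for rule mining; add it back if your setup needs it.

```python
import csv
from collections import defaultdict

def csv_to_arff(csv_path, arff_path, relation="orders"):
    """Read OrderID,ProductID rows and write a dense TRUE/FALSE ARFF file."""
    baskets = defaultdict(set)
    with open(csv_path, newline="") as src:
        for row in csv.DictReader(src):
            baskets[row["OrderID"]].add(row["ProductID"])
    # Sorted product list fixes the attribute (column) order.
    products = sorted({p for items in baskets.values() for p in items})
    with open(arff_path, "w") as out:
        out.write(f"@relation {relation}\n\n")
        for p in products:
            # Names are quoted so IDs with spaces stay valid; IDs that
            # themselves contain a single quote are not handled here.
            out.write(f"@attribute '{p}' {{TRUE,FALSE}}\n")
        out.write("\n@data\n")
        # One line per order, in the order the orders first appear in the input.
        for items in baskets.values():
            out.write(",".join(
                "TRUE" if p in items else "FALSE" for p in products) + "\n")
    return products
```

Calling `csv_to_arff("orders.csv", "orders.arff")` (both file names are placeholders) should produce a file that Weka can open directly.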

By the way, if you want more algorithms for pattern mining and association mining than just Apriori in Weka, you could check my software SPMF ( http://www.philippe-fournier-viger.com/spmf/ ), which is also written in Java, can read ARFF files too, and offers about 50 algorithms specialized in pattern mining (Apriori, FPGrowth, and many others).

0
votes

Your data is formatted correctly as-is for use in R with the arules package (and its apriori function). You might consider checking it out, esp. if you're not able to get into script coding.