0
votes

I am trying to come up with an algorithm to find the top 3 adjectives most frequently used for a product within the same sentence. I want to use association rule mining (the Apriori algorithm).

For that I am planning to use Twitter data. I can more or less decompose tweets into sentences, and then by filtering I can find the product names and the adjectives that occur with them.

For instance, after filtering I have data like:

ipad mini, great

ipad mini, horrible

samsung galaxy s2, best

... etc.

Product names and adjectives are defined in advance, so I have a set of product names and a set of adjectives that I am looking for.

I have read a couple of papers about sentiment analysis and rule mining, and they all say the Apriori algorithm is used. But they don't say how they used it, and they don't give details.

Therefore, how can I reduce my problem to an association rule mining problem?
What values should I use for minsup and minconf?
How can I modify the Apriori algorithm to solve this problem?

What I'm thinking is:

I should find the frequent adjectives separately for each product. Then, by sorting, I can get the top 3 adjectives. But I do not know if this is correct.

2 Answers

1
votes

Finding the top-3 most used adjectives for each product is not association rule mining.

For Apriori to yield good results, you must be interested in itemsets of length 4 or more. Apriori pruning starts at length 3, and begins to yield major gains at length 4. At length 2, it is mostly enumerating all pairs. And if you are only interested in pairs (product, adjective), then Apriori is doing much more work than necessary.

Instead, use counting. Use hash tables. If you really have exabytes of data, use approximate counting and heavy-hitter algorithms. (But most likely, you don't have exabytes of data after extracting those pairs...)
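To make the counting approach concrete, here is a minimal sketch in Python. The sample `pairs` list and the function name `top_adjectives` are made up for illustration; it assumes the (product, adjective) pairs have already been extracted.

```python
from collections import Counter, defaultdict

# Hypothetical (product, adjective) pairs, as they might look after filtering.
pairs = [
    ("ipad mini", "great"),
    ("ipad mini", "horrible"),
    ("ipad mini", "great"),
    ("ipad mini", "horrible"),
    ("ipad mini", "great"),
    ("ipad mini", "small"),
    ("samsung galaxy s2", "best"),
]

def top_adjectives(pairs, k=3):
    """Return the k most frequent adjectives for each product."""
    counts = defaultdict(Counter)  # product -> Counter of adjectives
    for product, adjective in pairs:
        counts[product][adjective] += 1
    return {product: c.most_common(k) for product, c in counts.items()}

top = top_adjectives(pairs)
print(top["ipad mini"])  # [('great', 3), ('horrible', 2), ('small', 1)]
```

This is one pass over the data with a hash table per product, which is all the problem as stated needs.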

Don't bother to investigate association rule mining if you only need to solve this much simpler problem.

Association rule mining is really only for finding patterns such as

pasta, tomato, onion -> basil

and more complex rules. The contribution of Apriori is to reduce the number of candidates when going from length n-1 to length n, for n > 2. And it gets more effective when n > 3.
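To illustrate why the pruning only starts to matter from length 3, here is a rough sketch of the candidate step. It is a brute-force stand-in for Apriori's join-and-prune (it keeps the same candidates, but enumerates them naively), and the function name `apriori_candidates` and the sample itemsets are invented for this example.

```python
from itertools import combinations

def apriori_candidates(frequent_prev, n):
    """Build length-n candidates from the frequent itemsets of length n-1.

    A candidate is kept only if every one of its (n-1)-subsets is frequent.
    """
    frequent_prev = {frozenset(s) for s in frequent_prev}
    items = sorted(set().union(*frequent_prev)) if frequent_prev else []
    candidates = set()
    for combo in combinations(items, n):
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(combo, n - 1)):
            candidates.add(frozenset(combo))
    return candidates

# Length 2: with four frequent single items, all 6 pairs remain candidates.
singles = [{"pasta"}, {"tomato"}, {"onion"}, {"basil"}]
pairs_c = apriori_candidates(singles, 2)

# Length 3: only triples whose three pairs are all frequent survive.
frequent_pairs = [{"pasta", "tomato"}, {"pasta", "onion"},
                  {"tomato", "onion"}, {"tomato", "basil"}]
triples_c = apriori_candidates(frequent_pairs, 3)
print(len(pairs_c), len(triples_c))  # 6 1
```

At length 2 nothing is pruned beyond dropping infrequent single items, so for (product, adjective) pairs this is just pair enumeration with extra overhead.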

0
votes

Reducing your problem to Association Rule Mining (ARM)

Create a feature vector with one entry for every topic and every adjective. If a feed contains a topic or adjective, place 1 in its entry of the tuple, otherwise 0. For example, assume the topics are Samsung and Apple, and the adjectives are good and horrible. If a feed contains "Samsung good", then the corresponding tuple is:

Samsung Apple good horrible

1 0 1 0
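The encoding above could be sketched like this in Python. The helper name `encode` is hypothetical, and it assumes each feed has already been reduced to the list of topics and adjectives it mentions.

```python
topics = ["Samsung", "Apple"]
adjectives = ["good", "horrible"]
columns = topics + adjectives  # fixed column order of the feature vector

def encode(feed_terms):
    """Return a 0/1 tuple with one entry per topic and adjective."""
    terms = set(feed_terms)
    return [1 if column in terms else 0 for column in columns]

row = encode(["Samsung", "good"])
print(row)  # [1, 0, 1, 0]
```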

Modification to Apriori Algorithm required

Generate association rules of the type 'topic' --> 'adjective' using a constrained Apriori algorithm, where the form 'topic' --> 'adjective' is the constraint.

How to set MinSup and MinConf: read the paper entitled "Mining top-k association rules". Implement that with k=3 to get the top 3 adjectives.
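As a rough illustration of what rules of the form 'topic' --> 'adjective' look like with k=3, here is a sketch that simply enumerates them and ranks them per topic by confidence. This is not the algorithm from the paper, only an exhaustive count; the function name `top_k_rules` and the sample transactions are assumptions for the example.

```python
from collections import Counter, defaultdict

def top_k_rules(transactions, topics, adjectives, k=3):
    """For each topic, return up to k rules topic -> adjective as
    (adjective, support, confidence), ranked by confidence."""
    n = len(transactions)
    topic_count = Counter()
    pair_count = Counter()
    for transaction in transactions:
        present = set(transaction)
        for topic in topics:
            if topic in present:
                topic_count[topic] += 1
                for adjective in adjectives:
                    if adjective in present:
                        pair_count[(topic, adjective)] += 1
    rules = defaultdict(list)
    for (topic, adjective), count in pair_count.items():
        support = count / n                    # share of all transactions
        confidence = count / topic_count[topic]  # share of the topic's transactions
        rules[topic].append((adjective, support, confidence))
    return {
        topic: sorted(found, key=lambda r: (-r[2], r[0]))[:k]
        for topic, found in rules.items()
    }

transactions = [
    {"Samsung", "good"},
    {"Samsung", "good"},
    {"Samsung", "horrible"},
    {"Apple", "good"},
]
rules = top_k_rules(transactions, ["Samsung", "Apple"], ["good", "horrible"])
```

Note that for a fixed topic, ranking by confidence gives the same order as ranking by raw co-occurrence count, so the result matches plain per-product counting.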