
I have a large (Snowflake) database of transactions and want to explore them with association rule learning.

Loading the data into memory and using, for instance, R's arules package is not an option because of the memory requirements.

Is there an R or Python package, or SQL code, that computes association rules (via the Apriori or FP-growth algorithm) on the database itself?

I know something similar exists for SQL Server (https://www.sqlshack.com/the-association-rule-mining-in-sql-server/).

Have you looked into dplyr for Snowflake? snowflake.com/blog/… It has some ability to push computation down to Snowflake, which might help you out. Not sure about association rule learning, though. - Mike Walton

I've written a book that has two chapters on this. It is quite doable but a bit lengthy for Stack Overflow. - Gordon Linoff

So how is this usually done for large databases when doing it in memory is not an option? - harry-plotter

1 Answer


Snowflake has nothing native for association rule mining.

You could try to adapt an existing SQL implementation, e.g. this one: http://sqldatamine.blogspot.com/2014/02/associated-items-using-apriori-algorithm.html?_sm_au_=iVVR1RP6530TJ5SMqCc84K3L6t8Jp
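To give an idea of what such a SQL approach can look like, here is a minimal sketch of the first two Apriori levels (frequent items, then frequent pairs with a confidence value) done entirely in SQL. It is not the code from the linked post. The `transactions(txn_id, item)` table, the toy rows, and the `MIN_SUPPORT` threshold are all made up for illustration, and SQLite stands in for Snowflake only so the example runs anywhere; the query would likely need dialect adjustments before running on Snowflake.

```python
import sqlite3

# Hypothetical schema: one row per (transaction, item) combination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INTEGER, item TEXT)")
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "bread"), (3, "butter"),
    (4, "milk"),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

MIN_SUPPORT = 2  # assumed threshold: minimum number of transactions

PAIR_RULES_SQL = """
WITH item_counts AS (
    -- Level 1: items that occur in enough transactions
    SELECT item, COUNT(DISTINCT txn_id) AS n
    FROM transactions
    GROUP BY item
    HAVING COUNT(DISTINCT txn_id) >= :min_support
),
pair_counts AS (
    -- Level 2: only pairs built from frequent items (the Apriori pruning step)
    SELECT a.item AS item_a, b.item AS item_b,
           COUNT(DISTINCT a.txn_id) AS n
    FROM transactions a
    JOIN transactions b ON a.txn_id = b.txn_id AND a.item < b.item
    JOIN item_counts ia ON ia.item = a.item
    JOIN item_counts ib ON ib.item = b.item
    GROUP BY a.item, b.item
    HAVING COUNT(DISTINCT a.txn_id) >= :min_support
)
SELECT p.item_a, p.item_b, p.n AS pair_support,
       1.0 * p.n / ia.n AS confidence_a_to_b
FROM pair_counts p
JOIN item_counts ia ON ia.item = p.item_a
ORDER BY p.item_a, p.item_b
"""

pair_rules = conn.execute(PAIR_RULES_SQL, {"min_support": MIN_SUPPORT}).fetchall()
for item_a, item_b, support, confidence in pair_rules:
    print(f"{item_a} -> {item_b}: support={support}, confidence={confidence:.2f}")
```

Larger itemsets would need one more self-join per level, which is where the SQL gets lengthy and the joins get expensive.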

Apriori is a "simple" algorithm, so it is possible, but I would recommend first taking a random sample of the data (e.g. 1M transactions) and running Apriori with your tool of choice (R, Python, KNIME, ...). If you get interesting results, you might be motivated to do it at full scale.
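As a rough sketch of the sample-first route, the snippet below samples whole transactions (not individual rows, which would split baskets apart) and runs a small pure-Python Apriori on the sample. The helper names `sample_transactions` and `apriori`, the toy rows, and the thresholds are all hypothetical; in practice you would pull the sampled rows from the database and most likely use an existing package rather than this hand-rolled version.

```python
import random
from itertools import combinations


def sample_transactions(rows, n, seed=0):
    """Group (txn_id, item) rows into baskets, then sample n whole baskets."""
    baskets = {}
    for txn_id, item in rows:
        baskets.setdefault(txn_id, set()).add(item)
    ids = sorted(baskets)
    chosen = random.Random(seed).sample(ids, min(n, len(ids)))
    return [baskets[i] for i in chosen]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Build size-k candidates from frequent (k-1)-itemsets and keep only
        # those whose (k-1)-subsets are all frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy (txn_id, item) rows standing in for data pulled from the database.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "bread"), (3, "butter"),
    (4, "milk"),
]
sample = sample_transactions(rows, n=4)
frequent = apriori(sample, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Keep in mind that support counts from a sample are estimates, so the minimum support threshold should be scaled to the sample size.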