
I want to extract association rules from a set of transactions with the following Spark-Scala code:

import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth().setMinSupport(minSupport).setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()

However, there are more than 10K products, so extracting the rules for all combinations is computationally expensive, and I do not need them all anyway. I want to extract only pairwise rules:

Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1

and I do not care about other combinations such as:

[Product 1] ==> [Product 2, Product 3]
[Product 3,Product 1] ==> Product 2

Is there any way to do that?

Thanks, Amir


1 Answer


Assuming your transactions look more or less like this:

val transactions = sc.parallelize(Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
))

you can try to generate frequent itemsets manually and apply AssociationRules directly:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions
  .flatMap { xs =>
    // Canonicalize each transaction (dedupe + sort) so that, e.g.,
    // ("b", "a") and ("a", "b") are counted as the same itemset, then
    // emit every 1- and 2-item combination with a count of 1
    val items = xs.distinct.sorted
    (items.combinations(1) ++ items.combinations(2)).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)  // total count per itemset across all transactions
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
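
The resulting rules can then be collected and inspected, for example:

results.collect().foreach { rule =>
  println(s"${rule.antecedent.mkString(",")} => " +
    s"${rule.consequent.mkString(",")} (confidence: ${rule.confidence})")
}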

Notes:

  • unfortunately, you'll have to handle filtering by minimum support yourself; it can be done by applying filter on freqItemsets
  • you should consider increasing the number of partitions before the flatMap
  • if freqItemsets is too large to handle, you can split the generation into a few steps to mimic actual FP-growth (see the sketch after the steps below):

    1. generate 1-patterns and filter by support
    2. generate 2-patterns using only frequent patterns from step 1
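
Putting those notes together, a minimal sketch of that step-wise variant might look like this. Note that minCount and frequentItems are hypothetical names, and minCount is assumed to be an absolute transaction count rather than the fractional minSupport that FPGrowth takes:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

// Assumed: absolute support threshold (number of transactions an itemset
// must appear in), unlike FPGrowth's fractional minSupport
val minCount = 2L

// Step 1: generate 1-patterns and filter by support
val freq1 = transactions
  .flatMap(_.distinct.map(x => (x, 1L)))
  .reduceByKey(_ + _)
  .filter { case (_, cnt) => cnt >= minCount }

// Broadcast the surviving frequent items so executors can prune
// transactions before generating pairs
val frequentItems = sc.broadcast(freq1.keys.collect().toSet)

// Step 2: generate 2-patterns using only frequent items from step 1
val freq2 = transactions
  .flatMap { xs =>
    val items = xs.distinct.filter(frequentItems.value).sorted
    items.combinations(2).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .filter { case (_, cnt) => cnt >= minCount }

// Combine both levels into FreqItemsets and run AssociationRules as before
val freqItemsets =
  freq1.map { case (x, cnt) => new FreqItemset(Array(x), cnt) } ++
  freq2.map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val results = new AssociationRules()
  .setMinConfidence(0.8)
  .run(freqItemsets)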