I have a file of (User, Item) pairs, one per line, which I'd like to use with Spark's frequent itemset mining (FPGrowth). I've tried this:

val data = sc.textFile("myfile")
             .map(line => (line.trim.split(' ')(0), line.trim.split(' ')(1)))
             .groupByKey()
val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(10)
val model = fpg.run(data)

but the compiler complains:

inferred type arguments [Nothing,(String, Iterable[String])] do not conform to method run's type parameter bounds [Item,Basket <: Iterable[Item]]

1 Answer

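The error comes from the Java-facing overload of run, where Basket has to be a java.lang.Iterable, so neither a Tuple2 nor a Scala Iterable will work here. The Scala overload takes an RDD[Array[Item]], so just drop the keys and convert each basket to an Array before you pass the data to the run method: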

val data = sc.parallelize(Seq("1 a", "1 b", "2 b", "2 c"))
  .map(_.split(" ") match {
    case Array(id, item, _*) => (id, item)
  })
  .groupByKey()
  .values  // Take only values
  .map(_.toArray.distinct)  // Convert to Array; FPGrowth requires unique items per basket

val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(10)
val model = fpg.run(data)
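
For completeness, here is a minimal end-to-end sketch of the same approach that also prints the resulting frequent itemsets. It rests on a few assumptions that are not in the question: the sample lines are made up, a local master is used for illustration, and input lines are whitespace-separated with malformed lines silently skipped. Duplicate (user, item) pairs are removed per basket because FPGrowth rejects transactions with repeated items.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

// Assumption: a local context for illustration only.
// In spark-shell, getOrCreate simply reuses the existing context.
val conf = new SparkConf().setAppName("fpgrowth-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical sample input: one "user item" pair per line,
// including a repeated pair ("1 a") to show deduplication.
val lines = sc.parallelize(Seq("1 a", "1 b", "1 a", "2 b", "2 c"))

val baskets = lines
  .map(_.trim.split("\\s+"))
  .collect { case Array(user, item, _*) => (user, item) } // skip malformed lines
  .groupByKey()
  .values                  // drop the user key
  .map(_.toArray.distinct) // RDD[Array[String]] with unique items per basket

val model = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)
  .run(baskets)

// Collecting to the driver is fine for small results only.
model.freqItemsets.collect().foreach { itemset =>
  println(s"${itemset.items.mkString("[", ",", "]")} -> ${itemset.freq}")
}
```

Each element of freqItemsets carries the items of one frequent itemset and the number of baskets that contain it. For a real file, replacing the parallelize call with sc.textFile("myfile") should be the only change needed.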