2
votes

User Régis Jean-Gilles gracefully answered my previous question where I was struggling with CanBuildFrom and enrichment functions (aka "pimp my library" or "enrich my library"):

Creating an implicit function that wraps map() in Scala with the right type: Not for the faint at heart

But this time I've got an even more complicated issue.

I have a function to implement variations on intersectWith, for intersecting collections by their keys. I've managed to make them work like proper collection functions:

implicit class IntersectUnionWithPimp[K, A, Repr](a: GenTraversableLike[(K, A), Repr]) {

  /**
   * Intersect two collections by their keys. This is identical to
   * `intersectWith` except that the combiner function is passed the
   * key as well as the two items to combine.
   *
   * @param b Other collection to intersect with.
   * @param combine Function to combine values from the two collections.
   */
  def intersectWithKey[B, R, That](b: GenTraversable[(K, B)])(
      combine: (K, A, B) => R)(
      implicit bf: CanBuildFrom[Repr, (K, R), That]): That = {
    ...
  }

  /**
   * Intersect two collections by their keys. Keep the ordering of
   * objects in the first collection. Use a combiner function to
   * combine items in common. If either item is a multi-map, then
   * for a key seen `n` times in the first collection and `m`
   * times in the second collection, it will occur `n * m` times
   * in the resulting collection, including all the possible
   * combinations of pairs of identical keys in the two collections.
   *
   * @param b Other collection to intersect with.
   * @param combine Function to combine values from the two collections.
   */
  def intersectWith[B, R, That](b: GenTraversable[(K, B)])(
      combine: (A, B) => R)(
      implicit bf: CanBuildFrom[Repr, (K, R), That]): That =
    a.intersectWithKey(b){ case (_, x, y) => combine(x, y) }(bf)
}

Now, I currently also have non-CanBuildFrom versions of intersectBy and friends, and these I can't get working as CanBuildFrom versions.

implicit class IntersectUnionByPimp[A](a: Traversable[A]) {
  /**
   * Intersect two collections by their keys, with separate key-selection
   * functions for the two collections. This is identical to
   * `intersectBy` except that each collection has its own key-selection
   * function. This allows the types of the two collections to be
   * distinct, with no common parent.
   *
   * @param b Other collection to intersect with.
   * @param key1fn Function to select the comparison key for the first
   *   collection.
   * @param key1fn Function to select the comparison key for the first
   *   collection.
   * @param combine Function to combine values from the two collections.
   */
  def intersectBy2[K, B, R](b: Traversable[B])(key1fn: A => K
      )(key2fn: B => K)(combine: (A, B) => R): Traversable[R] = {
    val keyed_a = a.map { x => (key1fn(x), x) }
    val keyed_b = b.map { x => (key2fn(x), x) }
    keyed_a.intersectWith(keyed_b)(combine).map(_._2)
  }

  /**
   * Intersect two collections by their keys. Keep the ordering of
   * objects in the first collection. Use a combiner function to
   * combine items in common. If either item is a multi-map, then
   * for a key seen `n` times in the first collection and `m`
   * times in the second collection, it will occur `n * m` times
   * in the resulting collection, including all the possible
   * combinations of pairs of identical keys in the two collections.
   *
   * @param b Other collection to intersect with.
   * @param keyfn Function to select the comparison key.
   * @param combine Function to combine values from the two collections.
   */
  def intersectBy[K, B >: A](b: Traversable[B])(keyfn: B => K)(
      combine: (A, B) => B): Traversable[B] = {
    val keyed_a = a.map { x => (keyfn(x), x) }
    val keyed_b = b.map { x => (keyfn(x), x) }
    keyed_a.intersectWith(keyed_b)(combine).map(_._2)
  }
}

The best version so far I can come up with is this:

implicit class IntersectUnionByPimp[A, Repr](a: GenTraversableLike[A, Repr]) {
  def intersectBy2[K, B, R, That](b: Traversable[B])(key1fn: A => K)(
      key2fn: B => K)(combine: (A, B) => R)(
      implicit bf: CanBuildFrom[Repr, R, That]): That = {
    // FIXME! How to make this work while calling `map`?
    // val keyed_a = a.map { x => (key1fn(x), x) }
    val keyed_a = mutable.Buffer[(K, A)]()
    a.foreach { x => keyed_a += ((key1fn(x), x)) }
    val keyed_b = b.map { x => (key2fn(x), x) }
    keyed_a.intersectWith(keyed_b)(combine).map(_._2)
  }

  def intersectBy[K, B >: A, That](b: Traversable[B])(keyfn: B => K)(
      combine: (A, B) => B)(
      implicit bf: CanBuildFrom[Repr, B, That]): That = {
    // FIXME! How to make this work while calling `map`?
    // val keyed_a = a.map { x => (keyfn(x), x) }
    val keyed_a = mutable.Buffer[(K, A)]()
    a.foreach { x => keyed_a += ((keyfn(x), x)) }
    val keyed_b = b.map { x => (keyfn(x), x) }
    keyed_a.intersectWith(keyed_b)(combine).map(_._2)
}

First, I don't see why I need to rewrite the call to map that produces keyed_a with a mutable Buffer; seems like there must be a better way. But I still get the same sort of error on the bottom line:

[error] /Users/benwing/devel/textgrounder/src/main/scala/opennlp/textgrounder/util/collection.scala:1018: type mismatch;
[error]  found   : scala.collection.mutable.Buffer[R]
[error]  required: That
[error]  Note: implicit method bufferToIndexedSeq is not applicable here because it comes after the application point and it lacks an explicit result type
[error]       keyed_a.intersectWith(keyed_b)(combine).map(_._2)
[error]                                                  ^

So my questions are:

  1. How to call map on a GenTraversableLike?
  2. How to make the call to intersectWith work correctly? I know I have to somehow pass in a CanBuildFrom based on the one I received, and I know about mapResult on Builders, but I'm not sure what to do here, or if this is even possible.

An example of intersectBy, which intersects lists of floating-point numbers treating two numbers the same if their integral part is the same, and computing the absolute difference:

scala> List(4.5,2.3,4.2).intersectBy(List(4.6,4.8))(_.toInt){ case (a,b) => (a - b).abs }
res2: Traversable[Double] = List(0.09999999999999964, 0.2999999999999998, 0.39999999999999947, 0.5999999999999996)

(except that the returned type should be List[Double])

Thanks for any help.

1
(1) You should really split this into two questions, because the second (intersectBy2) relies on the correct behaviour of the first (intersectWith). (2) It would make things easier to follow, if you gave an input and a desired output collection for intersectWith, e.g. if input is List(1 -> "foo", 2 -> "bar", 2 -> "baz"), what would be the second collection and the output.0__
@0__: I'm not quite sure why you want it split. intersectWith already works. I'm asking how to get intersectBy and intersectBy2 working, and the answer to intersectBy2 implies the answer to intersectBy because the former is more complex.Urban Vagabond
The implementation of intersectWithKey is missing, therefore this step should be first, perhaps it involves changing the precise signature.0__
@0__: intersectWithKey already works, with this precise signature. If you need to, insert a ??? as the implementation. The issue is how to get it to compile; if you insert ???, and it compiles correctly and then throws a "not-implemented" error upon running, then you've solved the problem. I'll insert the implementation if you want but it just takes up even more room.Urban Vagabond

1 Answers

1
votes

OK, turns out I need to create a builder to return the items, instead of trying to return them directly. The following works:

implicit class IntersectUnionByPimp[A, Repr](a: GenTraversableLike[A, Repr]) {
  /**
   * Intersect two collections by their keys, with separate key-selection
   * functions for the two collections. This is identical to
   * `intersectBy` except that each collection has its own key-selection
   * function. This allows the types of the two collections to be
   * distinct, with no common parent.
   *
   * @param b Other collection to intersect with.
   * @param key1fn Function to select the comparison key for the first
   *   collection.
   * @param key2fn Function to select the comparison key for the first
   *   collection.
   * @param combine Function to combine values from the two collections.
   */
  def intersectBy2[K, B, R, That](b: GenTraversable[B])(key1fn: A => K)(
      key2fn: B => K)(combine: (A, B) => R)(
      implicit bf: CanBuildFrom[Repr, R, That]): That = {
    // It appears we can't call map() on `a`.
    val keyed_a = mutable.Buffer[(K, A)]()
    a.foreach { x => keyed_a += ((key1fn(x), x)) }
    val keyed_b = b.map { x => (key2fn(x), x) }
    // Nor can we return the value of map() here. Need to use a builder
    // instead.
    val bu = bf()
    for ((_, r) <- keyed_a.intersectWith(keyed_b)(combine))
      bu += r
    bu.result
  }

  /**
   * Intersect two collections by their keys. Keep the ordering of
   * objects in the first collection. Use a combiner function to
   * combine items in common. If either item is a multi-map, then
   * for a key seen `n` times in the first collection and `m`
   * times in the second collection, it will occur `n * m` times
   * in the resulting collection, including all the possible
   * combinations of pairs of identical keys in the two collections.
   *
   * @param b Other collection to intersect with.
   * @param keyfn Function to select the comparison key.
   * @param combine Function to combine values from the two collections.
   */
  def intersectBy[K, B >: A, That](b: GenTraversable[B])(keyfn: B => K)(
      combine: (A, B) => B)(
      implicit bf: CanBuildFrom[Repr, B, That]): That = {
    val keyed_a = mutable.Buffer[(K, A)]()
    a.foreach { x => keyed_a += ((keyfn(x), x)) }
    val keyed_b = b.map { x => (keyfn(x), x) }
    val bu = bf()
    for ((_, r) <- keyed_a.intersectWith(keyed_b)(combine))
      bu += r
    bu.result
  }
}

I'm not completely sure why just calling map on a GenTraversableLike doesn't seem to work, but so be it.