2
votes

I'm struggling with the absence of Java's Iterator.remove() in Scala. In particular, I want to, in a single pass over a large mutable map, remove elements that satisfy a predicate and collect them in another mutable map.

Here's what I am trying to do:

import scala.collection.concurrent.TrieMap
import scala.collection.mutable

def main(args: Array[String]): Unit = {
  val map = new TrieMap[String, Integer]()
  map += "one" -> 1
  map += "two" -> 2

  // Remove all elems whose value is > 1 and put them in val removed.
  val removed = removeIf(map, _._2 > 1) 
}

def removeIf(
    map: mutable.Map[String, Integer], 
    p: ((String, Integer)) =>  Boolean): mutable.Map[String, Integer] = {

  val result = mutable.Map[String, Integer]()
  val iter = map.iterator
  while (iter.hasNext) {
    val elem = iter.next()
    if ( p(elem) ) {
      iter.remove()  // Error: does not compile, Scala's Iterator has no remove()
      result += elem
    }
  }
  result
}

For some sound reason, Scala's Iterator, even on a mutable collection, does not implement remove().
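For reference, the effect of removing while iterating can be sketched without Iterator.remove() by letting the map drive the traversal. The sketch below assumes Scala 2.13 or later, where mutable.Map has filterInPlace (earlier versions call it retain); the curried removeIf signature is my own choice for illustration, not the one from the code above.

```scala
import scala.collection.mutable

// Hypothetical helper (assumes Scala 2.13+): removes the entries matching `p`
// from `map` in place and returns them in a new mutable map.
def removeIf[K, V](map: mutable.Map[K, V])(p: ((K, V)) => Boolean): mutable.Map[K, V] = {
  val removed = mutable.Map.empty[K, V]
  map.filterInPlace { (k, v) =>
    val matches = p((k, v))
    if (matches) removed += (k -> v) // collect the entry before it is dropped
    !matches                         // keep only the entries that do not match
  }
  removed
}

val sample = mutable.Map("one" -> 1, "two" -> 2, "three" -> 3)
val dropped = removeIf(sample)(_._2 > 1)
```

The predicate is evaluated once per entry, so the caller sees a single traversal; whether the library copies the entries internally is an implementation detail that may differ between map types.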

Edit: Two solutions offered below are:

  1. Don't worry about the cost of the second pass and use filter() and then --= to remove the filtered entries:

    val result = map.filter(p)

    map --= result.keys

  2. Use partition and reassign the new map to the old variable:

    val (result, newMap) = map.partition { case (k, v) => ... }

I ran some tests. As expected, the first solution is faster when the number of removed entries is small relative to the size of the original map. The break-even point, where the two solutions take roughly the same time, is when the predicate splits the original map about in half. The running time of the second solution does not seem to depend on the split, but that of the first one obviously does. Both are O(n), so perhaps I am being too picky here. I wish I could split the checkmark between the two answers. Thanks to both, Don Branson and rogue-one.
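The two approaches compared above can be sketched as helpers along these lines. The names removeIfTwoPass and splitBy are made up for illustration, and the curried parameter lists are only there so the predicate's types are inferred.

```scala
import scala.collection.mutable

// Solution 1: filter, then remove the matching keys from the original map.
def removeIfTwoPass[K, V](map: mutable.Map[K, V])(p: ((K, V)) => Boolean): mutable.Map[K, V] = {
  val removed = map.filter(p) // first pass: copy the matching entries
  map --= removed.keys        // second pass: one removal per matching key
  removed
}

// Solution 2: partition into two new maps; the original map is left untouched,
// so the caller has to switch over to the second map of the pair.
def splitBy[K, V](map: mutable.Map[K, V])(p: ((K, V)) => Boolean): (mutable.Map[K, V], mutable.Map[K, V]) =
  map.partition(p) // (entries matching p, entries not matching p)

val m1 = mutable.Map("one" -> 1, "two" -> 2, "three" -> 3)
val removed1 = removeIfTwoPass(m1)(_._2 > 1)

val m2 = mutable.Map("one" -> 1, "two" -> 2, "three" -> 3)
val (removed2, remaining2) = splitBy(m2)(_._2 > 1)
```

This is consistent with the timings above: the extra work in the first helper grows with the number of removed entries, while the second always copies every entry into one of the two new maps.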

3
Using mutable array is not really idiomatic Scala... Partition would be the idiomatic way to go, IMHO (see @rogue-one's answer) - Cyrille Corpet

3 Answers

5
votes

The below works if you are fine with returning a new Map object. The solution uses the partition method of collections and makes only a single pass over the original map.

scala> val map = TrieMap[String, Integer]("one" -> 1, "two" -> 2)
map: scala.collection.concurrent.TrieMap[String,Integer] = TrieMap(two -> 2, one -> 1)

scala> val (removed, newMap) = map.partition({case(_, x) => x > 1})
removed: scala.collection.concurrent.TrieMap[String,Integer] = TrieMap(two -> 2)
newMap: scala.collection.concurrent.TrieMap[String,Integer] = TrieMap(one -> 1)

2
votes

An idiomatic way to approach this is to use filterNot() / filter():

def main(args: Array[String]): Unit = {
  val map = new TrieMap[String, Integer]()
  map += "one" -> 1
  map += "two" -> 2

  // Entries whose value is > 1 are the ones to remove.
  val removed = map.filter(_._2 > 1)
  val newMap = map.filterNot(_._2 > 1)
}

However, the two calls can be combined into one call to partition, which returns the matching entries first:

val (removed, newMap) = map.partition(_._2 > 1)

The bottom line is that updating a mutable collection in place applies a procedural idiom in a functional language, and opens the door to certain types of bugs. Returning new, immutable collections is more consistent with functional style.
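As a minimal sketch of that style, assuming the data can live in an immutable Map (the sample entries and the names original and remaining are made up for illustration), the split leaves the source map unchanged and hands back two new maps:

```scala
// Immutable variant: nothing is mutated, two new maps are returned.
val original: Map[String, Int] = Map("one" -> 1, "two" -> 2, "three" -> 3)

// partition returns (entries matching the predicate, entries not matching it).
val (removed, remaining) = original.partition { case (_, v) => v > 1 }
```

Code that still holds a reference to original keeps seeing all three entries, which is the kind of bug avoidance meant here.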

Thanks go to rogue-one for calling out partition() as an option.

0
votes

Try groupBy with the predicate. You get a map with up to two keys: true for the entries that satisfy the predicate (the ones that should be removed) and false for the ones that should remain.

  val p: ((String, Int)) => Boolean = _._2 > 1
  val grouped = Map[String, Int]("one" -> 1, "two" -> 2).groupBy(p)
  // A key is absent when no entry falls into that group, so default to an empty map.
  val removed = grouped.getOrElse(true, Map.empty[String, Int])
  val remain = grouped.getOrElse(false, Map.empty[String, Int])