1
votes

I have a following algorithm with scala:

  1. Do initial call to db to initialize cursor
  2. Get 1000 entities from db (Returns Future)
  3. For every entity process one additional request to database and get modified entity (returns future)
  4. Transform original entity
  5. Put transformed entity to Future call back from #3
  6. Wait for all Futures

In scala it some thing like:

 val client = ...
 val size = 1000
 val init:Future = client.firstSearch(size) //request over network
 val initResult = Await(init, 30.seconds)
 var cursorId:String = initResult.getCursorId
 while (!cursorId.isEmpty) {
    val futures:Seq[Future] = client.grabWithSize(cursorId).map{response=>
        response.getAllResults.map(result=>
            val grabbedOne:Future[Entity] = client.grabOneEntity(result.id) //request over network
            val resultMap:Map[String,Any] = buildMap(result)
            val transformed:Map[String,Any] = transform(resultMap) //no future here
            grabbedOne.map{grabbedOne=>
                buildMap(grabbedOne) == transformed
            }
        }
    Futures.sequence(futures).map(_=> response.getNewCursorId)
    }
 }

 def buildMap(...):Map[String,Any] //sync call

I noticed that if I increase size say two times, every iteration in while started working slowly ~1.5. But I do not see that my PC processor loaded more. It loaded near zero, but time increases in ~1.5. Why? I have setuped:

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1024))

I think, that not all Futures executed in parallel. But why? And ho to fix?

1
Are you sure DB is not your bottleneck?om-nom-nom
Sure. I did profile and request to db take ~5-10%. Note, that 5-10% is time relative to while algorithm not absolute process loading.Cherry
Check for the number of connections in connection pool and maximum allowed connections in DB side.Johny T Koshy
Can not check it right now, but have you spotted response.getAllResults.map? It does not traverse results. It just "put" map function, but actual (!) translating comes into Futures.sequence, because it iterates over Seq[Future] which call map function unders response.getAllResults. So each Future starts one by one, not all at same time. Can it be a source of problem?Cherry
what db driver are you using ? If possible try the new Slick 3.0 driver.Soumya Simanta

1 Answers

1
votes

I see that in your code, the Futures don't block each other. It's more likely the database that is the bottleneck.

Is it possible to do a SQL join for O(1) rather than O(n) in terms of database calls? (If you're using Slick, have a look under the queries section about joins.)

If the load is low, it's probably that the connection pool is maxed out, you'd need to increase it for the database and the network.