1
votes

I am trying to wrap my head around how futures work under the hood. I am familiar with the concept in both Java and Scala. I've been using futures in PlayFramework to prevent blocking operations from occupying my connection threads. That has worked well for the most part, but I still feel that there are parts under the hood I am missing, particularly when it comes to keeping the number of threads used for executing blocking operations low.

My general assumption is (correct me if I am wrong) that there should be a single thread (in the simplest case) running an endless loop over a collection of pending futures. On every turn, the thread sleeps a little, then picks the futures one by one, checks if some results have arrived, and if so, returns them and removes the finished futures from the collection.

IMHO, this can only work if the underlying operations are non-blocking too. Otherwise, logic tells me that those should be isolated in their own separate threads, as part of a pool. My train of thought crashes at the point where each operation, even when wrapped in a future, is fundamentally blocking. In that case, I would assume that in the worst-case scenario, we would once again end up with one thread per blocking operation, futures or not.

The problem is that a large portion of the widely used IO code in Java is fundamentally blocking. This means that executing 15 JDBC operations wrapped in futures will still spin off 15 threads. Otherwise, we would have to call them sequentially on a single thread, which is even worse.

What I am trying to say is that wrapping fundamentally blocking IO operations in futures should, in theory, not help at all. Am I right or wrong? Please help me build the puzzle.

It uses thread pools to limit the number of threads. – user1804599
@rightføld Still, even that won't help with blocking IO. When the threads in the pool are exhausted, the pool would have to either create new ones (expensive) or wait (time-consuming). Am I right? – Preslav Rachev
@user1107412 Yes, that's right. Each running Future has one thread from the thread pool assigned to it. If all threads are taken, the pool will either have to create a new one or wait until one is available. That depends on how the thread pool is configured. – John Vint
IMO the question title fooled me. This is not a question about concurrency (AKA parallelization, given that it involves Futures); yet the question body itself is focused only on doing I/O actions concurrently. – Gimby

3 Answers

2
votes

"What I am trying to say is that wrapping fundamentally blocking IO operations in futures, should in theory not help at all. Am I right or wrong?"

It depends on what you mean by "help". It will not magically make your I/O operation non-blocking, that's true (and a lot of people get confused by that).

What it will do, depending on how the thread pool is configured, is make sure your I/O (or whatever blocking operation) is running in a dedicated thread - leaving the "core" threads free to run.

Let's say your pool has a number of threads equivalent to the number of cores on your machine. If you want to perform a blocking operation, you can hint that to the pool, and it will create a dedicated thread for this operation, since it knows that this thread won't use much of the CPU.
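For example, Scala's default global ExecutionContext (a ForkJoinPool sized to the number of cores) honors the scala.concurrent.blocking hint. A minimal sketch, with Thread.sleep standing in for a real blocking call:

import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global

// Wrapping the blocking call in `blocking { ... }` tells the pool that this
// task will block, so it may spawn an extra thread rather than starve the
// core threads that should stay free for CPU-bound work.
val result: Future[Int] = Future {
  blocking {
    Thread.sleep(1000) // stand-in for a blocking I/O call
    42
  }
}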

Another common practice is to use a dedicated ExecutionContext for blocking operations, to isolate their execution from the rest of your program.
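A minimal sketch of such a dedicated context (the pool size of 16 and the findUser function are just illustrative assumptions; in practice, size the pool to match something like your JDBC connection pool):

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// A fixed-size pool reserved for blocking work, kept separate from the
// default ExecutionContext that runs the CPU-bound parts of the program.
val blockingEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))

def findUser(id: Long): Future[String] =
  Future {
    // a blocking JDBC query would go here
    "user-" + id
  }(blockingEc)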

1
votes

While there are still some helpful aspects to using futures even if the "underlying" I/O is happening via a pool of blocking threads, you're right that in general this wouldn't usually provide some of the big advantages of Futures. The piece you're missing is the possibility of doing I/O that is truly nonblocking, i.e. ultimately calling select or similar interfaces at the system call level. The Java NIO interfaces can do this, as can a number of frameworks built on top of that (e.g. Netty), or callback-oriented libraries like Apache's HttpAsyncClient.
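As a rough illustration of the pattern such libraries rely on, here is a sketch of bridging a callback-based NIO read into a Future via a Promise, so no caller thread is parked while waiting for the result (readFileAsync and the use of AsynchronousFileChannel are just an example, not how any particular library is implemented):

import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
import java.nio.file.{Paths, StandardOpenOption}
import scala.concurrent.{Future, Promise}

def readFileAsync(path: String): Future[ByteBuffer] = {
  val channel = AsynchronousFileChannel.open(Paths.get(path), StandardOpenOption.READ)
  val buffer  = ByteBuffer.allocate(channel.size().toInt)
  val promise = Promise[ByteBuffer]()

  // The read is handed off to the channel; this call returns immediately and
  // the Promise is completed later from the callback.
  channel.read(buffer, 0L, null, new CompletionHandler[Integer, AnyRef] {
    def completed(bytesRead: Integer, attachment: AnyRef): Unit = {
      channel.close()
      buffer.flip()
      promise.success(buffer)
    }
    def failed(exc: Throwable, attachment: AnyRef): Unit = {
      channel.close()
      promise.failure(exc)
    }
  })

  promise.future
}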

Unfortunately there is no async replacement for full JDBC that I'm aware of, though there is e.g. postgresql-async, which covers at least some of the use cases.

1
votes

"My generation assumption is (correct me if I am wrong) that there should be a single thread (in the simplest case), running an endless loop over a collection of pending futures". This isn't always true. It depends on the ExecutionContext that is being used. You can have an ExecutionContext that creates a new thread for each Future if you want or you can have a fixed pool of threads that run your Futures. There are other options as well. It really depends on what you are trying to do.

IMO, the real power of Futures comes in the programming model that it allows. Once you understand the model, you can compose Futures pretty easily. Then, if you need to alter the concurrency model a little, you can do that by configuring your ExecutionContexts and using multiple/different ExecutionContexts per group of tasks. In a real/large application multiple ExecutionContexts are used - not just the default one.

You mentioned blocking IO and JDBC, so I'll give an example related to that. In the past, I've used Futures with JDBC to do concurrent inserts into different tables. If the inserts are unrelated (no FK dependencies, for example), and assuming you have an ExecutionContext configured with at least as many threads as JDBC connections, then the caller only waits for the longest-running insert, not the sum of all insert times. For example:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
// an implicit ExecutionContext (ideally a dedicated, non-default one) is assumed in scope

// functions that insert into tables
// return the primary key (Long) wrapped in a Future
// use blocking JDBC calls internally
def insertIntoTable1(col1: String, col2: String): Future[Long] = ...

def persistData(... input data for all tables ...) = {
  // kick off the inserts; each starts running as soon as its Future is created
  val futurePrimaryKey1 = insertIntoTable1(...)
  val futurePrimaryKey2 = insertIntoTable2(...)
  val futurePrimaryKey3 = insertIntoTable3(...)

  // the inserts for id1 ... id3 run concurrently; the parent insert runs
  // only once all three keys are available
  val futureParentTablePrimaryKey: Future[Long] = for {
    id1      <- futurePrimaryKey1
    id2      <- futurePrimaryKey2
    id3      <- futurePrimaryKey3
    parentId <- insertIntoParentTable(id1, id2, id3)
  } yield parentId

  // wait for the final insert
  Await.result(futureParentTablePrimaryKey, Duration.Inf)
  // of course, if you want the caller to know about Futures, don't Await here
}

It's not obvious from the pseudocode, but a non-default ExecutionContext should be used. So, in this particular example, blocking IO is all over the place. But the key is that the waiting happens concurrently, and the programming model is simple to understand.