1
votes

I have been doing some experimentation with multi-threading and multi-processing, and I have a few questions.

1) Consider a machine with just one CPU core/processor. Running multiple threads in parallel is achieved by time slicing (each thread is given a few milliseconds to run on the CPU), and if one thread is blocked on I/O, the others can run, giving the impression of doing multiple things concurrently.

Consider the same single-processor CPU, and say there are two threads, both of which are CPU-intensive and do no I/O. Each takes 5 seconds to complete its operation. Running them sequentially will obviously take 10 seconds.

Will running them in parallel (using an executor service) also take 10 seconds, since the threads are CPU-intensive only? (Given that there is only one CPU core.)
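The setup in question 1 can be sketched like this; `burn` is a hypothetical CPU-bound task with no I/O, standing in for the 5-second operation, and the iteration count is illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundTiming {
    // Hypothetical CPU-bound task: sums 0..n-1, no I/O at all.
    static long burn(long n) {
        long result = 0;
        for (long i = 0; i < n; i++) result += i;
        return result;
    }

    public static void main(String[] args) throws Exception {
        long n = 500_000_000L;

        // Sequential baseline: two tasks back to back.
        long t0 = System.nanoTime();
        burn(n);
        burn(n);
        long sequentialMs = (System.nanoTime() - t0) / 1_000_000;

        // The same two tasks submitted to a two-thread executor service.
        // On a single core they can only be time-sliced, so the wall time
        // should be about the same as the sequential run.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long t1 = System.nanoTime();
        Future<Long> a = pool.submit(() -> burn(n));
        Future<Long> b = pool.submit(() -> burn(n));
        a.get();
        b.get();
        long parallelMs = (System.nanoTime() - t1) / 1_000_000;
        pool.shutdown();

        System.out.println("sequential: " + sequentialMs
                + " ms, parallel: " + parallelMs + " ms");
    }
}
```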

2) On a machine with four cores, I have a task that takes 5 seconds to complete when run independently. I wanted to run a couple of them in parallel as separate processes using Java's ProcessBuilder (I don't want to run them as parallel threads, because the object they share is not thread-safe and the cost of synchronization is too high). I expected the whole thing to complete in around 5 seconds.

Starting two parallel processes did not yield much improvement: they still took around 9 seconds to complete. Why is that? (The task is both CPU-intensive and consumes significant memory.) Note: as I start three or four processes, the latency degrades further (worse than or equivalent to running them in sequence).
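Launching the copies with ProcessBuilder can be sketched like this; `Worker` is a hypothetical main class standing in for the 5-second task, and the path construction assumes a Unix-style `java.home` layout:

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelLauncher {
    // Builds the command line for one worker JVM. "Worker" is a
    // hypothetical main class; on Windows the binary would be java.exe.
    static List<String> workerCommand(String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + "/bin/java");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        int copies = 2;
        List<Process> procs = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < copies; i++) {
            procs.add(new ProcessBuilder(workerCommand("Worker"))
                    .inheritIO()   // forward each child's stdout/stderr
                    .start());
        }
        for (Process p : procs) p.waitFor(); // wait for all workers
        System.out.println("wall time: "
                + (System.nanoTime() - t0) / 1_000_000 + " ms");
    }
}
```

Note that the wall time measured this way includes JVM startup for each child, not just the computation.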

EDIT

My question is more about why the performance is not improved when using processes, not about whether I should use multiple threads or processes for my second question. For question 2, for the sake of discussion, you can simply imagine the task like this (although the numbers mentioned in the question above are not from the snippet below):

String s = "";
for (int i = 0; i < 50000; i++) {
    s += i; // builds a new String each iteration -- consumes a lot of memory
}

long result = 0;
for (int j = 0; j < 3; j++) {
    for (long i = 0; i < 500000000L; i++) { // lots of CPU
        result += i;
    }
}
3
tell us more about "the object that they share" and precisely how they share it. - Mike Nakis
@MikeNakis It is a static object holding a bunch of state, so I want to create separate processes and let each work in its own context/space. - user7
@MikeNakis Please see the updated question with code snippets. - user7
It would be very helpful to see the full code, including how you were calculating the execution time. - tsolakp

3 Answers

1
votes
Answer to your 1st question:

A typical process/thread burst pattern looks like this:

I/O ==> CPU ==> I/O ==> CPU ==> ... ==> I/O ==> CPU ==> I/O

A job generally starts and ends with an I/O burst, and multiprocessing/threading on a single processor is achieved only in a time-multiplexed manner, as you said.

Coming to your question: both of your jobs/threads are CPU-bound and contain no internal I/O (see the burst pattern above). Therefore, running them in parallel should take about the same time as running them sequentially. If you run many threads this way, none of which has any internal I/O-bound operation, the total time may even exceed that of sequential execution, because so much time is then spent on context switching.

It is because of scenarios like this that long-term schedulers are designed to put a proper mix of I/O-bound and CPU-bound jobs into the ready queue.
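The context-switching cost can be probed with a sketch like this; the task sizes and pool sizes are illustrative, and on real hardware the oversubscribed pool typically runs no faster (and often slightly slower) than the pool sized to the core count:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BurstDemo {
    // CPU-bound job with no internal I/O.
    static long spin(long n) {
        long r = 0;
        for (long i = 0; i < n; i++) r += i;
        return r;
    }

    // Wall time (ms) for `tasks` copies of spin(n) on `threads` threads.
    static long timeWith(int threads, int tasks, long n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> futures = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) futures.add(pool.submit(() -> spin(n)));
        for (Future<Long> f : futures) f.get();
        pool.shutdown();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        long n = 50_000_000L;
        // A pool sized to the core count vs. a heavily oversubscribed pool:
        // the extra threads only add context switches, not CPU capacity.
        System.out.println("pool=" + cores + ": " + timeWith(cores, 16, n) + " ms");
        System.out.println("pool=64: " + timeWith(64, 16, n) + " ms");
    }
}
```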

1
votes

Will running them in parallel (using an executor service) also take 10 seconds, since the threads are CPU-intensive only? (Given that there is only one CPU core.)

In principle, no. By running them in parallel, you are bound to waste some time on context switching, so theoretically they will take slightly more than 10 seconds. In practice, the difference will hardly be detectable, because two threads (or processes) may not create enough CPU contention for the switching losses to be noticeable.

Starting two parallel processes did not yield much improvement: they still took around 9 seconds to complete. Why is that? (The task is both CPU-intensive and consumes significant memory.) Note: as I start three or four processes, the latency degrades further (worse than or equivalent to running them in sequence).

Your experiment is contaminated by the overhead of starting two or more processes instead of just one, so it is practically useless to try to draw any conclusions from it (unless you take special care to measure exactly the calculation portion of the task).
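One way to measure only the calculation portion is to have each child process time its own computation, so JVM startup is excluded. A minimal sketch, where the class name and iteration count are illustrative:

```java
public class Worker {
    // The computation under measurement, isolated in a method so it can
    // be timed separately from JVM startup.
    static long compute(long n) {
        long result = 0;
        for (long i = 0; i < n; i++) result += i;
        return result;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long result = compute(500_000_000L);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        // The child reports its own elapsed time, so the parent can
        // separate computation cost from process-creation overhead.
        System.out.println("computed " + result + " in " + elapsedMs + " ms");
    }
}
```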

0
votes

How many operations are you running? The performance might not improve much because of the overhead of communication. Everything has a cost: if the time required to coordinate the parallel tasks dominates the time spent actually solving the problem, it is no longer favorable to parallelize your code.
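The point about coordination cost can be seen with a deliberately tiny workload; this sketch (array size is illustrative) compares a plain loop against a parallel stream, where for such small per-element work the parallel version's coordination often dwarfs the work itself:

```java
import java.util.stream.IntStream;

public class OverheadDemo {
    // Sums the array using a parallel stream; fork/join coordination
    // is pure overhead when each element's work is trivial.
    static long sum(int[] data) {
        return IntStream.of(data).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = IntStream.range(0, 1_000).toArray();

        long t0 = System.nanoTime();
        long seq = 0;
        for (int v : data) seq += v; // trivial per-element work
        long seqNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        long par = sum(data);
        long parNs = System.nanoTime() - t1;

        System.out.println("sequential: " + seqNs + " ns (sum=" + seq
                + "), parallel: " + parNs + " ns (sum=" + par + ")");
    }
}
```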