0 votes

Let's say I am running multiple Python processes (not threads) on a multi-core CPU (say 4 cores). The GIL is per-process, so the GIL within one process won't affect other processes.

My question is: will the GIL within one process take hold of only a single core out of the 4, or will it take hold of all 4 cores?

If one process locks all cores at once, then multiprocessing should be no better than multithreading in Python. If not, how do the cores get allocated to the various processes?

As an observation, on my system, which has 8 logical cores (4 × 2 because of hyperthreading), when I run a single CPU-bound process, the CPU usage of 4 out of the 8 cores goes up.

Simplifying this:

4 Python threads (in one process) running on a 4-core CPU will take more time than a single thread doing the same work (assuming the work is fully CPU-bound). Will 4 separate processes doing that same amount of work reduce the time taken by a factor of nearly 4?
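To make the comparison concrete, here is a rough sketch of the kind of test I have in mind (the busy_work function and the iteration counts are arbitrary stand-ins, not my real workload):

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def busy_work(n):
        # pure-Python CPU-bound loop; holds the GIL the whole time
        total = 0
        for i in range(n):
            total += i * i
        return total

    TASKS = [5_000_000] * 4

    def timed(label, fn):
        start = time.perf_counter()
        fn()
        print(f"{label}: {time.perf_counter() - start:.2f}s")

    if __name__ == "__main__":
        timed("serial (1 thread)", lambda: [busy_work(n) for n in TASKS])
        with ThreadPoolExecutor(max_workers=4) as tp:
            timed("4 threads", lambda: list(tp.map(busy_work, TASKS)))
        with ProcessPoolExecutor(max_workers=4) as pp:
            timed("4 processes", lambda: list(pp.map(busy_work, TASKS)))

My expectation is that the process version lands somewhere near a quarter of the serial time while the thread version is no faster than serial, but that expectation is exactly what I'm asking about.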


2 Answers

5 votes

Python doesn't do anything to bind processes or threads to cores; it just leaves things up to the OS. When you spawn a bunch of independent processes (or threads, but that's harder to do in Python), the OS's scheduler will quickly and efficiently get them spread out across your cores without you, or Python, needing to do anything (barring really bad pathological cases).
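As a minimal illustration (this sketch assumes Linux and the third-party psutil package, and the worker count and loop size are made up), you can spawn workers with no pinning code at all and have each one report which core the OS happens to have it on:

    import os
    import multiprocessing as mp
    import psutil

    def worker(i):
        x = 0
        for _ in range(20_000_000):    # pure-Python busy work
            x += 1
        # cpu_num() is available on Linux and some BSDs, not macOS/Windows
        print(f"worker {i} (pid {os.getpid()}) last ran on core "
              f"{psutil.Process().cpu_num()}")

    if __name__ == "__main__":
        procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

Each run can report a different spread of cores; that's just the OS scheduler doing its job.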


The GIL isn't relevant here. I'll get to that later, but first let's explain what is relevant.

You don't have 8 cores. You have 4 cores, each of which is hyperthreaded.

Modern cores have a whole lot of superscalar capacity. Often, the instructions queued up in a pipeline aren't independent enough to take full advantage of that capacity. What hyperthreading does is allow the core to fetch other instructions off a second pipeline when this happens, and those instructions are virtually guaranteed to be independent. But it only allows that, it doesn't require it, because in some cases (which the CPU can usually judge better than you) the cost in cache locality would be worse than the gain in parallelism.

So, depending on the actual load you're running, with four hyperthreaded cores, you may get full 800% CPU usage, or you may only get 400%, or (pretty often) somewhere in between.

I'm assuming your system is configured to report 8 cores rather than 4 to userland, because that's the default, and that you have at least 8 processes, or a pool with the default process count and at least 8 tasks. Obviously, if those assumptions don't hold, you can't possibly get 800% CPU usage.
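You can check what count your Python actually sees; a quick sketch (nothing here is specific to your setup):

    import os
    import multiprocessing

    print(os.cpu_count())               # logical cores as reported by the OS, e.g. 8
    print(multiprocessing.cpu_count())  # same number

    # Pool() with no argument starts os.cpu_count() worker processes, so you
    # need at least that many queued tasks to have a chance at ~800% CPU.
    with multiprocessing.Pool() as pool:
        print(pool.map(abs, range(-8, 8)))  # trivial tasks, just to exercise the pool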

I'm also assuming you aren't using explicit locks, other synchronization, Manager objects, or anything else that will serialize your code. If you do, obviously you can't get full parallelism.
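For example, something like this (a deliberately bad sketch I made up, not taken from your code) starts four processes but only ever runs one at a time, because the lock is held around the whole unit of work:

    import multiprocessing as mp

    def worker(lock):
        with lock:                       # only one process inside at a time
            x = 0
            for _ in range(20_000_000):  # CPU-bound work, fully serialized
                x += 1

    if __name__ == "__main__":
        lock = mp.Lock()
        procs = [mp.Process(target=worker, args=(lock,)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()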

And I'm also assuming you aren't using (mutable) shared memory, like a multiprocessing.Array that everyone writes to. This can cause cache and page conflicts that can be almost as bad as explicit locks.
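A sketch of the pattern I mean (the array size and loop count are invented): four processes each write to their own slot of one tightly packed shared array, so all of their writes land on the same cache line and the cores keep invalidating each other's caches.

    import multiprocessing as mp

    def worker(shared, offset):
        for _ in range(5_000_000):
            shared[offset] += 1          # adjacent indices share a cache line

    if __name__ == "__main__":
        shared = mp.Array('i', 4, lock=False)   # 4 ints, tightly packed
        procs = [mp.Process(target=worker, args=(shared, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(list(shared))

Each worker stays on its own index, so there's no logical race, but the contention on that one cache line is the "almost as bad as explicit locks" part.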


So, what's the deal with the GIL? Well, if you were running multiple threads within a process, and they were all CPU-bound, and they were all spending most of that time running Python code (as opposed to, say, spending most of that time running numpy operations that release the GIL), only one thread would run at a time. You could see:

  • 100% consistently on a single core, while the rest sit at 0%.
  • 100% pingponging between two or more cores, while the rest sit at 0%.
  • 100% pingponging between two or more cores, while the rest sit at 0%, but with some noticeable overlap where two cores at once are way over 0%. This last one might look like parallelism, but it isn't—that's just the switching overhead becoming visible.
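If you want to see that for yourself, a throwaway script like this (the thread count and loop body are arbitrary) keeps four pure-Python threads busy so you can watch per-core usage in top or htop (press "1" in top to show individual cores); total usage should hover around 100% of one core, possibly ping-ponging between cores:

    import threading

    def spin():
        x = 0
        while True:            # runs until you kill the script
            x = (x + 1) % 1000003

    if __name__ == "__main__":
        for _ in range(4):
            threading.Thread(target=spin, daemon=True).start()
        threading.Event().wait()   # keep the main thread alive; Ctrl+C to quit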

But you're not running multiple threads, you're running separate processes, each of which has its own entirely independent GIL. And that's why you're seeing four cores at 100% rather than just one.

0 votes

Process-to-CPU/core allocation is handled by the operating system.
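On Linux you can inspect (and, if you really want, override) that allocation from Python; a small sketch using the standard library's scheduler-affinity calls, which aren't available on every platform:

    import os

    print(os.sched_getaffinity(0))      # cores this process may run on, e.g. {0, 1, 2, 3, 4, 5, 6, 7}

    os.sched_setaffinity(0, {0, 1})     # restrict this process to cores 0 and 1
    print(os.sched_getaffinity(0))      # now {0, 1}

Most of the time you should leave the affinity alone and let the scheduler balance processes across cores itself.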