Python doesn't do anything to bind processes or threads to cores; it just leaves things up to the OS. When you spawn a bunch of independent processes (or threads, though getting CPU parallelism out of threads is harder in Python), the OS's scheduler will quickly and efficiently spread them out across your cores without you, or Python, needing to do anything (barring really pathological cases).
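To make that concrete, here's a minimal sketch of what "a bunch of independent processes" might look like. The `burn` function is just a hypothetical CPU-bound workload for illustration; the point is that there's no affinity code anywhere—`Pool` spawns the workers and the OS places them on cores by itself.

```python
import multiprocessing as mp

def burn(n):
    # Hypothetical CPU-bound busy loop; any pure-Python number crunching
    # would behave the same way.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # One task per worker; no core pinning needed — the OS scheduler
    # spreads the worker processes across cores on its own.
    with mp.Pool() as pool:
        results = pool.map(burn, [200_000] * 8)
    print(len(results))
```

Watch `top` or Task Manager while this runs and you should see the load distributed across cores with no extra effort on your part.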
The GIL isn't relevant here. I'll get to that later, but first let's explain what is relevant.
You don't have 8 cores. You have 4 cores, each of which is hyperthreaded.
Modern cores have a whole lot of "superscalar" capacity. Often, the instructions queued up in a pipeline aren't independent enough to take full advantage of that capacity. What hyperthreading does is allow the core to go fetch other instructions off a second pipeline when this happens—instructions that are virtually guaranteed to be independent of the first pipeline's. But it only allows that, it doesn't require it, because in some cases (which the CPU can usually judge better than you) the cost in cache locality would be worse than the gains in parallelism.
So, depending on the actual load you're running, with four hyperthreaded cores, you may get full 800% CPU usage, or you may only get 400%, or (pretty often) somewhere in between.
I'm assuming your system is configured to report 8 cores rather than 4 to userland (that's the default), and that you have at least 8 processes, or a pool with the default process count and at least 8 tasks—obviously, if none of that is true, you can't possibly get 800% CPU usage…
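You can check what your system is reporting directly. `os.cpu_count()` returns the number of *logical* cores the OS exposes (8 on a 4-core hyperthreaded machine), and that's also what a `Pool` created with no argument uses as its worker count:

```python
import os
import multiprocessing as mp

# Logical core count as reported to userland — on a 4-core machine with
# hyperthreading enabled, this is 8.
logical = os.cpu_count()
print(logical)

# Pool() with no processes argument creates os.cpu_count() workers,
# so you need at least that many queued tasks for a shot at full usage.
print(mp.cpu_count() == logical)
```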
I'm also assuming you aren't using explicit locks, other synchronization, Manager objects, or anything else that will serialize your code. If you are, obviously you can't get full parallelism.
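Here's a sketch of what that kind of accidental serialization looks like—the function names are made up for illustration. If every worker holds one shared lock for the whole duration of its work, only one runs at a time, no matter how many cores you have:

```python
import multiprocessing as mp

lock = mp.Lock()

def serialized_work(n):
    # Holding a shared lock around the whole computation means workers
    # take turns: total CPU usage hovers near 100%, not 800%.
    with lock:
        return sum(i * i for i in range(n))

def parallel_work(n):
    # No shared state, no lock: workers can run fully in parallel.
    return sum(i * i for i in range(n))
```

Both return the same answer; only the second one scales.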
And I'm also assuming you aren't using (mutable) shared memory, like a multiprocessing.Array that everyone writes to. That can cause cache and page conflicts that can be almost as bad as explicit locks.
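A rough sketch of the contrast (function names are hypothetical): repeatedly writing into one shared `Array` means every worker's writes hit the same memory pages, so cores invalidate each other's cache lines even with `lock=False`; computing privately and returning a value avoids all of that.

```python
import multiprocessing as mp

def write_shared(arr, idx, n):
    # Every write lands in the shared Array's pages, so workers writing
    # to neighboring slots fight over the same cache lines.
    for i in range(n):
        arr[idx] = i

def compute_local(n):
    # Better: compute in private memory and return the result; Pool.map
    # collects return values with no shared writable memory at all.
    return n * (n - 1) // 2

shared = mp.Array('i', 4, lock=False)
write_shared(shared, 0, 5)
print(shared[0])
```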
So, what's the deal with the GIL? Well, if you were running multiple threads within a process, and they were all CPU-bound, and they were all spending most of that time running Python code (as opposed to, say, spending most of that time running numpy operations that release the GIL), only one thread would run at a time. You could see:
- 100% consistently on a single core, while the rest sit at 0%.
- 100% pingponging between two or more cores, while the rest sit at 0%.
- 100% pingponging between two or more cores, while the rest sit at 0%, but with some noticeable overlap where two cores at once are way over 0%. This last one might look like parallelism, but it isn't—that's just the switching overhead becoming visible.
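If you want to see this for yourself, here's a small experiment (the `count` function is just an illustrative CPU-bound loop). Run it while watching your CPU monitor: both threads finish correctly, but total usage stays around one core's worth, in one of the patterns above:

```python
import threading

def count(n, out, idx):
    # Pure-Python loop: it holds the GIL while running, so two such
    # threads interleave on one core's worth of CPU rather than running
    # in parallel.
    total = 0
    for _ in range(n):
        total += 1
    out[idx] = total

results = [0, 0]
threads = [threading.Thread(target=count, args=(100_000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```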
But you're not running multiple threads, you're running separate processes, each of which has its own entirely independent GIL. And that's why you're seeing four cores at 100% rather than just one.