4
votes

I am trying to understand more about parallelism, but I've noticed there are a lot of different terms out there and some seem to mean the same thing while others have a notable difference. So, what are all the different types of parallelism, how do they differ from each other, and do any have specific applications or purposes?
(To keep this more focused, I'm hoping for an answer that provides clarity to all the terminology associated with parallelism, including terms not listed below; technical comparisons between each different type would be nice, but will probably result in this question becoming off-topic - then again, I don't really know, hence the question).

Note:
this is not a question about concurrency and goes beyond the "simple" question: "what is parallelism?", although a clarifying definition might be warranted.

First, I have taken notice of the difference between parallelism and threading, but some of the differences between the following terms are still confusing.

To add clarity to my question, here is a list of terms I have found that are related to parallelism: parallel computing, parallel processing, multithreading, multiprocessing, multicore programming, Hyper-Threading (Intel) [2], Simultaneous Multithreading (SMT) [3], and Switch-on-Event Multithreading [3]. (If possible, definitions or references to definitions for each of these terms would also be appreciated.)

My very specific question: what is the difference between thread-level parallelism, instruction-level parallelism, and process-level parallelism? (and any other x-level parallelism)?

In a multi-core processor, can parallelism occur within a single core? Is that what Hyper-threading is, and does that require a single core having, for example, two ALUs that can be used in parallel?

Last one: is there a difference between hardware vs software parallelism, aside from the obvious distinction that one happens in hardware while the other in software?

Related resources:
- Process vs Thread,
- Parallelism on a GPU,
- Hyper-threading,
- Concurrency vs Parallelism,
- Hyper-threading and gaming.

2
In general terms, they all mean the same things, but specific authors have different meanings for different terms, and different authors have different meanings for the same terms. All of this means that you need to pay attention to context. - Chris Dodd
You missed SIMD. - Mark Setchell

2 Answers

2
votes

Q:
What is the difference between
thread-level parallelism,
instruction-level parallelism,
and process-level parallelism?

The subject matter is indeed immensely wide, so I will offer a deliberately simplified view, at the risk of drawing objections for oversimplifying ( the StackOverflow format is no substitute for a complete reference, after all ):


A:
The main difference is WHAT / WHO / HOW
is responsible for getting things executed in true-[PARALLEL]

  • Instruction Level Parallelism - ILP - is the simplest case: the CPU architecture has designed in and "hardwired" this particular form of hardware-based parallelism. A processor may offer ILP4 ( 4 instructions executed at once ), or a width that varies per instruction, say ILP2 for some instructions but ILP1 for others; either way, the silicon architecture decides what can actually happen in parallel at the instruction level. Some awkward surprises may arise in the details: memory-controller channels may stall ILP when REG/MEMORY uops have to wait for a free channel to reach the addressed memory.

  • Hardware threads are the next level of granularity. If a CPU core is declared to support two hardware threads, those are the only two streams of code execution that can flow in parallel on that core ( and only as long as the O/S does not schedule some other thread onto one of the available hardware threads ). From the user's perspective, there are O/S tools that permit one to explicitly "nail down" the affinity of a process-level or thread-level ID onto particular CPU core(s) and thus limit or even eliminate such "disturbance", so as to move from a "just"-[CONCURRENT] flow of code execution closer to a true-[PARALLEL] one.

We will knowingly skip all the crowds of threads that are just a tool for latency-masking ( be it on the SIMT / SMX warp-wide GPU-scheduler, or the more relaxed, MIMT O/S-kernel driven multithreading ).


- MIMT: Multiple Instruction Multiple Threads, a non-restricted thread-execution fabric / policy, where any thread may and does issue a different instruction to the processor for execution, as opposed to SIMT
- SIMT: Single Instruction Multiple Threads, typically a GPU Streaming Multiprocessor code-execution architecture
- SMX: Streaming Multiprocessor eXecution unit, typically a GPU SIMT building block onto which GPU-kernel code units can be directed ( addressed ) to be task-queue-scheduled and later executed, as coordinated by the WARP-wide SIMT-code scheduler

1
votes

what is the difference between thread-level parallelism, instruction-level parallelism, and process-level parallelism?

In the first (thread-level parallelism), different CPU cores execute different streams of instructions.

In the second (instruction-level parallelism), a single CPU core executes different instructions from a single instruction stream in parallel (these instructions are either consecutive in the stream or otherwise very close to each other).

The third (process-level parallelism) is the same as the first; the difference is cosmetic. It’s just the default settings for which memory pages are shared across threads and which aren’t. But these settings are user-adjustable with process creation flags, shared memory sections, dynamic libraries, and other system APIs; that’s why, at the lower level, the difference between processes and threads is not a big deal.

and any other x-level parallelism

Another important one is SIMD-level parallelism. Here the CPU applies the same instruction to multiple operands stored in special wide registers. With SSE we have 128-bit wide registers, and we can e.g. multiply a vector of 4 single-precision floating-point numbers in one register by another 4 values in another register, making 4 products in parallel, with a single mulps instruction. ARM NEON is similar, also with 128-bit registers; the instruction to multiply 4 floats by 4 floats is vmul.f32. AVX operates on 256-bit registers, so it can multiply 8 floats at once with a single vmulps instruction.

can parallelism occur within a single core?

Yes.

Is that what Hyper-threading is

Yes. Instruction-level parallelism and SIMD parallelism also happen within a single core.

does that require a single core having, for example, two ALUs that can be used in parallel?

Modern CPUs have more than two ALUs per core, but HT was introduced back in the Pentium 4, and having multiple ALUs is not a requirement for it. The benefit of HT is not just keeping multiple ALUs busy; it’s also using the core while a thread is waiting for data to arrive from caches or from system RAM, or while it is stalled on a data dependency between nearby instructions. HT allows a CPU core to compute something else on another hardware thread while it’s waiting, therefore improving ALU utilization. Without HT, the core would likely just sit and wait for hundreds of cycles in the case of RAM latency, or for dozens of cycles in the case of data-dependency latency.

is there a difference between hardware vs software parallelism

When you have a single hardware thread and multiple OS threads that compute stuff, only one thread will be running at any given time. The rest of the threads will be waiting. The OS will periodically (often at ~50-100 Hz) switch which one is running, with the goal of giving all threads a fair slice of CPU time. You can call that software parallelism if you want, but I wouldn’t call such a thing parallel at all.