Applications well suited for Xeon-phi many-core architecture

Question

From this https://software.intel.com/en-us/videos/purpose-of-the-mic-architecture I understand that applications with complex or numerous random memory access are not well suited for Intel Xeon-phi. This is because the architecture uses 61 cores and 8 memory controllers. In case of L1 and L2 cache misses, it takes up to 100s cycles to fetch the line into memory and get it ready for use by the CPU. Such applications are called latency-bound.

Then, the tutorial mentions that many-core architecture (Xeon-phi coprocessor only) are well suited for highly parallel homogeneous code. Two questions from there:

What is referred to as homogeneous code ?
What are real-world applications which can fully benefit from MIC architecture ?

Unknown Unknown · Accepted Answer · 2015-07-13T10:52:21

I see the Intel MIC architecture as a "x86 based GPGPU" and if you are familiar with the concept of GPGPU you will find your self familiar with the Intel MIC.

An homogeneous clustering is a system infrastructure with multiple execution unit (i.e. CPUs) all with the same features. For example a multicore system that have four Intel Xeon processors is homogeneous.

An heterogeneous clustering is a system infrastructure with multiple execution unit with different features (i.e. like CPU and GPU). For example my Levono z510 with its Intel i7 Haswell (4 CPUs), its Nvidia GT740M (GPU) and its Intel HD Graphics 4600 (GPU) is an heterogeneous system.

An example of heterogeneous code could be a Video Game.
A video game has a control code, executed by one code of one CPU, that control what the other agents do, its send shaders to execute on the GPUs, physic computation to be performed on others cores or GPUs and so on.
In this example you need to write code that run on the CPU (so it is "CPU aware") and code that run on GPU (so it is "GPU aware"). This is actually done by using different tools, different programming languages and different programming models!

homogeneous code is code that don't need to be aware of n different programming models, one for each different kind of agent. It is just the same programming model, language and tool.
Take a look a this very simple sample code for the MPI library.
The code is all written in C, it is the same program that just take a different flow.

About the applications, Well that's really a broad question...

As said above I see the Intel MIC as a GPGPU based on x86 ISA (part of it at least).

An SDK particularly useful (and listed on the video you linked) to work with clustered systems is OpenCL, it can be used for fast processing of images and computer vision and basically for anything that need the same algorithm to be run billions of times with different inputs (like cryptography applications/brute forcing).
If you search for some OpenCL based project on the web you will get an idea.

To answer you second question it is better to ask ourselves "What could not take advantage of the MIC architecture?" and we will soon find that the more an algorithm is distant from the concept of Stream Processing and the related topics, including the one of Kernel, the less it is suitable for the MIC.

Applications well suited for Xeon-phi many-core architecture

2 Answers