5
votes

The peak GFLOPS of the cores for the desktop i7-4770K @ 4GHz is 4GHz * 8 (AVX) * 4 (FMA) * 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris Pro 5100/5200) has a peak of over 800 GFLOPS. Some algorithms will therefore run even faster on the IGP. Combining the cores with the IGP would be even better. Additionally, the IGP keeps eating up more silicon. The Iris Pro 5100 takes up over 30% of the silicon now. It seems clear which direction Intel desktop processors are headed.
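The CPU figure can be checked with a short calculation. This is only a sketch of the arithmetic: the function name `peak_gflops` and the constant names are made up for illustration, and the factor of 4 for FMA is split into two FMA units per core times two FLOPs per FMA, as explained in the comments below.

```python
def peak_gflops(clock_ghz, simd_lanes, flops_per_lane_per_cycle, cores):
    """Theoretical peak: clock * SIMD width * FLOPs per lane per cycle * cores."""
    return clock_ghz * simd_lanes * flops_per_lane_per_cycle * cores

# Assumed breakdown of the "4 (FMA)" factor on Haswell:
FMA_UNITS_PER_CORE = 2        # two FMA instructions can issue per cycle
FLOPS_PER_FMA = 2             # one multiply plus one add
AVX_FLOAT_LANES = 256 // 32   # 8 single-precision floats per 256-bit register

cpu_peak = peak_gflops(
    clock_ghz=4.0,
    simd_lanes=AVX_FLOAT_LANES,
    flops_per_lane_per_cycle=FMA_UNITS_PER_CORE * FLOPS_PER_FMA,
    cores=4,
)
print(cpu_peak)  # 512.0
```

This is a theoretical upper bound; real code rarely sustains two FMAs per cycle on every core.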

As far as I have seen, however, the Intel IGP is mostly ignored by programmers, with the exception of OpenCL/OpenGL. I'm curious to know how one can program the Intel HD Graphics hardware for compute (e.g. SGEMM) without OpenCL.

Added comment: There is no Intel support for HD Graphics and OpenCL on Linux. I found Beignet, which is an open-source attempt to add support on Linux, at least for Ivy Bridge HD Graphics. I have not tried it. Presumably the people developing Beignet know how to program the HD Graphics hardware without OpenCL.

3
Note: it's GFLOPS, not GFLOPs/s. Also, why are you multiplying 8 (AVX) * (4 FMA)? - Paul R
I changed it to GFLOPS. FMA does a multiplication and an addition simultaneously, which gives one factor of 2. Haswell can do two FMA instructions simultaneously, which gives another factor of 2. Each FMA is an AVX instruction operating on 8 single-precision floats, which gives another factor of 8. - Z boson
GLSL programming? DirectCompute? PTX? - huseyin tugrul buyukisik

3 Answers

4
votes

Keep in mind that there is a performance hit to copy the data to the video card and back, so this must be taken into account. AMD is close to releasing APU chips that have unified memory for the CPU and GPU on the same die, which will go a long way towards alleviating this problem.
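A rough way to reason about that copy cost is a break-even check: offloading only helps if transfer time plus GPU compute time is less than the CPU time. The sketch below is a toy model; the function names and all numbers (bandwidth, timings, data size) are hypothetical placeholders, not measurements of any real system.

```python
def offload_seconds(gpu_seconds, bytes_moved, bandwidth_bytes_per_s):
    """Total offload time: copy the data there and back, plus GPU compute."""
    transfer_seconds = bytes_moved / bandwidth_bytes_per_s
    return transfer_seconds + gpu_seconds

def offload_pays_off(cpu_seconds, gpu_seconds, bytes_moved, bandwidth_bytes_per_s):
    """True if the offloaded run, including copies, beats the CPU-only run."""
    total = offload_seconds(gpu_seconds, bytes_moved, bandwidth_bytes_per_s)
    return total < cpu_seconds

# Hypothetical numbers: 800 MB moved in total over an 8 GB/s link.
BYTES_MOVED = 800e6
BANDWIDTH = 8e9

# Large job: the copy is small compared to the compute saved.
big_job = offload_pays_off(1.0, 0.2, BYTES_MOVED, BANDWIDTH)

# Small job: the same copy wipes out the GPU's advantage.
small_job = offload_pays_off(0.25, 0.2, BYTES_MOVED, BANDWIDTH)

print(big_job, small_job)
```

With unified memory the transfer term shrinks toward zero, which is why shared-memory designs change the trade-off.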

Before CUDA and OpenCL, the way to use the GPU for compute was to represent the memory to be operated on as a texture, using DirectX or OpenGL. Thank goodness we don't have to do that anymore!

AMD is really pushing the APU / OpenCL model, so more programs should take advantage of the GPU via OpenCL, provided the performance trade-off is worth it. Currently, GPU computing is a bit of a niche market, relegated to high-performance computing or number crunching that just isn't needed for web browsing and word processing.

4
votes

It doesn't make sense any more for vendors to let you program using low-level ISA.

  1. It's very hard and most programmers won't use it.
  2. It keeps them from adjusting the ISA in future revisions.

So programmers use a language (like C99 in OpenCL) and the runtime does ISA-specific optimizations right on the user's machine.

An example of what this enables: AMD switched from VLIW vector machines to scalar machines and existing kernels still ran (most ran faster). You couldn't do this if you wrote ISA directly.

1
votes

Programming a coprocessor like Iris without OpenCL is rather like driving a car without the steering wheel.

OpenCL is designed to expose the parallelism that Iris needs to achieve its theoretical performance. You can't just spawn hundreds of threads or processes on it and expect performance. Having blocks of threads doing the same thing, at the same time, on similar memory addresses, is the whole crux of the matter.
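To see why "similar memory addresses" matters, here is a toy model that counts how many distinct memory lines one block of threads touches in a single step. It is only an illustration: `memory_transactions`, the 64-byte line size, and the group size of 16 are assumptions chosen for the example, not Iris hardware parameters.

```python
def memory_transactions(addresses, line_bytes=64):
    """Count the distinct memory lines touched by one group of work-items."""
    return len({addr // line_bytes for addr in addresses})

GROUP_SIZE = 16   # hypothetical number of work-items running in lockstep
ELEM_BYTES = 4    # one single-precision float

# Neighbouring work-items read neighbouring floats: bytes 0, 4, ..., 60.
contiguous = [i * ELEM_BYTES for i in range(GROUP_SIZE)]

# Each work-item reads 16 floats (64 bytes) away from its neighbour.
strided = [i * ELEM_BYTES * 16 for i in range(GROUP_SIZE)]

print(memory_transactions(contiguous))  # 1
print(memory_transactions(strided))     # 16
```

In this model the contiguous pattern is served by a single line fetch while the strided one needs sixteen, which is the kind of access pattern the OpenCL work-group model encourages you to get right.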

Maybe you can think of a better paradigm than OpenCL for achieving that goal, but until you do, I suggest you try learning some OpenCL. If you are into Python, PyOpenCL is a great place to start.