I'm trying to use the Intel FMA intrinsics, such as _mm_fmadd_ps (__m128 a, __m128 b, __m128 c), to get better performance in my code.
So, first of all, I wrote a little test program to see what they can do and how I could use them.
#include <stdio.h>
#include <stdlib.h>
#include "xmmintrin.h"
int main()
{
    __m128 v1,v2,v3,vr;
    v1 = _mm_set_ps (5.0, 5.0, 5.0, 5.0);
    v2 = _mm_set_ps (2.0, 2.0, 2.0, 2.0);
    v3 = _mm_set_ps (3.0, 3.0, 3.0, 3.0);
    vr = _mm_fmadd_ps (v1, v2, v3);
}
and I got this error:
error: incompatible types when assigning to type ‘__m128’ from type ‘int’
 vr = _mm_fmadd_ps (v1, v2, v3);
I thought the processor might not support such instructions, so I looked up my processor model (Intel® Core™ i7-4700MQ Processor) and found that it lists only SSE4.1/4.2 and AVX 2.0 as instruction set extensions, which was a little weird to me. So I looked in the /proc/cpuinfo file, and in the flags section I found the fma flag. This is the confusing part about the hardware.
As for the software, I used these makefile options after some digging on the internet, and I hope they're not the issue.
CC=gcc
CFLAGS=-g -c -Wall -O2 -mavx2 -mfma
And I'm using Eclipse on Ubuntu 12.04 LTS with GCC version 4.9.4. Thank you.
#include <immintrin.h>. – Paul R
With optimization enabled (-O2), the compiler elides all this code and simply emits code to return 0 from main (demo). So it'll run real fast. :-) – Cody Gray
Declaring vr volatile fixes this though, if you just want to see the generated code. – Paul R