I am trying to determine the time needed to access two memory addresses that are separated by a certain delta. My code has to be a mix of x86 assembly and C, and it will run "bare metal" (without any OS; edit: I am actually modifying memtest) in order to get the most precise result.
I am more used to ARM assembly than to x86, so I may have made some mistakes (and I am wondering why mov does so many different things in x86). My code so far is as follows:
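```c
inline unsigned timeread(ulong addr, ulong delta, int iter)
{
    ulong daddr;
    int i;
    ulong st_low, st_high;
    ulong end_low, end_high;

    daddr = addr + delta;

    /* rdtsc: time-stamp counter -> EDX (high) : EAX (low) */
    asm __volatile__ ("rdtsc" : "=a" (st_low), "=d" (st_high));

    for (i = 0; i < iter; ++i)
    {
        /* load the 32-bit word at addr into EAX (value discarded);
           %0 is EDI, selected by the "D" constraint */
        asm __volatile__ (
            "movl (%0), %%eax\n\t"
            :
            : "D" (addr)
            : "eax"
        );
        /* load the 32-bit word at addr + delta into EAX */
        asm __volatile__ (
            "movl (%0), %%eax\n\t"
            :
            : "D" (daddr)
            : "eax"
        );
    }

    asm __volatile__ ("rdtsc" : "=a" (end_low), "=d" (end_high));

    /* 64-bit subtraction: (end_high:end_low) -= (st_high:st_low),
       subl on the low halves, sbbl propagates the borrow */
    asm __volatile__ (
        "subl %2, %0\n\t"
        "sbbl %3, %1"
        : "=a" (end_low), "=d" (end_high)
        : "g" (st_low), "g" (st_high),
          "0" (end_low), "1" (end_high)
    );

    /* only the low 32 bits of the elapsed cycle count are returned */
    return end_low;
}
```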
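The subl/sbbl pair performs a 64-bit subtraction on two 32-bit halves. As a sketch (not memtest code; `read_tsc` and `tsc_elapsed` are hypothetical names), the same result can be obtained by combining EDX:EAX into a `uint64_t` in C and subtracting there. With `-m32`, GCC typically emits the sub/sbb pair for a 64-bit subtraction itself, and the full 64-bit difference is kept instead of only `end_low`. `<stdint.h>` is one of the freestanding headers, so it should be usable without an OS.

```c
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "rdtsc is an x86 instruction"
#endif

/* Read the 64-bit time-stamp counter. rdtsc puts the low half in EAX
   and the high half in EDX; "=a"/"=d" bind those registers to C variables. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Elapsed cycles between two readings (unsigned 64-bit subtraction;
   the borrow from the low half into the high half is handled by C). */
static inline uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

In `timeread`, `start = read_tsc()` before the loop and `end = read_tsc()` after it would replace the two rdtsc statements and the subtraction asm.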
I am using gcc and compiling with the flags -march=i486 -m32.
EDIT: Before calling the function, I call set_cache(0), a function provided by memtest, in order to deactivate the cache (at least, that is what it claims to do). Calling set_cache(1) instead reduces the execution time drastically (it drops from ~2000 cycles to <10 cycles). If some caching is still active, I suppose memtest has not found a solution to this problem either.
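With intervals as short as the cached case, the ordering of rdtsc relative to the loads matters: rdtsc is not a serializing instruction, so the CPU may read the counter before earlier loads have completed. A commonly used remedy is to execute cpuid, which is serializing, immediately before rdtsc. The sketch below assumes a CPU that supports cpuid (later 486 models and all Pentium-class CPUs; rdtsc itself already requires a Pentium-class CPU despite `-march=i486`), and `read_tsc_serialized` is a hypothetical name, not part of memtest.

```c
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "cpuid and rdtsc are x86 instructions"
#endif

/* Serialized TSC read (sketch). cpuid forces all earlier instructions,
   including outstanding loads, to complete before it finishes, so the
   following rdtsc cannot run ahead of them. The "memory" clobber also
   stops the compiler from moving memory accesses across this statement. */
static inline uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t" /* select cpuid leaf 0 */
        "cpuid\n\t"             /* serialize; overwrites EAX, EBX, ECX, EDX */
        "rdtsc"                 /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

cpuid is comparatively slow, so its own cost ends up in the measured interval; subtracting the result of two back-to-back calls gives a rough estimate of that overhead.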
EDIT: Asking the question usually helps in getting an answer ...
Is the assembly code correct? I am quite surprised that mov is able to access RAM in x86, since in ARM you would use the specialised LDR instruction for this.
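On x86, `mov` with a memory source operand is the load and `mov` with a memory destination is the store, so one mnemonic covers both LDR and STR. x86 is not a load/store architecture, and most ALU instructions can take a memory operand as well. The `movl (%0), %%eax` in the question is register-indirect addressing, comparable to `LDR r0, [r1]`. The sketch below uses hypothetical helpers `load32`/`store32` and lets GCC pick the addressing mode through the `"m"` constraint instead of pinning the pointer to EDI; on non-x86 targets it falls back to plain C accesses.

```c
#include <stdint.h>

/* Load: mov with a memory source, the x86 counterpart of ARM LDR. */
static inline uint32_t load32(const uint32_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t value;
    __asm__ __volatile__("movl %1, %0"   /* AT&T order: source, destination */
                         : "=r"(value)
                         : "m"(*p));
    return value;
#else
    return *p;  /* non-x86 fallback: ordinary C load */
#endif
}

/* Store: mov with a memory destination, the x86 counterpart of ARM STR. */
static inline void store32(uint32_t *p, uint32_t value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("movl %1, %0"
                         : "=m"(*p)
                         : "r"(value));
#else
    *p = value;  /* non-x86 fallback: ordinary C store */
#endif
}
```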