The following two code snippets differ only in the value loaded into the x23 register, but the minstret instruction counts (reported by a Verilator simulation of the Rocket chip) differ substantially. Is this a bug, or am I doing something wrong?

The read_csr() function is from the RISC-V Frontend Server Library (https://github.com/riscv/riscv-fesvr/blob/master/fesvr/encoding.h), and the rest of the code [syscalls.c, crt.S, test.ld] is similar to the RISC-V benchmarks (https://github.com/riscv/riscv-tests/tree/master/benchmarks/common).

I have checked that the compiled binaries contain the exact same instructions, except for the difference in the operands.

Dividing 0x0fffffff by 0xff, repeating 1024 times: 3260 instructions.

size_t instrs = 0 - read_csr(minstret);

asm volatile (
        "mv             x20,    zero;"
        "li             x21,    1024;"
        "li             x22,    0xfffffff;"
        "li             x23,    0xff;"

    "loop:"
        "div            x24,  x22,  x23;"
        "addi           x20,  x20,  1;"
        "bleu           x20,  x21,  loop;"

    ::: "x20", "x21", "x22", "x23", "x24", "cc"
);

instrs += read_csr(minstret);

Dividing 0x0fffffff by 0xffff, repeating 1024 times: 3083 instructions.

size_t instrs = 0 - read_csr(minstret);

asm volatile (
        "mv             x20,    zero;"
        "li             x21,    1024;"
        "li             x22,    0xfffffff;"
        "li             x23,    0xffff;"

    "loop:"
        "div            x24,  x22,  x23;"
        "addi           x20,  x20,  1;"
        "bleu           x20,  x21,  loop;"

    ::: "x20", "x21", "x22", "x23", "x24", "cc"
);

instrs += read_csr(minstret);

Here, 3083 instructions seems correct (1024 * 3 = 3072). Since minstret counts retired instructions, it seems strange that the first example executed 177 more instructions (3260 vs. 3083). These results are always the same no matter how many times I run these two programs.

What does the full disassembly look like, including the code that reads the instruction counter? - old_timer
Disassembly for the 0xff case: termbin.com/p713 and for the 0xffff case: termbin.com/mrj9. The function addresses differ, so vimdiff isn't very helpful, but if you look for the loop symbol, you'll be able to locate the above snippets. - radiosonde
I forgot to mention that the code that reads the instruction counter is just csrr a5,minstret. - radiosonde
Hmm, I see what you are saying. I don't remember whether they open-sourced their logic. Perhaps there is a shortcut in the divide based on the operand values, but that wouldn't make sense either. I was hoping it was something simple like alignment or a slight build difference, but that does not appear to be the case. - old_timer
Have you approached them on this? - old_timer

1 Answer

The problem was resolved at https://github.com/freechipsproject/rocket-chip/issues/1495.

The differences in the instruction count were caused by servicing the debug interrupt, which the simulation apparently uses to detect whether the benchmark has finished executing. The verbose log produced by Verilator shows the debug address range (0x800 onwards) being injected at different points during execution.