mov & jmp to & jmp back vs call & ret

Question

I was going over some Assembly code and I saw this:

    mov r12, _read_loopr
    jmp _bzero
_read_loopr:
...
_bzero:
    inc r8
    mov byte [r8+r15], 0x0
    cmp r8, 0xff
    jle _bzero
    jmp r12

I was wondering whether there is any particular advantage to doing this (mov the address of _read_loopr into a register, jmp to the function, then jmp back through the register) rather than the usual call _bzero and ret?

You can do it while you don't have a valid stack (or stackpointer). Otherwise, this kind of obfuscation will be practically undebuggable, so that's an obvious plus. — EOF
The ret would only return to the place where the call originated but the jmp r12 version can "return" to any spot the programmer wants. — Fifoernik
The jmp r12 is never used to jump to anywhere except a label immediately after the initial jmp to the subroutine. And this is in open source software so I don't think it's intended to obfuscate anything. — 0x777C
Sounds like an attempted micro-optimization. Somebody is trying to avoid the cost of accessing memory (to push/pop the return address), but the jmp r12 cannot be predicted so I suspect you're actually better off with the call/ret and taking advantage of the return address predictor. — Raymond Chen
Are there multiple "callers" for this byte-at-a-time _bzero that uses a custom ABI? I added the return-address label to your code, like you described in your comment, because that's critical information. — Peter Cordes

Peter Cordes · Accepted Answer · 2016-07-24T08:51:41

This just looks like braindead code, especially if the return-address label is always right after the jmp _bzero like you say in your comment.

Maybe the author thought that they couldn't use call "because function calls clobber registers". This is what you have to assume, based on the calling convention, if you're calling a function that isn't part of the same codebase. But you can call/ret to functions with custom calling conventions.
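As a minimal sketch of that point, here is the question's code in call/ret form, assuming NASM syntax and a valid stack (rsp pointing at writable memory). The loop body is copied unchanged and the register usage (r8 as index, r15 as base) is still a private convention between caller and callee; only the way control gets there and back differs, and r12 is no longer needed.

```nasm
    call _bzero             ; pushes the address of the next instruction
_read_loopr:                ; ret in _bzero resumes here
    ; rest of the caller, elided as in the question

_bzero:                     ; custom convention: r15 = base, r8 = index (clobbered)
    inc r8
    mov byte [r8+r15], 0x0
    cmp r8, 0xff
    jle _bzero
    ret                     ; pops the return address pushed by call
```

The _read_loopr label is kept only to show where execution resumes; call does not need it.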

Of course, for code this small, it should have been inlined (i.e. make it a macro, not a function).
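One way to do that, assuming the source is assembled with NASM (the macro name BZERO_INLINE is made up for illustration), is a multi-line macro with a macro-local label so it can be expanded more than once:

```nasm
; Hypothetical macro: expands the question's loop in place.
; Same register convention as _bzero: r15 = base, r8 = index (clobbered).
%macro BZERO_INLINE 0
%%loop:                     ; %% makes the label unique per expansion
    inc r8
    mov byte [r8+r15], 0x0
    cmp r8, 0xff
    jle %%loop
%endmacro

    BZERO_INLINE            ; falls through into the following code
_read_loopr:
```

With the loop expanded at the use site there is no jump out and no jump back, so neither r12 nor the stack is involved.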

More importantly, something more clever than storing one byte at a time is normally possible, and probably worth a potential branch mispredict if there are more than a few bytes to zero. If at least 8 (or better, 16) bytes of data always need to be zeroed, you can do it with wide stores. Make the final store write the last byte of the buffer to be zeroed, potentially overlapping with the previous store. (This is much better than ending with branches to decide whether to do a final 4B store, 2B store, and 1B store.) See the x86 tag wiki for resources about writing efficient asm.
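A rough sketch of the overlapping-final-store idea, assuming NASM syntax, SSE2, and a length of at least 16 bytes. The routine name zero_wide and its register interface (rdi = buffer, rsi = byte count) are invented for this example and are not taken from the code in the question.

```nasm
; Hypothetical helper: zero rsi bytes starting at rdi. Requires rsi >= 16.
; Clobbers rax, rdi, xmm0 and flags.
zero_wide:
    pxor    xmm0, xmm0              ; xmm0 = 16 zero bytes
    lea     rax, [rdi + rsi - 16]   ; rax = start of the last 16-byte chunk
.loop:
    movdqu  [rdi], xmm0             ; unaligned 16-byte store
    add     rdi, 16
    cmp     rdi, rax
    jb      .loop                   ; keep going while below the last chunk
    movdqu  [rax], xmm0             ; ends exactly at the last byte of the buffer;
                                    ; may overlap the previous store
    ret
```

For a 33-byte buffer this stores at offsets 0, 16 and 17: the last store overlaps the second by 15 bytes, and no tail branches for 8, 4, 2 and 1 bytes are needed. When the length is exactly 16 the final store simply repeats the first one, which is redundant but harmless.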

If the return address was somewhere other than right after the jmp _bzero, then the worst possible thing would probably be push _read_loopr / jmp _bzero, and ret in _bzero. That would break the return-address predictor stack, leading to a mispredict on the next ~15 rets up the call tree.

Best would be to inline the loop and put a direct jmp after it.

I'm not sure how passing an address for _bzero to jmp to would compare with a call/ret and then a jmp after the call.

call/ret are fairly cheap, but not single-uop instructions on Intel. A jmp _bzero / jmp _read_loopr would be better if there was only one caller.
