I'm writing a JIT compiler with an x86 backend and learning x86 assembler and machine code as I go. I used ARM assembler about 20 years ago and am surprised by the difference in cost models between these architectures.
Specifically, memory accesses and branches are expensive on ARM but the equivalent stack operations and jumps are cheap on x86. I believe modern x86 CPUs do far more dynamic optimizations than ARM cores do and I find it difficult to anticipate their effects.
What is a good cost model to bear in mind when writing x86 assembler? Which combinations of instructions are cheap and which are expensive?
For example, my compiler would be simpler if it always generated the long form for loading integers or jumping to offsets even if the integers were small or the offsets close but would this impact performance?
I haven't done any floating point yet but I'd like to get on to it soon. Is there anything not obvious about the interaction between normal and float code?
I know there are lots of references (e.g. Michael Abrash) on x86 optimization but I have a hunch than anything more than a few years old will not apply to modern x86 CPUs because they have changed so much lately. Am I correct?