I've been working on an Intel 8086 emulator for about a month now. I've decided to start counting cycles to make emulation more accurate and synchronize it correctly with the PIT.
The clock cycles used for each instruction are detailed in Intel's User Manual, but I'd like to know how they're calculated. For example, I've deduced the following steps for the XCHG mem8,reg8 instruction, which takes exactly 17 clock cycles according to the manual:
- decode the second byte of the instruction: +1 cycle;
- transfer first operand from memory into a temporary location: +7 cycles;
- transfer second operand from register into memory destination: +8 cycles;
- transfer first operand from temporary location into register destination: +1 cycle.
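
To make the bookkeeping concrete, here is a minimal sketch of how I'm tallying those steps. The step names and the per-step costs are only my own guesses from the list above; they are not taken from the manual.

```python
# Guessed breakdown of XCHG mem8,reg8; only the 17-cycle total is documented.
# Each entry is (hypothetical step name, guessed cycle cost).
XCHG_MEM8_REG8_STEPS = [
    ("decode second instruction byte", 1),
    ("memory operand -> temporary location", 7),
    ("register operand -> memory destination", 8),
    ("temporary location -> register destination", 1),
]


def total_cycles(steps):
    """Sum the per-step cycle costs of one instruction."""
    return sum(cost for _name, cost in steps)


print(total_cycles(XCHG_MEM8_REG8_STEPS))  # 17
```

The total matches the manual, but that only shows the guess is consistent with one number, not that the individual steps are right.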
But I'm probably completely wrong as my reasoning doesn't seem to work for all instructions. For instance, I can't comprehend why the PUSH reg instruction takes 11 clock cycles, whereas the POP reg instruction only takes 8 clock cycles.
So, could you tell me how clock cycles are spent in each instruction, or rather a general method to understand where those numbers come from?
Thank you.

PUSH is basically a MOV from register to memory. POP is a MOV from memory to register. From the tables, the former is 9+EA, the latter 8+EA. Since you can POP with 0 EA (the stack pointer is already pointing to where you will POP from), the read can start immediately, and the stack pointer increment can (I guess) overlap the read cycle once the old value is no longer needed. For the PUSH operation there is an EA of 2, since the stack pointer must be decremented before issuing the MOV. I would suppose this is where the extra cycles come from. This is only speculation; I don't know this for certain. - J...
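
The arithmetic behind that guess can be sketched as follows. The two MOV base costs are the 9+EA and 8+EA figures from the tables; the stack overhead constants are hypothetical names for the speculated extra cycles, not anything Intel documents.

```python
# Base costs from the manual's MOV entries (the EA cost is added separately).
MOV_MEM_FROM_REG = 9  # MOV mem, reg: 9 + EA
MOV_REG_FROM_MEM = 8  # MOV reg, mem: 8 + EA

# Speculated "EA-like" overhead for forming the stack address (assumption).
PUSH_STACK_OVERHEAD = 2  # SP has to be decremented before the write
POP_STACK_OVERHEAD = 0   # SP already points at the data; the update overlaps the read


def push_reg_cycles():
    """PUSH reg modeled as a register-to-memory MOV plus the SP adjustment."""
    return MOV_MEM_FROM_REG + PUSH_STACK_OVERHEAD


def pop_reg_cycles():
    """POP reg modeled as a memory-to-register MOV with no extra address cost."""
    return MOV_REG_FROM_MEM + POP_STACK_OVERHEAD


print(push_reg_cycles(), pop_reg_cycles())  # 11 8
```

Under these assumptions the model reproduces the 11 and 8 cycles quoted in the question, which is consistent with the speculation but does not confirm it.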