4
votes

I am writing my first NES emulator in C. The goal is to make it easily understandable and cycle accurate (it does not necessarily have to be code-efficient), in order to play games at normal 'hardware' speed. Digging into the technical references for the 6502, I see that instructions take more than one CPU cycle, and that the cycle count varies with conditions (such as whether a branch is taken). My plan is to create read and write functions, and to group opcodes by addressing mode using a switch.

The question is: when I have a multiple-cycle instruction, such as BRK, do I need to emulate exactly what is happening in each cycle:

#Method 1

cycle - action

1 - read BRK opcode
2 - read padding byte (ignored)
3 - store high byte of PC
4 - store low byte of PC
5 - store status flags with B flag set
6 - fetch low byte of target address
7 - fetch high byte of target address

...or can I just execute all the required operations in one 'cycle' (one switch case) and do nothing in the remaining cycles?

#Method 2

1 - read BRK opcode,
read padding byte (ignored),
store high byte of PC,
store low byte of PC,
store status flags with B flag set,
fetch low byte of target address,
fetch high byte of target address
2 - do nothing
3 - do nothing
4 - do nothing
5 - do nothing
6 - do nothing
7 - do nothing

Since both methods consume the desired 7 cycles, is there any difference between the two, accuracy-wise?

Personally I think method 1 is the way to go, but I cannot think of a proper, easy way to implement it. (Please help!)

1
It won't make a difference when you only emulate the CPU. But think about peripherals accessing the same memory as an STA instruction, for example: it might be important when exactly the memory cell changes. So, go for option 1. - user2371524
As for how, you will need a state machine. - user2371524
Do you know any good C examples or sources that implement method 1? All I can find are non-C emulators, and some are too complicated to understand at my current level :( - H.J Jang
If you do a memory access, then it depends on where your data is. It could be cached or not, or cached on a different CPU, or be allocated in a different bank. You cannot really predict how many cycles you need for such an instruction. You need to emulate caching, TLB, and DDR to do so. - Serge
@Serge this is about the 6502. You can predict all of this; there's no cache. - user2371524

1 Answer

9
votes

Do you 'need' to? It depends on the software. Imagine the simplest example:

STA ($56), Y

... which happens to hit a hardware register. If you don't do at least the write on the correct cycle then you've introduced a timing deficiency. The register you're writing to will be written to at the wrong time. What if it's something like a palette register, and the programmer is running a raster effect? Then you've just moved where the colour changes. You've changed the graphical output.

In practice, clever programmers do much smarter things than that — e.g. one might use a read-modify-write operation to read a hardware value at an exact cycle, modify it, then write it back at some other exact cycle.

So my answer is:

  1. most software isn't written so that the difference between (1) and (2) will have any effect; but
  2. some definitely is, because the author was very clever; and
  3. some definitely is, just because the author experimented until they found a cool effect, regardless of whether they were cognisant of the cause; and
  4. in any case, when you find something that doesn't work properly on your emulator, how much time do you want to spend considering all the permutations and combinations of potential causes? Every one you can factor out is one less to consider.

Most emulators used to use your method (2). What normally happened is that they worked with 90% of software. Then there were a few cases that didn't work, for which the emulator author put in a special case here, a special case there. Those usually ended up interacting poorly, and the emulator spent the rest of its life oscillating between supporting different 95% combinations of available software until somebody wrote a better one.

So just go with method (1). Some software that would otherwise be broken will simply work. It'll also teach you more, and it'll eliminate any potential motivation for special cases, so it'll keep your code cleaner. It'll be marginally slower, but I think your computer can probably handle it.
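As a rough illustration of method (1), here is a minimal sketch of a per-cycle state machine that handles only BRK. The `Cpu` struct, `cpu_tick`, `push`, and the flat `mem` array are all hypothetical names made up for this example, not part of any existing emulator; a real NES core would route accesses through a proper bus instead of an array. Setting the I flag during the vector fetch is a detail the question's list leaves out.

```c
#include <stdint.h>

/* Hypothetical CPU state; every name here is illustrative, not from the post. */
typedef struct {
    uint16_t pc;
    uint8_t  sp, p;         /* stack pointer and status flags */
    uint8_t  opcode;        /* instruction currently in flight */
    uint8_t  step;          /* 0 means the next tick fetches a new opcode */
    uint8_t  lo;            /* latch for the low byte of an address */
    uint8_t  mem[0x10000];  /* flat 64 KiB array standing in for the real bus */
} Cpu;

static uint8_t bus_read(Cpu *c, uint16_t addr)             { return c->mem[addr]; }
static void    bus_write(Cpu *c, uint16_t addr, uint8_t v) { c->mem[addr] = v; }

/* Stack lives in page 1; one push is one write cycle. */
static void push(Cpu *c, uint8_t v)
{
    bus_write(c, (uint16_t)(0x0100 | c->sp), v);
    c->sp--;
}

/* Advance the CPU by exactly one cycle: one bus access per call. */
void cpu_tick(Cpu *c)
{
    if (c->step == 0) {                       /* cycle 1: fetch the opcode */
        c->opcode = bus_read(c, c->pc++);
        c->step = 1;
        return;
    }
    switch (c->opcode) {
    case 0x00: /* BRK, cycles 2..7 */
        switch (c->step++) {
        case 1: (void)bus_read(c, c->pc++);        break; /* padding byte, ignored */
        case 2: push(c, (uint8_t)(c->pc >> 8));    break; /* high byte of PC */
        case 3: push(c, (uint8_t)(c->pc & 0xFF));  break; /* low byte of PC */
        case 4: push(c, (uint8_t)(c->p | 0x30));   break; /* flags with B and bit 5 set */
        case 5: c->lo = bus_read(c, 0xFFFE);              /* vector low byte */
                c->p |= 0x04;                      break; /* set interrupt-disable */
        case 6: c->pc = (uint16_t)((bus_read(c, 0xFFFF) << 8) | c->lo);
                c->step = 0;                       break; /* done; next tick fetches */
        }
        break;
    default:   /* other opcodes are omitted in this sketch and treated as 1 cycle */
        c->step = 0;
        break;
    }
}
```

Calling `cpu_tick` seven times with PC pointing at a BRK walks through the seven cycles from the question one bus access at a time, so anything else in the machine can be stepped between calls and will see each stack write on the cycle where it actually happens.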

Other tips: the 6502 has only a few addressing modes, and the addressing mode entirely dictates the timing. This document is everything you need to know for perfect timing. If you want perfect cleanliness, your switch table can just pick an addressing mode and a central operation and then exit; after that you can branch on the addressing mode to do the main action.
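One way to read that suggestion is sketched below: the opcode switch does nothing except name an addressing mode and an operation. The enum names, the `Decoded` struct, and `decode` are assumptions invented for this example, and only a handful of opcodes are filled in.

```c
#include <stdint.h>

/* Hypothetical names; only a few modes and operations are listed. */
typedef enum { AM_IMPLIED, AM_IMMEDIATE, AM_ZEROPAGE, AM_ABSOLUTE, AM_INDIRECT_Y } AddrMode;
typedef enum { OP_UNKNOWN, OP_BRK, OP_LDA, OP_STA, OP_INC, OP_NOP } Operation;

typedef struct { AddrMode mode; Operation op; } Decoded;

/* The switch only classifies the opcode; it performs no bus access itself. */
Decoded decode(uint8_t opcode)
{
    switch (opcode) {
    case 0x00: return (Decoded){ AM_IMPLIED,    OP_BRK };
    case 0xEA: return (Decoded){ AM_IMPLIED,    OP_NOP };
    case 0xA9: return (Decoded){ AM_IMMEDIATE,  OP_LDA };
    case 0xA5: return (Decoded){ AM_ZEROPAGE,   OP_LDA };
    case 0xAD: return (Decoded){ AM_ABSOLUTE,   OP_LDA };
    case 0xB1: return (Decoded){ AM_INDIRECT_Y, OP_LDA };
    case 0x85: return (Decoded){ AM_ZEROPAGE,   OP_STA };
    case 0x8D: return (Decoded){ AM_ABSOLUTE,   OP_STA };
    case 0x91: return (Decoded){ AM_INDIRECT_Y, OP_STA }; /* e.g. STA ($56),Y */
    case 0xE6: return (Decoded){ AM_ZEROPAGE,   OP_INC };
    case 0xEE: return (Decoded){ AM_ABSOLUTE,   OP_INC };
    default:   return (Decoded){ AM_IMPLIED,    OP_UNKNOWN }; /* not covered here */
    }
}
```

With this split, the per-cycle sequencing can be written once per addressing mode, and the operation is consulted only on the cycle where the value is actually read, modified, or written.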

If you're going to use vanilla read and write methods, just be careful of the method signatures. Using them is smart on a 6502: every single cycle is either a read or a write, so that is almost all you need to say. As an example of what the signature might need to carry, the 6502 has a SYNC pin which allows an observer to discriminate an ordinary read from an opcode read. Check whether the NES exposes that to cartridges: on systems that do expose it, it's often used for implicit paging, and the main identifying characteristic of the NES is that there are hundreds of paging schemes.
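A minimal sketch of what such signatures might look like follows, assuming a hypothetical `Bus` struct that owns the cycle counter. The `AccessKind` parameter mirrors the SYNC idea and is only worth keeping if the target system exposes it; `last_reg_write` and the "return 0 for anything outside RAM" behaviour are placeholders for real hardware registers.

```c
#include <stdint.h>

/* Hypothetical: distinguishes an opcode fetch from any other read, like SYNC. */
typedef enum { ACCESS_DATA, ACCESS_OPCODE_FETCH } AccessKind;

typedef struct {
    uint8_t  ram[0x0800];     /* 2 KiB internal RAM, mirrored up to 0x1FFF */
    uint64_t cycles;          /* CPU cycles elapsed so far */
    uint64_t last_reg_write;  /* cycle stamp of the latest write at or above 0x2000 */
} Bus;

/* Every access costs exactly one cycle, so the clock advances here and nowhere else. */
static void bus_clock(Bus *b)
{
    b->cycles++;              /* a full emulator would also step the PPU/APU here */
}

uint8_t bus_read(Bus *b, uint16_t addr, AccessKind kind)
{
    (void)kind;               /* unused in this sketch */
    bus_clock(b);
    return addr < 0x2000 ? b->ram[addr & 0x07FF] : 0;  /* 0 is a placeholder value */
}

void bus_write(Bus *b, uint16_t addr, uint8_t value)
{
    bus_clock(b);
    if (addr < 0x2000)
        b->ram[addr & 0x07FF] = value;
    else
        b->last_reg_write = b->cycles;  /* stand-in for a hardware register write */
}
```

Because the clock only moves inside `bus_read` and `bus_write`, a CPU core built on top of them cannot perform an access without the rest of the machine seeing the correct cycle number for it.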

EDIT: minor updates:

  • it's not actually completely true to say that a 6502 always reads or writes; it also has an RDY input. If RDY is pulled low (signalling 'not ready') while the 6502 intends to read, it will instead halt while maintaining the intended read address. This is rarely used in practice because it's insufficient for common tasks like allowing somebody else to take possession of memory: the 6502 will write regardless of the RDY input, as the pin is really meant to help with single-stepping. It is also seemingly not included on the NES cartridge pinout, so you needn't implement it for that machine.
  • per the same pinout, the sync signal also doesn't seem to be exposed to cartridges on that system.