1
votes

I'm currently trying to write a NES emulator in C++ as a summer programming project to get ready for fall term next school year (I haven't coded in a while). I've already written a Chip8 emulator, so I thought the next step would be to try and write a NES emulator.

Anyways, I'm getting stuck. I'm using this website for my opcode table and I'm running into a road block. On the Chip8, all opcodes were two bytes long, so they were easy to fetch. However, the NES seems to have either 2 or 3 byte opcodes depending on what addressing mode the CPU is in. I can't think of any easy way to figure out how many bytes I need to read for each opcode (my only idea was to create really long if statements that check the first byte of the opcode to see how many more bytes to read).

I'm also having trouble with figuring how to count cycles. How do I create a clock within a programming language so that everything is in sync?

On an unrelated side note, since the NES is little-endian, do I need to read programCounter + 1 and then read programCounter to get the correct opcode?

2
It sounds like a lookup-table would be appropriate here.Oliver Charlesworth

2 Answers

2
votes

However, the NES seems to have either 2 or 3 byte opcodes depending on what addressing mode the CPU is in. I can't think of any easy way to figure out how many bytes I need to read for each opcode.

The opcode is still only one byte. The extra bytes specify the operands for those instructions that have explicit operands. To do the decoding, you can create a switch-block with 256 cases (actually it won't be 256 cases, because some opcodes are illegal). It could look something like this:

opcode = ReadByte(PC++);
switch (opcode) {
...
case 0x4C:              // JMP abs
    address = ReadByte(PC++);
    address |= (uint16_t)ReadByte(PC) << 8;
    PC = address;
    cycles += 3;
    break;
...
}

The compiler will typically create a jump table for the cases, so you'll end up with fairly efficient (albeit slightly bloated) code.

Another alternative is to create an array with one entry per opcode. This could simply be an array of function pointers, with one function per opcode - or the table could contain a pointer to one function for fetching the operands, one for performing the actual operation, plus information about the number of cycles that the instruction requires. This way you can share a lot of code. An example:

const Instruction INSTRUCTIONS[] =
{
    ...
    // 0x4C: JMP abs
    {&jmp, &abs_operand, 3},
    ...
};

I'm also having trouble with figuring how to count cycles. How do I create a clock within a programming language so that everything is in sync?

Counting CPU cycles is just a matter of incrementing a counter, like I showed in my code examples above.

To sync video with the CPU, the easiest way would be to run the CPU for the amount of cycles corresponding to the active display period of a single scanline, then draw one scanline, then run the CPU for the amount of cycles correspond to the horizontal blanking period, and start over again.

When you start involving audio, how you sync things can depend a bit on the audio API you're using. For example, some APIs might send you a callback to which you respond by filling a buffer with samples and returning the number of samples generated. In this case you could calculate the number of CPU cycles that have been emulated since the previous callback and determine how many samples to generate based on that.


On an unrelated side note, since the NES is little-endian, do I need to read programCounter + 1 and then read programCounter to get the correct opcode?

Since the opcode is a single byte and instructions on the 6502 aren't packed into a word like on some other CPU architectures, endianness doesn't really matter. It does become relevant for 16-bit operands, but on the other hand PCs and most mobile phones are also based on little-endian CPUs.

0
votes

I wrote an emulator for 6502 some 25+ years back.

It's a pretty simple processor, so either a table of function pointers or a switch, with 256 entries for the bytes [the switch can be a bit shorter, since there aren't valid opcodes in all 256 entries, only about 200 of the opcodes are actually used].

Now, if you want to write a simulator that exactly simulates the timing of the instructions, then you'll have more fun. You basically will have to simulate much more of how each component works, and "ripple" through the units with a clock. This is quite a lot of work, so I would probably, if at all possible, ignore the timing, and just let the system's speed depend on the emulators speed.