Saturday, February 19, 2011

I started this project with no real knowledge of how the NES works. I had a background in computer science and had been working in IT for more than 5 years, so I had some idea of what assembly was and how it worked, but I'd never worked with it in any extensive capacity and had no experience with electrical engineering. All of the programs I'd written in assembly had been very simple. But I understood the basics of how assembly worked, and I had some experience playing around with emulators.

This was important in that the emulator I was thinking about writing is built on conventions used in other emulators. One of the first things I did was determine how I was going to load the game's program into the emulator. This was relatively simple. I wanted to be able to load zipped .nes files because as far as I could tell most games were distributed in this format. The .NES format is well documented, and the documentation was easy to find.
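As a rough illustration of that loading step, here is a minimal sketch that pulls the first .nes entry out of a zip stream and reads a few header fields. The class and method names are hypothetical and are not taken from the emulator's source. The header layout assumed here is the common iNES one: the magic bytes 'N' 'E' 'S' 0x1A, the PRG ROM size in 16 KB units in byte 4, and the CHR ROM size in 8 KB units in byte 5.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of the emulator described here.
final class ZippedRomLoader {

    // Returns the bytes of the first entry whose name ends in ".nes", or null if there is none.
    static byte[] readFirstNesEntry(InputStream zipped) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().toLowerCase().endsWith(".nes")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // iNES files start with a 16-byte header whose first four bytes are 'N' 'E' 'S' 0x1A.
    static boolean hasInesMagic(byte[] rom) {
        return rom.length >= 16
                && rom[0] == 'N' && rom[1] == 'E' && rom[2] == 'S' && rom[3] == 0x1A;
    }

    // Header byte 4 counts 16 KB PRG ROM banks (mask to undo Java's signed bytes).
    static int prgRomBytes(byte[] rom) {
        return (rom[4] & 0xFF) * 16 * 1024;
    }

    // Header byte 5 counts 8 KB CHR ROM banks.
    static int chrRomBytes(byte[] rom) {
        return (rom[5] & 0xFF) * 8 * 1024;
    }
}
```

A caller would pass in something like a FileInputStream for the zip file, check hasInesMagic on the result, and then slice the PRG and CHR data out of the array after the 16-byte header.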

Next, I divided the NES platform into discrete parts. The first part I decided to write was the CPU. I knew very little about the platform, but to my knowledge all other parts of the platform relied on the CPU. Also, the CPU was the part that had the best documentation. The 6502 processor used in the NES is pretty well known. There are entire books and web sites devoted to documenting how it works. Also, the CPU was easy to test. The CPU is relatively simple. It performs one of a limited set of actions on data that is obtained from memory in a limited set of ways and stores this data in memory locations specified in a limited number of ways. Implement all of these actions and ways to obtain and store data and you're done. I'd done programming for a long time, so I had an idea of how the CPU functioned. This was a little lower level of programming than I was used to doing, but I still had a pretty decent idea of what was going on and there was plenty of documentation.
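To make the "limited set of actions, limited set of ways to get at data" idea concrete, here is a minimal sketch of a CPU step function. It is not the emulator's code: the class name TinyCpu and its fields are made up for illustration, and it only covers five real 6502 opcodes (LDA immediate, zero page and absolute; STA zero page and absolute). The point is the shape: each opcode pairs one operation with one addressing mode.

```java
// Hypothetical sketch; a real 6502 core has 56 operations and 13 addressing modes.
final class TinyCpu {
    final int[] memory = new int[0x10000]; // one int per byte, values 0..255
    int pc;                                // program counter
    int a;                                 // accumulator
    boolean zero, negative;                // the two flags LDA updates

    // Addressing modes: each returns the operand's address and advances pc.
    private int immediate() { return pc++; }
    private int zeroPage()  { return memory[pc++]; }
    private int absolute() {
        int lo = memory[pc++];
        int hi = memory[pc++];
        return (hi << 8) | lo;             // 6502 addresses are little-endian
    }

    // Operations: each works on whatever address the addressing mode produced.
    private void lda(int address) {
        a = memory[address];
        zero = (a == 0);
        negative = (a & 0x80) != 0;
    }
    private void sta(int address) { memory[address] = a; }

    // Fetch one opcode and dispatch to the matching operation + addressing mode.
    void step() {
        int opcode = memory[pc++];
        switch (opcode) {
            case 0xA9: lda(immediate()); break;
            case 0xA5: lda(zeroPage());  break;
            case 0xAD: lda(absolute());  break;
            case 0x85: sta(zeroPage());  break;
            case 0x8D: sta(absolute());  break;
            default:
                throw new IllegalStateException(
                        "Unimplemented opcode 0x" + Integer.toHexString(opcode));
        }
    }
}
```

Loading the bytes A9 20 8D 06 20 at address 0 and calling step() twice leaves 0x20 in both the accumulator and memory location 0x2006, which is also an easy kind of test case to write for each opcode.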

The PPU was a bigger kettle of fish. The PPU, or Picture Processing Unit, generates all of the images on the screen. Like the CPU it has pretty well defined ways of doing what it does, but because I hadn't ever looked at a PPU before or read descriptions of one it was easy for me to make poor assumptions regarding how they do what they do.

For instance, in the very beginning I decided that every RAM location was going to hold an integer value. Since I'm a Java guy, I had the option of using either an int or a byte to represent these values. At first I used a byte. It makes some sense, since each memory location holds 8 bits. However, I ran into some issues because Java bytes are signed. They represent values between -128 and 127. If you put 128 into a byte it will be read back as -128. There are ways around this, but I read some explanations on the internet that suggested it was probably best to just use an int.
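The signedness issue is easy to reproduce. The sketch below is not from the emulator; UnsignedByteDemo and toUnsigned are hypothetical names. It shows the wrap-around and the usual masking workaround, which is the main alternative to storing everything in an int.

```java
// Illustrative only: the names here are made up and do not come from the emulator.
final class UnsignedByteDemo {

    // Reinterpret a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte stored = (byte) 128;                // the narrowing cast wraps around
        System.out.println(stored);              // prints -128
        System.out.println(toUnsigned(stored));  // prints 128
    }
}
```

With a byte array, every read has to go through a mask like this. With an int array the values can be used directly, at the cost of four times the memory, which is negligible for the NES's 64 KB address space.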

While working with the CPU, this worked great. I didn't really care very much what part of the memory I was writing to in my early tests. If you look at my test cases for the CPU methods they usually just begin at memory location 0. When I started working with the PPU, though, memory became more important. The CPU interacts with the PPU through specific memory locations. For instance, if the software wants to write to the PPU memory in order to set the palette or the nametable or something, it first writes the address to 0x2006 in two parts, with the high byte sent first and the low byte next. All subsequent writes to 0x2007 are sent to PPU RAM, starting at the address specified through 0x2006 and incrementing with each write.

I don't propagate all writes to 0x2006 to the PPU. Instead, I have the PPU check 0x2006 each cycle for its current value. This was causing a problem in that my emulator had trouble distinguishing when a write had been done to 0x2006. So, it would do two reads from 0x2006 on the first two cycles and start writing to PPU memory address 0x0. Clearly registers didn't function how I thought they did.

The problem now became how to determine when a write had been done. At first I assumed that the first write to 0x2006 would be a value greater than 0. This was slightly better, but I was still seeing a bug. Everything was kind of working, but not quite. I decided to look at the assembly code to see exactly how it was setting the registers.

The assembly looked like this:

adc #$20 ;Load Name and Attribute Table
sta $2006
lda #$00
sta $2006
ldy #$00
ldx #$04
.LoadTitle
lda ($00),Y ;Load Title Image
sta $2007
iny
bne .LoadTitle
inc $01
dex
bne .LoadTitle
rts

First it sets 0x2006 with the PPU memory address of 0x2000 -- the location of nametable0. The nametable determines which background tiles will be placed on the screen. It turned out that the problem was the couple of instructions between the writes to 0x2006 and the first write to 0x2007. I had assumed that as soon as the full address was written to 0x2006, 0x2007 would start being filled with values that could be written to memory. Another assumption I made was that the address 0x2007 writes to should increment every cycle. Again, this was false. It increments on each write, and there may or may not be a write every cycle. It turned out I needed to know exactly when a write was made to each register.
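One way to get "exactly when a write was made" is to have the CPU's store path call into the PPU at the moment of the write, instead of having the PPU poll. The sketch below shows that idea and is not how this emulator does it; PpuAddressPort and its methods are hypothetical names. It also leaves out details of the real hardware: the increment can be 32 instead of 1 depending on a PPUCTRL flag, and the high/low toggle is shared with 0x2005 and reset by reading 0x2002.

```java
// Hypothetical write-driven model of the 0x2006/0x2007 pair; simplified on purpose.
final class PpuAddressPort {
    private final int[] vram = new int[0x4000]; // the PPU address space is 16 KB
    private int address;                         // current PPU memory address
    private boolean expectLowByte;               // false: next 0x2006 write is the high byte

    // Called from the CPU's memory-write path, once per actual write.
    void cpuWrite(int cpuAddress, int value) {
        value &= 0xFF;
        if (cpuAddress == 0x2006) {
            if (!expectLowByte) {
                address = (value << 8) | (address & 0x00FF);  // first write: high byte
            } else {
                address = (address & 0xFF00) | value;         // second write: low byte
            }
            expectLowByte = !expectLowByte;
        } else if (cpuAddress == 0x2007) {
            vram[address & 0x3FFF] = value;
            address = (address + 1) & 0x3FFF;  // advances per write, not per cycle
        }
    }

    // Test helper: look at PPU memory without side effects.
    int peekVram(int ppuAddress) {
        return vram[ppuAddress & 0x3FFF];
    }
}
```

With this model the sequence from the assembly above (0x20 then 0x00 to 0x2006, followed by data bytes to 0x2007) fills PPU memory from 0x2000 upward, no matter how many cycles pass between the writes and even though the second address byte is zero.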

There are many ways to solve the issue. I personally decided to make each of the registers a Java Integer rather than an int so that it could hold a null value. After each cycle I set the registers back to null. This way, I can do a simple null check to determine whether a register was written during that cycle.
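A minimal sketch of that nullable-register idea, with hypothetical names (NullableRegister, write, consume) that are not taken from the emulator's source. The key property is that a write of 0 is distinguishable from no write at all, which the earlier "value greater than 0" heuristic could not manage, since the assembly above writes #$00 as the second address byte.

```java
// Hypothetical sketch: a register that remembers whether it was written this cycle.
final class NullableRegister {
    private Integer pending; // null means "nothing was written this cycle"

    // Called when the CPU stores a value to the register's address.
    void write(int value) {
        pending = value & 0xFF;
    }

    // True if a write happened since the last consume().
    boolean wasWritten() {
        return pending != null;
    }

    // Returns the written value (or null if none) and resets for the next cycle.
    Integer consume() {
        Integer value = pending;
        pending = null;
        return value;
    }
}
```

The PPU side would call consume() once per cycle and act only on a non-null result.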

This is why disassembled assembly is good to have around during the debugging process. In the end I'm trying to create the environment that the assembly needs in order to function, and sometimes the easiest way to figure out what it needs is to take a look at what it's trying to do.

2 comments:

  1. I took a look at the latest code you posted. Congratulations on getting video working! I managed to get your code to compile and both Balloon Fight and the nestest roms display an image.
    However, there are several problems that I noticed right away.
    First of all, the user interface thread is constantly polling the emulation thread to see if it's still running and if so, redrawing the window. This completely occupies one CPU core and slows everything else to a crawl. Changing this loop to
    while (Platform.isRun()) {
        content.repaint();
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            // ignored: the loop condition is checked again on the next pass
        }
    }
    quadrupled the speed of your emulator and made Java stop using 100% cpu. Ideally, the user interface thread should only refresh itself when the emulation thread has finished drawing a frame, but this change should at least improve things.
    Secondly, you seem to be ignoring the flag that tells the PPU which pattern table to use for the background tiles (it's bit 4 of the PPUCONTROL register; bit 3 is the equivalent flag for sprites).
    This is why Balloon Fight's title screen is garbled.
    Thirdly, you're definitely not implementing the joypads correctly. Besides the fact that you appear to be mapping the joypad registers as output registers instead of input registers, the joypads are read through a serial interface, not a parallel one. First, the joypads are latched by writing a 1 then a 0 to the least significant bit of $4016, then each read of $4016 or $4017 returns the state of a different button in the least significant bit, in the order A, B, Select, Start, Up, Down, Left, Right.
    Hope this helps.
    Once again, you can take a look at my emulator's code at http://code.google.com/p/halfnes

  2. Hi -- thanks for your suggestions. They've been pretty helpful.

    The joypads are one of the main things I've been trying to get to work lately, along with speeding up the code. In the latest code (not yet released to SVN) I've put your first suggestion regarding the GUI in place. That fixed a lot of problems. Thanks!

    I think I implemented the flag that determines which pattern table I'm using, but it definitely looks like there's a bug there. I'll look into that -- I'd love to have video working for an actual NES cartridge. It's possible I implemented it, but forgot to have the PPU take it into account or something.

    Anyway -- thanks again for your helpful comments.

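The serial joypad protocol described in the first comment can be sketched as a small shift register. This is an illustration under stated assumptions, not code from either emulator: JoypadSketch and its methods are hypothetical names, only bit 0 of each read is modelled, and reads past the eighth button are not modelled faithfully.

```java
// Hypothetical sketch of one standard joypad as seen by the CPU.
final class JoypadSketch {
    // Bit positions match the read-back order: A, B, Select, Start, Up, Down, Left, Right.
    static final int A = 0, B = 1, SELECT = 2, START = 3,
                     UP = 4, DOWN = 5, LEFT = 6, RIGHT = 7;

    private final boolean[] pressed = new boolean[8]; // live state from the host keyboard
    private int shiftRegister;                        // latched snapshot, bit 0 = A
    private boolean strobe;                           // last value of bit 0 written to $4016

    // Called by the host input code when a key goes down or up.
    void setButton(int button, boolean down) {
        pressed[button] = down;
    }

    // CPU write to $4016: writing 1 then 0 captures the buttons and starts a read sequence.
    void writeStrobe(int value) {
        strobe = (value & 1) != 0;
        if (strobe) {
            latch();
        }
    }

    // CPU read of $4016 (or $4017 for a second pad): one button per read, in bit 0.
    int read() {
        if (strobe) {
            latch();             // while strobe is high, every read reports button A
        }
        int bit = shiftRegister & 1;
        shiftRegister >>= 1;
        return bit;
    }

    // Copy the live button state into the shift register.
    private void latch() {
        shiftRegister = 0;
        for (int i = 0; i < 8; i++) {
            if (pressed[i]) {
                shiftRegister |= (1 << i);
            }
        }
    }
}
```

With A and Start held down, writing 1 and then 0 to the strobe and reading eight times yields 1, 0, 0, 1, 0, 0, 0, 0.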