Part 51: ROMhacking Technicals: Part 2
2. Introducing The Debugger
In this chapter we're going to introduce the first and last tool in your romhacking arsenal: the debugger, or "the debugging emulator". To be 100% honest, I don't really know how accurate that "first and last tool" business is, but I can say it's probably the most important piece of software you can employ on this ridiculous endeavor. We're also gonna be taking a look at some assembly.
So, last chapter, I showed you how Policenauts stores its English in general via one of the FMV cutscenes, but Marc had wanted some in-game text hacking, so let's do that.
The good news is that there's some English in the in-game text, right at the beginning - in the Prologue. We've even seen the exact same string before.
Sure enough, if I take a savestate with BEYOND onscreen, I can find it in the savestate file. Since I know that English is ASCII prepended with hex 80, it's an easy search. Just B.E.Y.O.N.D again.
The header in this savestate is 2B0 (hex) bytes long. Since BEYOND starts at 86418 in the file, I do some hex subtraction and find it's at 86168 in memory.
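The search and the subtraction can be sketched in a few lines of Python. This is only a sketch: the function names are made up for illustration, the 2B0 header size is just what this particular savestate happened to have, and the "savestate" below is a fake buffer with the string planted at the offset from the text.

```python
HEADER_SIZE = 0x2B0  # header length seen in this savestate; may differ elsewhere


def encode_prefixed(text: str) -> bytes:
    """Put a 0x80 byte in front of every ASCII character."""
    out = bytearray()
    for ch in text.encode("ascii"):
        out += bytes([0x80, ch])
    return bytes(out)


def find_in_savestate(savestate: bytes, text: str) -> int:
    """Return the file offset of the 0x80-prefixed text, or -1 if absent."""
    return savestate.find(encode_prefixed(text))


def file_offset_to_ram(offset: int) -> int:
    """Strip the savestate header to get the address in memory."""
    return offset - HEADER_SIZE


def ram_to_file_offset(address: int) -> int:
    """Go the other way: memory address back to savestate file offset."""
    return address + HEADER_SIZE


if __name__ == "__main__":
    # Fake savestate: zero padding with the string planted at file offset 0x86418.
    fake = bytearray(0x86500)
    needle = encode_prefixed("BEYOND")
    fake[0x86418:0x86418 + len(needle)] = needle

    offset = find_in_savestate(bytes(fake), "BEYOND")
    print(hex(offset), hex(file_offset_to_ram(offset)))
```

With a real savestate you'd read the file into `bytes` first and pass that in instead of the fake buffer.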
...The bad news is I can't find that anywhere else in memory, even with a savestate beforehand.
So, I had a problem. Changing BEYOND in the savestate file didn't affect anything - because the game had already used that text to draw with; I'd caught it when the game was done with it. But if BEYOND wasn't anywhere else in memory - except for when the game was ready to draw with it - what did that mean? The text had to be in memory somewhere... that meant it wasn't being stored the same way in the actual ROM.
It had to be encoded somehow.
"Wait a sec," you might ask. "But I thought Policenauts stored English like that? We found it before in the cutscene." Yeah, but the problem is that it's such a big game, it actually stores the text in multiple formats for different scenes. But that's not our concern right now - we just want to know how the game stores the text when you're actually playing the stupid thing.
So, let's open up our handy debugger. Again, PCSXTrace is apparently the way to go, but I couldn't get Policenauts working with it, so I used pSX with a debugger.
r5900, vu0, and vu1 are all options for the Playstation 2. As of pSX v1.13, picking one of them will usually crash the program. There's actually a lot of options in the pSX debugger that I didn't use - either because I didn't need them or because they crashed - but that's okay because we only want r3000: the Playstation's main CPU. If this seems like magic tech speak, just take it on faith for now and know that even though I can't speak very intelligently about the Playstation 1's innards, I still managed to ROM hack the stupid game.
Starting up the debugger reveals the following:
Yikes. Where to begin. First of all, your mileage may vary on this screen shot but the menu marked "Window" will let you open or close these windows if you don't see them.
1. On your left you see a blank disassembly window. Soon, it's going to be full of code (instructions) that the Playstation has in memory. We can use that to determine - at a very low level - just what this thing is actually doing at given points in time. For budding computer programmers, this is what happens to your code when a compiler is done with it.
2. In the middle, we see a window marked Memory. It's RAM, and it looks exactly like your savestate file. pSX even lets you modify stuff right in there if you want - though I found it easier to modify the savestate file and load it to see if my changes took effect - it's taking snapshots of memory directly, so things can change very fast in the running game.
3. And on the right, we got registers. Registers are tiny bits of onboard storage. If you've done any computer programming, registers are the - and this is really glossing over shit - assembly programming equivalent of variables. Some are special ones you can't touch. r0 for example is always zero, even if you try to set it to something else. Don't fuck with r29 and r31, though we'll be learning about them in a bit. (They make functions possible!)
Okay, that's a lot to cover in one image, so let's take a step back. We found BEYOND in the savestate, so let's do the same with memory. Keep in mind that at this point in the game, BEYOND is still onscreen. (Note the following: the game is still running while I'm looking at memory. If you do this yourself, the registers are going to be changing values - often faster than your screen can update them - and memory might be changing too. Don't fret too much over it, just know that it's not completely static.)
Use Ctrl-G on the Memory Window and you can jump to places in memory - you'll get a little popup that says "Goto". Before, we calculated that BEYOND was at 86168, so let's go there. I type in 0x86164 (0x is just a convention that says "This number is hexadecimal" - and come to think of it, I'll be using that convention from here on out.)
Why 86164 and not 86168? No good reason, honestly. You can see a whole lot of memory in that window, so any number around there is reasonable.
And you can see BEYOND in memory, just like in the savestate (look on the right). Cool. Changing it here still doesn't affect anything because, again, the game's already done reading that text and used it to draw onscreen.
Well...
Hmmmm.
Okay, we can't find that same text (byte-sequence) with a savestate before, so where do we go from here?
Look, something's writing that ASCII English so the game can read it and draw text with it. And we have to find out what that something is, so let's introduce a new tool:
The Breakpoint
A breakpoint marks code. It's something that takes a computer instruction and modifies it in memory to say 'listen, if you're in debug mode, halt execution when you hit this'. The idea is that you can use a debugger to "pause" a computer program and take a look at the values of variables, or the general state of things. Now at the assembly level, this means the values of registers and the state of memory.
Even in a machine as old as a Playstation 1, instructions happen faster than you can read them and RAM - even at a paltry 2MB - is a lot for a human to parse. You'll need breakpoints to sort it all out, and there's two kinds:
- Execution breakpoints: "Hey processor, when you execute a specific instruction, stop"
- Memory breakpoints: "Hey processor, when you read/write a certain spot in memory, stop"
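To make the memory kind a bit more concrete, here's a toy model in Python. It is not how pSX implements breakpoints (I don't know the internals); the class and method names are invented, and instead of pausing anything it just records which writes touched a watched address.

```python
class WatchedMemory:
    """Toy RAM that records every write touching a watched address."""

    def __init__(self, size: int):
        self.ram = bytearray(size)
        self.write_breakpoints = set()
        self.hits = []  # (address, data) for each write that hit a breakpoint

    def add_write_breakpoint(self, address: int) -> None:
        self.write_breakpoints.add(address)

    def write(self, address: int, data: bytes) -> None:
        touched = range(address, address + len(data))
        if any(a in self.write_breakpoints for a in touched):
            # A real debugger would halt the emulated CPU here.
            self.hits.append((address, bytes(data)))
        self.ram[address:address + len(data)] = data


if __name__ == "__main__":
    mem = WatchedMemory(0x100)
    mem.add_write_breakpoint(0x68)       # stand-in for the address we care about
    mem.write(0x10, b"\x01")             # unrelated write: no hit
    mem.write(0x68, b"\x00\x00\x00\x00") # four-byte write covering 0x68: hit
    mem.write(0x68, b"\x80")             # one-byte write to 0x68: hit
    print(len(mem.hits))
```

The point is that a write breakpoint fires on any write to the spot, no matter which instruction did it or whether the value is the one you were hoping for.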
Right now, we know that an ASCII "BEYOND" is ending up in memory. We just don't know how it's getting there. So let's open a Breakpoint Window.
Selecting breakpoints opens up that big rectangular blank window with nothing in it. Hit "Insert" while it's in focus to open up that little menu. You can probably guess I want a memory breakpoint at 0x86168. Note too, I've got "write" checked, but "read" unchecked since I want to know when the game writes to that specific portion of memory, and more specifically how.
Memory breakpoints are pretty stable and they'll survive loading savestates - as opposed to execute ones, and we'll get to why later - so let's set it, and load a savestate from earlier in the game. Now let's click on the model of Beyond and make that text appear again.
Um. Wow, what?
Well, you can see the "Count" in the Breakpoint Window is now 1, meaning the game caught itself writing to 0x86168 and paused execution for me.
If you look at the disassembly window (the leftmost one) you can see a line is highlighted. It's:
> lui r20, 0x8008
This is a little confusing, because the instruction before that is what triggered the breakpoint:
sw r0, 0x0000(r20)
Why did that trigger the break point? Well, of all the assembly instructions the MIPS processor can do, here are the ones we care about with respect to memory access:
sb - store byte: this writes a byte from a register into memory
sh - store halfword: this writes a halfword (two bytes) from a register into memory
sw - store word: this writes a word (four bytes) from a register into memory
lb - this does the opposite of sb and loads a byte from memory into a register
lh - loads a halfword from memory into a register
lw - loads a word from memory into a register
Memory Access instructions take the following more general format:
instruction source-register, offset(target)
This is a mouthful no matter how I try to parse it to you, so let's put it this way:
Instruction = sw: This was storing a word
source-register = r0: This was taking a word (four bytes) from r0
target = r20: This was putting the value from r0 into the memory address stored in r20
offset = 0x0000: This field is used if you want to say something like "write to memory at the address 8 bytes ahead of the address stored in the register". Since it's zero, we can just ignore it.
So this means "take whatever's in r0 and write it to the address stored in r20". Remember, registers are just variables, so that's how we figure out what memory address to write to - it's stored in a variable. If you're thinking "pointer", congratulations, you're a programmer. If you don't have any idea about that previous sentence, we'll get to it a little later.
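If it helps, here's a rough Python model of what those store and load instructions do, using a bytearray as memory and a list as the register file. Everything here is a simplification: the memory is tiny, the address in r20 is a made-up small number (the real value isn't shown above), and I'm assuming words are written lowest byte first.

```python
import struct

ram = bytearray(0x100)  # toy RAM, far smaller than the real thing
regs = [0] * 32         # r0..r31


def set_reg(index: int, value: int) -> None:
    """Write a register; writes to r0 are ignored so it always reads zero."""
    if index != 0:
        regs[index] = value & 0xFFFFFFFF


def sw(source: int, offset: int, target: int) -> None:
    """Store word: write all four bytes of a register to memory."""
    addr = regs[target] + offset
    ram[addr:addr + 4] = struct.pack("<I", regs[source])  # assumes low byte first


def sb(source: int, offset: int, target: int) -> None:
    """Store byte: write only the lowest byte of a register to memory."""
    ram[regs[target] + offset] = regs[source] & 0xFF


def lbu(dest: int, offset: int, base: int) -> None:
    """Load byte (unsigned): read one byte from memory into a register."""
    set_reg(dest, ram[regs[base] + offset])


# "sw r0, 0x0000(r20)" with a hypothetical address in r20:
set_reg(20, 0x68)
ram[0x68:0x6C] = b"\xAA\xAA\xAA\xAA"  # pretend something was there before
sw(0, 0x0000, 20)
print(ram[0x68:0x6C].hex())
```

After the `sw`, the four bytes at the target address are all zero, which is exactly the "zeroing out" behavior described next.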
So, from this we can ascertain that something is writing zero to memory - because r0 is currently (actually always) zero, according to the Register Window.
...
...Well, the ASCII for BEYOND certainly isn't zero, so what's going on? Remember that the game console's a complicated machine, so it might be doing other dumb bullshit you don't care about, but in our case, I'm guessing it's zeroing out - or clearing out - memory in order to write BEYOND there in the next operation. In other words, this isn't what we're looking for.
From here, we can do 2 things - Continue or Run.
Pressing F8 will Continue. This will go one instruction ahead and show you how the registers and memory changed (i.e. what changed after that instruction executed). But fuck that, this isn't what we want - so let's press F9 which is Run. That will tell the game to continue execution until it hits a breakpoint again.
Looking above the highlighted instruction, we can see what triggered the breakpoint:
sb r2, 0x0000(r17)
So it wrote something from r2 into the memory address at r17. According to the register window on the right, r2 has 0xffffff80.
Well, the 80 is interesting because that's what we're looking for (all the ASCII is prepended with 80), but what's with all the f's? I think this is what we want so let's Continue with F8. The highlighted instruction is addiu r17, r17, 0x0001 so....
Predictably, it moved ahead one instruction, and executed "addiu r17, r17, 0x0001". This is the programming equivalent of r17++. Or for the layman, it incremented the number stored in r17 by one.
addiu = add immediate unsigned: it means "I want to do an add operation. I don't care about signed integers (i.e. don't worry about negative numbers, just do bitwise addition), and I want to use an 'immediate'."
A normal add operation takes two registers and puts the sum in a third. An "immediate" is a constant value (i.e. a straight-up number, not stored in a register or anywhere). Note you can only use one immediate as an addend - in other words, you can't sum two immediates - you'll need at least one register as an addend.
Also, somewhat obviously, addiu's permutations (add, addu, and addi) are also supported in MIPS assembly.
Since it just wrote to the address in r17, the increment moves that address ahead by one. In other words, it's getting ready to write the next byte to memory.
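Here's a small Python sketch of addiu, for the curious. The function names are mine. Two details to flag: the result just wraps around at 32 bits instead of complaining about overflow, and the 16-bit immediate is sign-extended, so 0xffff behaves like -1. The starting address in the example is illustrative only.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Treat a 16-bit immediate as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addiu(register_value: int, imm: int) -> int:
    """Add an immediate to a register value, wrapping at 32 bits."""
    return (register_value + sign_extend16(imm)) & MASK32


if __name__ == "__main__":
    print(hex(addiu(0x80086168, 0x0001)))  # the r17++ case, with a made-up address
    print(hex(addiu(0xFFFFFFFF, 0x0001)))  # wraps around to zero
    print(hex(addiu(0x00000005, 0xFFFF)))  # immediate acts as -1
```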
Coming up we have:
slti r2, r3, 0x00ff
bne r2, r0, 0x0007013c
addiu r16, r16, 0x0001
We'll pretend we don't know what's going on just yet and just tap F8 a couple more times, until we reach the bne and...
It jumps back to 7013c, and settles on lbu r2, 0x0000(r16)
You can probably guess that loads a byte from the address at r16 and puts it in r2. Actually, if you look, a few lines later at 70164, you'll see that same bne that jumped us back to 7013c.
Congratulations, now you know what a compiled loop looks like.
If you keep pressing F8, you'll notice it does the same thing a few times - gets to the bne and jumps back to the lbu.
(By the way, see how lbu r2, 0x0000(r16) is repeated at 7013c and 7014c? Why is that happening twice? ...I have no idea! For our purposes, it really is as redundant as it looks.)
Okay, so if we look at this loop, it's reading stuff from the address in r16 and writing ASCII to the address in r17. How's it doing that? Well, let's look at what r16 actually is.
According to the Register Window, r16 is 0x800eb937. Let's keep going...
We're back at the sb instruction which triggered our breakpoint. This time, the value in the register (r2) is 0xffffff42 - And if we look back at our ASCII chart, hex 42 is indeed ASCII capital B.
So this looks like the loop that's writing the ASCII "BEYOND" for the game to draw... but why couldn't we find that ASCII anywhere else in memory? Let's look where it's reading from. That was eb937 according to r16, so...
Now this is kinda weird. You can see the 80s which prepend the English characters... but what the hell is BE? Or that 80BB right after it? And 80A7 after that? (Note: See the pattern?)
The good news is we can find THIS hex sequence in our savestate.
Here's the trick. We can watch the value of what's being read in memory with the Registers window. Going back to our assembly, we can see lbu r2, 0x0000(r16), so it's reading eb937 and putting it into r2.
When it loads it, "0x000000BE" - whatever that is (note this is a hexadecimal number, the fact that it happens to be "BE" has nothing to do with the fact that BEYOND starts with "BE") - stays in r2 for awhile...
Until it hits subu r2, r0, r2 which is a new assembly instruction: subu - subtract unsigned
You can do "sub" or "subu" - just like "add", it's the assembly equivalent of "subtract". Again, "unsigned" means "I don't care about positives or negatives, just do bitwise subtraction"... which is kinda weird. Anyway, this is basically saying: r2 = r0 - r2. r0 is always zero, so normally, this is kinda like flipping the sign on a number...
...Except it's an unsigned operation... so... what's it doing?
Well, let's take a look at our upcoming numbers in memory and see what happens to them:
80 stays 80
BE becomes 42 after the subu (which is ASCII B)
80 stays 80
BB becomes 45 after the subu (ASCII E)
80 stays 80
A7 becomes 59 after the subu (ASCII Y)
So the trick is, it looks like the game is storing English in a format where the subu "flips" it to ASCII. You can sorta eyeball the pattern - BE is 3 higher than BB, and "B" is 3 lower than "E". It's like as the English letters go up, the hex numbers go down. A little ahead, you can also see 80B1 and 80B2, which are one-off from each other just like the "O" and "N" in BEYOND.
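The "flip" is easy to reproduce outside the debugger. Below is a Python sketch under the assumption that the only things happening to each byte are the subu (zero minus the value, wrapped to 32 bits) and then the sb, which keeps just the lowest byte. The function names are mine, not anything from the game.

```python
MASK32 = 0xFFFFFFFF


def subu(a: int, b: int) -> int:
    """Subtract with 32-bit wraparound and no overflow complaint."""
    return (a - b) & MASK32


def flip_byte(value: int) -> int:
    """What ends up in memory: subu from zero, then sb keeps the low byte."""
    return subu(0, value) & 0xFF


def flip(data: bytes) -> bytes:
    """Apply the flip to every byte in a sequence."""
    return bytes(flip_byte(b) for b in data)


if __name__ == "__main__":
    print(hex(subu(0, 0xBE)))            # the full register value after the subu
    stored = bytes.fromhex("80BE80BB80A7")
    print(flip(stored).hex())            # 0x80 markers survive, the rest turn into ASCII
    print(flip(flip(stored)) == stored)  # flipping twice gets you back where you started
```

Two things fall out of this. 0x80 flips to itself, which is why the markers "stay 80". And because flipping twice is a no-op, the same function that decodes the game's text can also encode new text.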
Things like this are why it's easier to try and hack the English - unless the Japanese alphabets (okay, okay, syllabaries) are second nature enough to you to know that "ke" is like three characters up from "ka", it's harder to figure that out.
So let's change it in a "before" savestate and see what happens.
I realize that's just a big mess of numbers, so I put the cursor on what I changed. I decreased the original BE to BD. So let's load the savestate and go to the text and...
Tada. Armed with the knowledge that Policenauts stored the in-game English "flipped" like that, I manually wrote out:
And posted it in the Let's Play. Marc was still unimpressed however - again, his old programmer had done that, he said. The problem was when the English was bigger than the original Japanese. Bastard.
At the time, I didn't realize the 80s were entirely superfluous. But it made me wonder. If the 80s marked off English, and this was intended to be an English ROM hack... couldn't there be a way to have the game automatically put the 80s in, and just store the encoded English without them, thereby cutting the amount of storage needed by half?
With that idea, it was time to try my hand at assembly programming.
...But first, while we're here, let's veer off for a second and try a little experiment. Policenauts presents a unique, but not very difficult ROMhacking challenge: finding encoded text. In our case, the 'decoding' algorithm was just one instruction:
That subu is at 0x00070154. That's literally a memory location - the instructions for the game are loaded into memory to be executed, after all. So... wouldn't it be in our savestate?
Well, 70154 + a 2B0 header = 70404, so let's look there in the savestate.
Well, that's interesting. According to the Debugger Disassembly window, subu r2, r0, r2 should be 00021023. But in the savestate, it looks like 23100200. It's... almost backwards? What's with that?
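The most likely explanation for the "almost backwards" look is byte order: the four bytes of the instruction sit in memory lowest byte first. Here's a Python sketch that builds the 32-bit value for subu from its standard MIPS fields and then packs it low byte first. The helper name is made up, and the byte-order explanation is my reading of the two numbers rather than something the debugger states.

```python
import struct


def encode_subu(rd: int, rs: int, rt: int) -> int:
    """Build the 32-bit word for 'subu rd, rs, rt' (opcode 0, function code 0x23)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x23


if __name__ == "__main__":
    word = encode_subu(rd=2, rs=0, rt=2)       # subu r2, r0, r2
    print(f"{word:08x}")                       # how the disassembler shows it
    print(struct.pack("<I", word).hex())       # how the bytes sit in the file
```

Same four bytes, just read from the other end.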
Well... fuck it. Let's just zero that out in the savestate, load state, and see what happens.
With nothing to "decode" the text, the game gives us garbage. But that also means we can just put unencoded ASCII right in the savestate.
For the record, I kept the subu in the final product and when we patch the English in, I do my own subu to flip the English Marc gave me, and the game's subu flips it back. In retrospect, it might have made my life easier to have gone on without it...
But I also wasn't 100% sure how the game was doing everything, and it probably wasn't a wise idea at the time to just remove stuff without more knowledge of how it might affect things later.
But enough about reading assembly. Let's write our own.