The Let's Play Archive

Policenauts

by slowbeef

Part 52: ROMhacking Technicals: Part 3

3. A Cheap and Easy Assembly Hack

What I wanted to do was this. I knew the loop was reading the encoded text from r16, decoding it, and writing it to the address at r17. More specifically, it seemed to be reading a byte, subu'ing the byte to flip it, and writing it.

Now, at this time, I'm thinking we can cut the amount of storage in half by removing those superfluous 80s. So, what I want to do is write a hex 80 before those things happen.

There's a couple of ways to introduce that to a loop, so we can:

1. Load an 80 into a register
2. Always write an 80
2. Read the byte
3. Subu the byte
4. Write the byte

Or introduce some kung-fu code and:

1. Read the byte
2. Add 8000 to it (so it becomes 80 followed by the byte)
3. Subu the whole thing
4. Write two bytes (a halfword)

I like that second one - let's try it out.

So, let's look at adding 8000 to r2 and writing the halfword. Specifically, let's look at line 7014c:

lbu r2, 0x0000(r16)
nop
subu r2, r0, r2
sb r2, 0x0000(r17)
addiu r17, r17, 0x0001

[WARNING: THE FOLLOWING CODE WILL ONLY WORK ON EMULATORS, NOT THE ACTUAL HARDWARE]

So, roughly, that nop gives me a little room to work with. First up, let's add 8000 to r2 after we load it. (In hardware, you should always have a nop or unrelated instruction after a load instruction; this is due to "pipelining" and we'll talk about it a bit later. But since we're only doing emulators and the purpose of this exercise in an introduction to ASM, let's keep going for now.)

Now, "8000" is a constant, and in assembly terms, that's an immediate. So we want the add immediate instruction or:

addi r2, r2, 0x8000

This will increment the value r2 by 8000. Our code should look like:

lbu r2, 0x0000(r16)
addi r2, r2, 0x8000
subu r2, r0, r2
sb r2, 0x0000(r17)
addiu r17, r17, 0x0001

We're actually almost done - we just need to have it store a halfword instead of a byte. Why? Well, if r2 is 8030 (ASCII zero), sb will only store a byte - which is 30. We want it to store two bytes, or a halfword, so our instruction becomes:

sh r2, 0x0000(r17)

Finally, we're incrementing r17 by 1 for each byte we write - but now we're writing two, so we want to change:

addiu r17, r17, 0x0001

To:

addiu r17, r17, 0x0002

In summation, our changes will be:

lbu r2, 0x0000(r16)
addi r2, r2, 0x8000
subu r2, r0, r2
sh r2, 0x0000(r17)
addiu r17, r17, 0x0002

Great! ...How do we actually do that? Well, let's open up our savestate in a hex editor. I've shown you MadEdit, so here's 010 which is a nice app if you can drop the $20 on it.

Our RAM header is at 2B0 in the savestate file. Our instructions started at 7014c, so adding that up gives us 703FC. Let's look there.

I've highlighted the four instructions in hex. A byte is 8-bits, a Playstation has a 32-bit processor, so each instruction is 4 bytes. (Bytes are made up of two hexadecimal letters.) The byte code for our instructions in the savestate are:

00000292 - lbu r2, 0x0000(r16)
00000000 - addi r2, r2, 0x8000
23100200 - subu r2, r0, r2
000022A2 - sb r2, 0x0000(r17)
01003126 - addiu r17, r17, 0x0001

Since we need to edit that, let's look in the debugger. I'm going to set an execute breakpoint for 7014c and click on the model of Beyond. Note: Execute breakpoints modify instructions in memory. Loading a savestate will blow away your breakpoints. Saving state with breakpoints on is also not recommended since your debugger needs to told about breakpoints while it's running, and that won't be the case if you start fresh and load that state.

Now we're about to click the model...

And when we do:

There. As you can see, the breakpoint modified memory - specifically the nop. Let's look at the first instruction and compare it to our savestate.

Savestate:
00000292 - lbu r2, 0x0000(r16)

Debugger:
92020000 - lbu r2, 0x0000(r16)

There's that "almost backwards" thing we saw earlier. What is that? Wellllll....

Endianness

In the story of Gulliver's Travels, as it were, the Lilliputians were at war over which side of an egg you should start to eat from - the big side, or the little.

The joke here doesn't need explaining, but we're nerdy ROMhacker types (or we'd like to be), so let's ruin it. Of course, it doesn't matter what side you eat an egg from - that's pretty arbitrary.

So let me ask you this. Suppose you were architecting a computer that could store and read a 4-byte (32-bit) instruction in memory. Does the leading byte go first, like in English? What if you have an older archicture and it can only read one-byte at a time? Would it make more sense to store the lowest-order byte first and "stack 'em up?" In other words, should a computer store "one hundred" as "100" like we do? Or "001" so long as it knows that's backwards.

Well, the truth is that computers are architected by all manner of dork, and on older architectures, some were one-way, some were another. Most agree on bit-endianness, where the most-significant bit is on the left of the byte, going right (reads left to right, like our base-10 numbers), but there's some differences in byte-endianness.

Wikipedia can fully verse you in Endianness, but the pointed explanation here is this:

In the debugger, and when you compose (assemble) the instructions yourself, you will be using big endian.

In memory, or on the ROM (disc) itself, the instruction data is little endian.

To convert, suppose a byte sequence that looks like this:

12 34 56 78

Reorder the bytes like this:

78 56 34 12

Note that switching Endianness is the inverse of itself, meaning if you do it twice, you'll get your original sequence back. But that's why the instructions in the debugger look "almost backwards" from how they do in the savestate.

If you'd like to see this yourself, let's look at the loop and the code again. Keep your eye on the nop at 70150.

lbu r2, 0x0000(r16)
nop
subu r2, r0, r2
sb r2, 0x0000(r17)

In our savestate, once again:

So, you can see the nop (it's 8 zeroes or 4 00's in a row). I'm going to make the nop into the previous instruction.

So, let's load our modified savestate, remake the breakpoint, and re-trigger it...

Yep, now we've got two lbu's. So we have an idea on how to insert/edit instructions into the game, so let's figure out how to turn our code:

1. lbu r2, 0x0000(r16)
2. addi r2, r2, 0x8000
3. subu r2, r0, r2
4. sh r2, 0x0000(r17)

Into byte data to insert into the savestate. Lines 1 and 3 are the same, so we only care about 2 and 4. How do we make a nop into addi r2, r2, 0x8000? It's at this point I'm going to introdcue an indispensable and wonderful document... drumroll please...

Everything You Have Always Wanted to Know about the Playstation But Were Afraid to Ask by Joshua Walker

This document is so good that I would highly recommend saving a local copy and not just viewing online. Walker breaks down pretty much all the hardware of the Sony Playstation, but what we care most about is toward the end: The Opcodes.

R3000A OPCODE ENCODING The following shows the opcode encoding for the MIPS architecture. OPCODE Bits 28...26 32-29 0 1 2 3 4 5 6 7 0 SPECIAL BCOND J JAL BEQ BNE BLEZ BGTZ 1 ADDI ADDIU SLTI SLTIU ANDI ORI XORI LUI 2 COP0 COP1 COP2 COP3 † † † † 3 † † † † † † † † 4 LB LH LWL LW LBU LHU LWR † 5 SB SH SWL SW † † SWR † 6 LWC0 LWC1 LWC2 LWC3 † † † † 7 SWC0 SWC1 SWC2 SWC3 † † † † SPECIAL Bits 2…0 5…3 0 1 2 3 4 5 6 7 0 SLL † SRL SRA SLLV † SRLV SRAV 1 JR JALR † † SYSCALL BREAK † † 2 MFHI MTHI MFLO MTLO † † † † 3 MULT MULTU DIV DIVU † † † † 4 ADD ADDU SUB SUBU AND OR XOR NOR 5 † † SLT SLTU † † † † 6 † † † † † † † † 7 † † † † † † † † BCOND Bits 8…16 20…19 0 1 2 3 4 5 6 7 0 BLTZ BGEZ 1 2 BLTZAL BGEZAL COPz Bits 23…21 25…24 0 1 2 3 4 5 6 7 0 MF CF MT CT 1 BC † † † † † † †

Damn. What does this mean? Well, there's 3 types of instructions you'll want to make:

R-Type - this instruction type operates on 2 registers
I-Type - this instruction type operates on a register and an immediate
J-Type - this instruction type performs jumps

This following HTML stolen from Wikipedia:

Type -31-                                 format (bits)                                 -0-
R opcode (6) rs (5) rt (5) rd (5) shamt (5) funct (6)
I opcode (6) rs (5) rt (5) immediate (16)
J opcode (6) address (26)

Alright. It works like this. The first 6 bits (note, NOT bytes - we're actually working with the real-deal 1s and 0s) are going to tell us what instruction we're making. We want to make addi, so let's look at that opcode grid and we discover that bits 31-29 should be 1, and bits 28-26 should be 0.

If you don't know how to count in binary, here it is - and if you're serious about ROMhacking, I suggest you learn it quick, because you are gonna be doing a fuckton of it...

000 - 0
001 - 1
010 - 2
011 - 3
100 - 4
101 - 5
110 - 6
111 - 7

So addi is 1,0 or 001000. Cool, we're getting somewhere! Just 26 more bytes to go!

Next up, since we're doing an add immediate, we know it's an I-type instruction, so we need the values of rt, and rd. Since we know that the instruction is:

addi r2, r2, 0x8000

You can guess from the order of the instructions that rt is r2, and that rd is... well, also r2, so that's pretty easy. :) They're 5 bits apiece, 2 in binary (for 5 bits) is 00010 and that makes our new instruction:

001000 00010 00010

16 bytes! We're halfway done! Finally, we want to add 8000. That's 1000 0000 0000 0000. But we don't need to convert that to binary, since it comprises half the opcode and divides evenly into the second half. We can just plop immediates onto the end of the hex instruction we're composing.

Let's convert the rest to hex, and I'll show you what I mean. We'll take our 16 bits and group them in fours.

0010 0000 0100 0010

In hex, these numbers are 2042. We can just plop our immediate afterward and we get:

20428000

If we swtich Endianness, we get:

00804220

And that's what we have to put in the savestate. Like so.

Once again, load savestate, set breakpoint and trigger.

Hmmm. Close, but the immediate is -ffff8000 somehow. Although I'm a pretty lazy guy, I felt like this mistake actually ended up a little illustrative. The problem is the leftmost bit is the "sign" - it acts as a negative number. 8000 is:

1000 0000 0000 0000

We need it to ignore signs in the immediate, so we actually need addiu, not addi. So, doing that again, we look at our opcode grid and see addiu is (1,1) so our binary becomes:

001001 00010 00010

0010 0100 0100 0010

24428000 (We add the 8000 immediate at the end)

Reverse the endian:

00804224

Put that in, reload state, set breakpoint, trigger...

Yeah, that looks better. Let's press F8 to step over the instruction. Note r2 on the right is currently 80. The addiu will add 8000 to it, and hopefully make it 8080.

Well, it became ffff8080. Since we're storing a halfword, it'll correctly grab the 8080 and we should be good, right? Right?

...Well, almost. Because the next instruction up is Line 3, the subu. And when we step over that...

Hmmm. 7f80? Odd that the subu flipped the first 80 to 7f and kept the second as 80. I'll bet it has to do with those leading ff's. But rather than try and figure out crap with the MIPS architecture, we'll do a bandaid fix and add 7f00 instead of 8000 so that the subu will flip it to 80 correctly.

So our instructions now end up being:

007F4224

This time, it flips correctly and we end up with ffff8080. Last thing to do is store a halfword, instead of a byte. This should be pretty easy. The bytecode is:

000022A2 - sb r2, 0x0000(r17)

Which, again, switch the Endian:

A2220000

Loads and stores are also I-type instructions - in this case, the "immediate" is the offset from the address we're loading from/storing to. It's usually zero. At any rate, breaking apart the hex into binary...

A2 22
1010 0010 0010 0010

We really only care about the first six bits.

1010 0010 0010 0010

101 000 10001 00010

That's 5, 0 which according to the opcode grid is indeed sb. We want sh, which is 5, 1. So doing all the steps with that in place...

101 001 10001 00010

1010 0110 0010 0010

A622

A6220000 (offset's still zero) and switch the Endian to make 00 00 22 A6.

We plug that in, reload state, breakpoint. If this seems monotonous now, ROMhacking may not be the hobby for you.

The inserted code will look like:

Finally, we change 01003126 to r17, r17, 0x0002. Since it was 0001 to begin with, all we have to change is 01003126 to 02003126. (Because if you switch the Endianness, you can see the immediate 0x0001 is really easy to spot.)

Putting it all together, I'll modify the original text in memory to one-byte ASCII without the 80s.

(Sorry, I threw in the flipped ASCII manually from the script I wrote to generate it, so I have a couple typos.) But since the game ignores the 80's if they aren't present anyway, just to prove it to you, here's the writing buffer we found at 86164:

And you can see the ASCII prepended with 80s.

Since the 80's were superfluous anyway, who what was the point of this exercise? Well, for one thing, it's a way we can get our feet wet in MIPS before we really go to the nitty gritty stuff. But more generally, this seems like a good method of doubletext hack. Japanese requires 2-bytes per character because of the Kanji in a lot of modern games, but English characters and punctuation can be covered in one byte. If you can discover the method by which the text is being read from data and put into a writing/drawing buffer, you can overload that and put in your own code.

We'll get into that in Chapter 6, when we talk about jump hijacking (i.e. actually introducing wholly new assembly). But first, we're going to talk about a couple more concepts - notably finding and modifying text pointers in Chapter 4 - and the thing you'll actually need to finish a CD-ROMhacking project, the build in Chapter 5.