The Let's Play Archive

Policenauts

by slowbeef

Part 50: ROMhacking Technicals: Part 1

1. Starting Out

Look, I don't care what kind of anime jerk you think you are, unless you are beyond-fluent - I mean, like really Japanese is second nature to you - you're going to want to be experimenting in your ROM hack project in your native alphabet. In fact, once most of the game is readable to you, that's when you really start understanding how it's working and what events are triggering what.

If you don't know Japanese at all, it really is a struggle to try and connect game events to hex code to Kanji, so the first thing to do is play a big chunk of the game and find some English.

Fortunately for you, Japan thinks English is the bee's knees, so you're likely to find it somewhere, even if the majority of the game isn't in it.

Either of these would work well as starting points, and in retrospect, searching on "POLICENAUTS" might have been the smarter idea, since it's one longer string. But for whatever reason, when I decided to tool around with Policenauts, I focused on "BEYOND COAST" in the intro movie, and tried to see if I could fool with that.

(This was kind of a poor idea, because that furigana - the tiny characters above the letters - could have made that more complicated. I got lucky as we'll be seeing.)

Now, I mostly wanted this document to match how I did it, but I'm going to diverge here, because I did something silly. I started searching through the whole ROM for "BEYOND COAST". That's ridiculous - the entire Playstation ISO is about 722 MB, which means any search is going to take forever, and you won't be editing the ROM directly anyway on a big job like that. For the Playstation, let's introduce a new and very critical concept:

Savestate hacking

Savestates are pretty much just memory dumps. The emulator takes RAM and just dumps it to a file. When you load state, it blows away the current RAM and just loads it back into memory. That's why savestates are so quick - it's just a really fast memory swap.

The advantage to savestate hacking is two-fold:

- At only 4 MB of memory, you have a lot less to sift through.
- If you make changes to the savestate file, you can load the state and instantly see your progress (if any), as opposed to having to reset the whole game.

So, let's take a savestate while the text is on-screen.

Some ROM hackers are going to raise a hand here, and you'll read why in a minute, but let's just keep going with it.

So, hat in hand, let's open our savestate file in a hex editor. I use MadEdit, but as far as free hex editors go, I guess it won't matter too much. MadEdit lets me regex search for English, and copy/paste chunks of hex. It became invaluable later, so I'd personally recommend it. Artemio recommends 010 - a not-free hex editor that actually lets you set up "binary templates" which will fontify your buffer. That's fancy tech-speak for "you can teach it to make certain parts different colors in order to read it a little better."

Another important tool is your "debugging emulator". Gemini - who is much, much more experienced than I am - recommends PCSXTrace. The only reason I didn't use that is because I could never get Policenauts working in it, so I went with pSX with a debugger.

It's unfortunately not very close to the hardware - it doesn't do pipelining, and neither does ePSXe - and there's a tool we can use for hardware-verification, but we'll get to that in the assembly programming part of the doc.

If you do use PCSXTrace, just a note, your savestate files are compressed, so you'll have to uncompress them before screwing around. But fuck that noise, let's get to the heart of the matter.

Let's look at the savestate file!

So intuitive!

(It gets easier.)

pSX inserts a variable length header - usually about 2B0h-350h - before memory begins. You can find exactly how much by hitting Page Down and looking for the word "RAM" on the right:

It's something to keep in mind because if you account for that header, the file locations will map exactly to where things are assembled in memory. In this case, the header is 350h long. So if something is at 864B4 in the file, subtract the header (350) and you'll find in the emulator, that something will be at 86164 in memory.

How did I subtract my hexadecimal so easily?

Yep, just Windows calculator. Just switch it to Scientific (view) and check off hex. You don't wanna do hex in your head - after a year, I still haven't gotten used to five plus six being B, and hex multiplication can get pretty rough.
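If you'd rather script the header math than click through a calculator, here's a minimal Python sketch. The function names are mine, and the 350h header length is only the value from this particular savestate, so plug in whatever yours turns out to be.

```python
HEADER_LEN = 0x350  # header length of this particular savestate; it varies per file


def file_to_ram(file_offset, header_len=HEADER_LEN):
    """Convert an offset in the savestate file to an address in emulated RAM."""
    if file_offset < header_len:
        raise ValueError("offset is inside the header, not RAM")
    return file_offset - header_len


def ram_to_file(ram_addr, header_len=HEADER_LEN):
    """Convert an address in emulated RAM back to an offset in the savestate file."""
    return ram_addr + header_len


print(f"{file_to_ram(0x864B4):X}")  # 86164
print(f"{0x5 + 0x6:X}")             # B
```

Going the other way (ram_to_file) is handy later, when the debugger gives you a memory address and you want to find it in the hex editor.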

I'm getting off topic, though. Let's look for "BEYOND COAST"

Hmmmm. You'll notice I have "regular expressions" checked off, and we'll get to that in a couple paragraphs, but let's talk about what I'm really doing and why it failed.

I did a plaintext, ASCII search. English text on your computer tends to be encoded in this archaic - but functional - format known as "ASCII" - the American Standard Code for Information Interchange. Most programmers like it because it's pronounced "ass-key". You might think it's unwise to look for the American Standard Code in a Japanese video game, but it's actually a pretty universal method for printing English.
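For reference, this is roughly what a plaintext ASCII search boils down to. It's a sketch in Python rather than what MadEdit actually does internally, and the sample buffer is a made-up stand-in for a savestate.

```python
def find_ascii(data, text):
    """Return every offset where `text` appears as plain ASCII bytes in `data`."""
    needle = text.encode("ascii")
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Tiny stand-in buffer. On a real savestate you would read the file instead,
# e.g. data = open("state.sav", "rb").read()  (hypothetical filename).
sample = b"\x00\x01BEYOND COAST\x00"
print(find_ascii(sample, "BEYOND COAST"))  # [2]
```

An empty list back means the bytes aren't sitting there as one unbroken ASCII run.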

...Well, if it's so universal, how come I didn't find the text? I didn't know at the time, but the next thing I tried was another important ROM hacking concept:

Relative Searching

Here's an ASCII chart from asciitable.com (even though the watermark says lookuptables.com).

It's got the ASCII codes in decimal (not as useful as you might think), hex (very useful), octal (who cares), and HTML. And oh yeah, the character.

Note how capital A is 65, B is 66, C is 67, etc.

Let's just say - like me - you can't find the English. If the game programmers used some funky proprietary character set - meaning you don't know what numbers map to what letters, they probably still did it in such a manner that:

- B is one higher than A
- C is one higher than B and two higher than A
- T is the twentieth letter, and therefore 19 higher than A, etc.

This is where you need a relative searcher. A relative searcher will take a binary file - like your savestate file - and search for a string. So if you look for BEYOND - it will say:

"I want to find a number (B) such that the next number is 3 higher (E), then next number is twenty higher than that (Y), then ten less (O), one less (N), and ten less (D)."

You might hit other number sequences that happen to match that, so the longer the string you can find, the better. In this example, it would have been better to search on POLICENAUTS, but if it's hidden strangely, you'll still have trouble finding it.

Like I did. Relative search didn't work for me.
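A relative searcher is simple enough to sketch in a few lines of Python. This is my own minimal version, not any particular tool, and the "made-up character set" in the example (A stored as 10h) is purely hypothetical.

```python
def relative_search(data, text):
    """Find offsets where consecutive bytes differ by the same amounts
    as consecutive letters of `text`, whatever the actual byte values are."""
    codes = [ord(c) for c in text]
    deltas = [b - a for a, b in zip(codes, codes[1:])]
    hits = []
    for i in range(len(data) - len(deltas)):
        if all(data[i + k + 1] - data[i + k] == d for k, d in enumerate(deltas)):
            hits.append(i)
    return hits


# BEYOND in a hypothetical character set where A is 0x10 instead of 0x41.
sample = bytes([0xFF, 0x00, 0x11, 0x14, 0x28, 0x1E, 0x1D, 0x13, 0x7F])
print(relative_search(sample, "BEYOND"))  # [2]
```

It only works when the letters sit in consecutive bytes, which is exactly the assumption that failed here.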

Hmmm.

You'll hear ROM hackers say you need a little bit of luck when ROM hacking and honestly, sometimes you do. You just need to try different things. It took me a couple days to think this up, but it hit me: "Maybe the English isn't just one after the other like in ASCII."

And it turned out I was right. Thanks to Japanese's horrible third alphabet (yeah, really, they have three) Kanji, which has a few-thousand characters, you need bigger numbers to represent it. English only has 26 letters, so even with uppercase and lowercase and all the punctuation, you only need 128 numbers for most everything (hence why the ASCII table goes from 0 to 127).

Japanese, though, needs more. Hmmm. So I tried this search:

The periods (in a regular expression search, which I have checked off) mean "any single character". So that will match BBEEYYOONNDD or BzEzYzOzNzD. It's really something like searching for B*E*Y*O*N*D.
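The same search can be done with Python's re module on raw bytes. A small sketch: the helper name is mine, and the sample bytes are a stand-in for the savestate, not a real dump.

```python
import re


def interleaved_pattern(text):
    """Build a bytes regex that allows exactly one arbitrary byte between letters."""
    escaped = [re.escape(c.encode("ascii")) for c in text]
    # DOTALL makes '.' match every byte value, including 0x0A (newline).
    return re.compile(b".".join(escaped), re.DOTALL)


# Stand-in data: each letter separated from the next by one other byte.
sample = b"\x00\x80B\x80E\x80Y\x80O\x80N\x80D\x00"
match = interleaved_pattern("BEYOND").search(sample)
print(match.start())  # 2
```

Without DOTALL, a filler byte that happens to be 0Ah would silently break the match, which matters in binary files.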

It's a shot in the dark, but it turned out to be a good move.

You can see it there - it's B-E-Y-O-N-D with the Euro currency symbol, naturally. Okay, dumb jokes aside you can also see "COAST" and even the space between them a little later if you look on the right side. In fact the hex numbers even match our ASCII chart.

So Policenauts appears to write its English in ASCII - it's just that each character is prepended with that euro symbol, which is a hex 80. If you're wondering what all that crap between BEYOND and COAST is, it's the furigana (the little Japanese over the words). Another reason why BEYOND COAST wasn't a great choice, but we're getting somewhere, so let's keep going.
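Here's that encoding as a Python sketch, assuming the layout is exactly what it looks like in the hex editor: an 80h byte, then the plain ASCII code, for every character. The function names are mine, and this only covers the English run, not the furigana bytes in between.

```python
PREFIX = 0x80  # the byte that shows up as a euro sign in the hex editor


def encode_prefixed(text):
    """Turn ASCII text into bytes with PREFIX in front of every character."""
    out = bytearray()
    for ch in text.encode("ascii"):
        out += bytes([PREFIX, ch])
    return bytes(out)


def decode_prefixed(data):
    """Reverse of encode_prefixed; rejects data that doesn't follow the layout."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        if data[i] != PREFIX:
            raise ValueError(f"missing prefix byte at offset {i}")
        chars.append(chr(data[i + 1]))
    return "".join(chars)


encoded = encode_prefixed("BEYOND")
print(" ".join(f"{b:02X}" for b in encoded))  # 80 42 80 45 80 59 80 4F 80 4E 80 44
print(decode_prefixed(encoded))               # BEYOND
```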

That's great for theory, but let's test this out. Let's type something different and load up our savestate.

What kinda gyp is this?! It's the only place B.E.Y.O.N.D appears in memory, but nothing happened.

Well, you have to realize it's a multi-step process for the game.

1. Load up the text into memory
2. Read the text in memory
3. Figure out how the text is drawn from the font file
4. Draw the pixels representing the text

When we took the savestate, it was while the text was onscreen, so it was here:

1. Load up the text into memory
2. Read the text in memory
3. Figure out how the text is drawn from the font file
4. Draw the pixels representing the text (the savestate caught the game here)

It's done with that text as of the savestate, so actually, we should take another savestate before it happened.

Now, you can't do it too early or it won't be in memory yet. Usually after the previous text is gone but before the text you're looking for appears is okay, I found. (And actually, that's for a subtitled movie. Odds are, you'll be dealing with standard dialogue boxes, so whenever it's about to come up - but before the text actually hits the screen - is fine.)

So, as you can see, the BEYOND COAST text was in memory right before the text appeared, and this time, changing it yielded...

Tada. Now so long as our English is precisely the same length as our Japanese - which I'm sure it probably is - we're pretty much done with ROM hacking!

...Even I can't play along with that joke for very long.

We did learn some important things, though - we learned that Policenauts can print English out of the box, which is great because that saves us a lot of work. We also learned that it prepends each English character with an 80 in hex, which is also important.
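If you'd rather script the edit than type it into the hex editor, a same-length patch might look like this in Python. It's a sketch under assumptions: the function name and the tiny stand-in buffer are mine, and a real run would read the savestate file, patch it, and write it back before loading the state in the emulator.

```python
def patch_same_length(data, offset, old, new):
    """Return a copy of `data` with `old` at `offset` replaced by `new`."""
    if len(new) != len(old):
        raise ValueError("replacement must be exactly as long as the original")
    if data[offset:offset + len(old)] != old:
        raise ValueError("original bytes not found at that offset")
    patched = bytearray(data)
    patched[offset:offset + len(new)] = new
    return bytes(patched)


# Stand-in buffer holding COAST with an 0x80 byte before each letter.
old = bytes([0x80, 0x43, 0x80, 0x4F, 0x80, 0x41, 0x80, 0x53, 0x80, 0x54])  # COAST
new = bytes([0x80, 0x54, 0x80, 0x4F, 0x80, 0x41, 0x80, 0x53, 0x80, 0x54])  # TOAST
state = b"\x00\x00" + old + b"\x00"
patched = patch_same_length(state, 2, old, new)
print(len(patched) == len(state))  # True
```

Refusing a length mismatch is the whole point: overwrite more bytes than the original string had and you're stomping on whatever the game keeps right after it.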

You might wonder why we took that savestate while the text was onscreen at all. Well, it'll become more apparent in the next section, but here's the deal. Sometimes the text is stored compressed or in a format you just can't find. Taking the "during" savestate shows you the text when it's in an uncompressed format. It's necessary to know what the output is going to look like, so you can find it in memory later and look for what's actually putting it there (decompressing it). In this particular case of the video cutscenes, the English is stored on the ROM as it is here, so we didn't need to do that. But in the very next case, it's stored differently on the ROM than how it is in memory for drawing, so we'll need the "during" savestate just so we can find where in memory everything is happening. Also, a fun fact - those 80s turned out to be superfluous. Policenauts will still understand if you omit them, so...

And tada...

But we're getting ahead of ourselves. At this point, when I discovered this, I changed it to "Beyond Toast" and told Marc about it, to which he replied that Junker HQ had already figured out the movies - it was the in-game text that was the problem.

Hmmmm...

So screw cutscenes. Let's go to the actual game.