Part 43: Tales of ROMhacking: Part 2
Tales of Romhacking!
I ran into a big problem before I even got to start: Marc Laidlaw is very fluent in Japanese, he excels even in different dialects, and he's a professional game translator to boot. You probably couldn't ask for more qualifications for an unofficial game translator.
But he was not a programmer.
Marc's summary of the problem their coders had was: "The English is too big to insert; it takes up too much space."
But what did that even mean? What was 'too big?' What happens if you try to insert "a larger script"? Does the game crash? Why does it? What do you overwrite? Isn't there "extra space" on the disk?
I wasn't really asking these questions from a technical expertise standpoint; I really didn't understand the problem at even a more basic level. I didn't really even know what "inserting the script" meant. (I mean, I had notions, but qualitative hazy ideas. Nothing concrete or useful.)
I was a little lucky in that I'd befriended a couple of people on the SA forums; notably Scarboy - a younger programmer from Montreal who knew something about ROMhacking. Unlike me, he wasn't cynical about his own abilities.
When I'd posted the fourth and final update to the Policenauts Let's Play (Prologue Only), I didn't want to pressure Marc, so I gave myself a challenge. I would work on the ROMhack for one month, before asking him for the translation to the next part, Act 1 of 7. So I started small.
Very small.
There's a scene in the intro movie that mentions "BEYOND COAST" with English text so I decided to try and find that on the disk. I managed it after some doing and replaced it with "Beyond Toast" just to show Marc I could replace the in-game text. (Note: how I actually accomplished this is mentioned in the first tech article in the OP.) Replacing the text wasn't enough, though, Marc said. The team had already figured out the movie files - the problem was the actual game text.
Huh.
Okay, step two. I went into the actual game. Fortunately, "BEYOND" appeared again in English, but I couldn't find it like I did before. And I had no idea what to do. But at this point, I was a little obsessed with figuring out exactly what the technical problem even was (and not actually solving it, which I still considered beyond my ability). (Note: this is the second tech article included in the OP.)
I started reading the ROMhacking boards and Scarboy pointed me in the right direction. I tried a debugging emulator out and even a little savestate hacking. A few dead-ends and days later, I figured out how to use the debugger, and where to look in memory. I found the codes for kana (somehow) and since I could at least read that, I figured out how the game drew a's and ka's and ki's, and eventually realized, A-HA! The game was storing its text "sorta backwards." Konami had encoded its game script by basically "reversing the numbers" which comprised the characters.
(This is something on the order of switching A with Z, B with Y, C with X, etc. Again, the technical document goes into more detail.)
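To make the A-with-Z analogy concrete, here is a minimal sketch of that kind of "mirror" substitution. It illustrates the analogy only: the game's real byte encoding isn't given in this article, and the function name `mirror` is made up for the example.

```python
import string

# Analogy only: swap A<->Z, B<->Y, C<->X, and so on.
# This is NOT the game's actual encoding, which this article doesn't spell out.
UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase
MIRROR = str.maketrans(UPPER + LOWER, UPPER[::-1] + LOWER[::-1])


def mirror(text: str) -> str:
    """Apply the mirror substitution; non-letters pass through unchanged."""
    return text.translate(MIRROR)


encoded = mirror("Hello marc")
decoded = mirror(encoded)  # the mapping is its own inverse
print(encoded)
print(decoded)
```

The handy property of a scheme like this is that encoding and decoding are the same operation, so once you spot the pattern you can write text as easily as read it.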
I manually wrote out "Hello marc" (and screwed up the punctuation.)
A-HA! Marc retorted. But the problem was when the length of the English text went over the length of the Japanese - not when they were the same size...
Shit. Unfortunately for me, this "ping-pong" thing scratched me right where I itch. Now I wasn't so obsessed with identifying the technical problem, as I was with showing up Marc.
I had noticed that Policenauts stored all its English characters pre-pended with 80. So what if, instead of storing those 80's in the data like it was doing, you automatically wrote those in yourself? (And that's the third technical doc.) What I wanted was a real-deal assembly programming hack. Since I knew next to nothing about that, I did some more research and found the routine where Policenauts was reading the text, and putting it in memory to draw onscreen.
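The idea of dropping the stored 80s and writing them back in while reading can be sketched roughly like this. It's a toy model in Python rather than the assembly patch itself, and the function names are invented for illustration; the only detail taken from the article is the 0x80 byte in front of each English character.

```python
PREFIX = 0x80  # the byte the article says preceded every English character


def store_compact(prefixed: bytes) -> bytes:
    """Drop the 0x80 in front of each character: 2 bytes per char become 1."""
    if len(prefixed) % 2 != 0 or any(b != PREFIX for b in prefixed[0::2]):
        raise ValueError("expected 0x80 before every character")
    return prefixed[1::2]


def read_expanded(compact: bytes) -> bytes:
    """What a patched read routine would do: re-insert 0x80 on the fly."""
    out = bytearray()
    for ch in compact:
        out += bytes([PREFIX, ch])
    return bytes(out)


original = b"\x80H\x80i"
compact = store_compact(original)
print(len(original), len(compact))  # stored size is halved
print(read_expanded(compact) == original)
```

Under this model, the same stretch of script data holds twice as many English characters, which is where the "double text" name comes from.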
I noticed there was a nop - short for "no-op". It was basically a blank-spot in the middle of a three-line routine. And what I wanted to do was a modification to that routine which required four lines.
In other words, there was a blank spot exactly where I needed it.
It didn't work at first, because I didn't understand how to actually insert the binary opcodes (meaning the binary instructions) into the savestate, but a goon named Rin noticed my dilemma (I posted on RHDN) and asked his ROMhacking friend to help. I'd screwed up the Endianness - basically, I'd not written the instructions in the correct order.
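For anyone who hasn't been bitten by endianness, here's a small sketch of the mistake. The 32-bit value is an arbitrary stand-in for an instruction word, not one of the actual instructions from the hack, and the example assumes a little-endian target (which, as far as I know, the PlayStation's CPU is).

```python
import struct

# Arbitrary 32-bit word standing in for one machine instruction.
word = 0x24420001

big = struct.pack(">I", word)     # most significant byte first
little = struct.pack("<I", word)  # least significant byte first

print(big.hex(" "))     # 24 42 00 01
print(little.hex(" "))  # 01 00 42 24
```

Same instruction, same four bytes, opposite order. Paste the first form into memory that expects the second and the CPU reads a completely different instruction.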
Once I fixed it, my double text hack worked. I put a video together of an English sentence that fit in a space much larger than the original.
"Wow," Scarboy said to me in AIM. "So you really got that nop for free?"
"Yep!" I proudly replied, and even posted about it in my Let's Play thread. "Theoretically," I wrote. "This would work in a real-deal Playstation, not just emulators."
Ping.
I waited for Marc's response.
And he was impressed. He had no counter. He would run it by his old programmer.
At this point, I was getting cocky. I now sort of understood the problem - the data for the game script was all packed together; but more importantly, it was indexed - the game had a specific table of contents, telling it where each line began. Inserting larger strings of text would push succeeding lines out, and the indices would be all out of whack. It meant that if the game had 2 sentences:
(a) Sentence number one. (b) Sentence number two.
And it knew where (a) and (b) were based on file positions - (a) being the 1st character and (b) being the 22nd - then replacing (a) with a longer sentence:
(a) With something much l(b)arger like this. Sentence number two.
Would screw up the second sentence. (If this example looks weird, the notion is that "(b)" didn't move, when the sentence got longer. So when the game prints it out, it would print "arger like this. Sentence number two.")
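Here's the same two-sentence example as a minimal Python sketch. The `build` helper and the one-space separator are assumptions made for the illustration; offsets are counted from zero and ignore the "(a)"/"(b)" markers, which are only labels in the example above.

```python
def build(sentences):
    """Pack sentences into one blob and record where each one starts."""
    blob, offsets = "", []
    for s in sentences:
        offsets.append(len(blob))
        blob += s + " "
    return blob, offsets


old_blob, table = build(["Sentence number one.", "Sentence number two."])
print(table)  # where the game believes each sentence begins

# Swap in a longer first sentence, but keep using the OLD table.
new_blob, _ = build(["With something much larger like this.",
                     "Sentence number two."])

stale = new_blob[table[1]:].strip()
print(stale)  # what the game would print for sentence (b)
```

The text is all there; it's the table that's lying. Unless the offsets get recalculated, every sentence after the first edit starts in the wrong place.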
At that point, I told Marc that I held a B.S. in Computer Science, and though I'd never done any sort of ROMhacking, I'd been a Java programmer for close to ten years (along with other assorted languages along the way) and I was willing to attempt the ROMhack.
He talked it over with Artemio. Artemio set up a Subversion code repository and a private forum on Junker HQ. He contacted Meow, the old programmer, and they all agreed to give it another go.
Meow took the wind out of my sails pretty quickly; although even before we got a hold of him, it started to dawn on me what the real problem with ROMhacking Policenauts was.
Firstly, Meow noted, the doubletext hack I'd discovered shouldn't have been necessary - Policenauts could actually display English with single-byte ASCII regardless. English already only cost you a single byte as opposed to Japanese double-byte characters, right out of the box. So my cool hack might've impressed Marc and gotten me on board the project, but it was ultimately not helpful.
Even worse, as my own research was starting to show: the text pointers were hidden in a compiled bytecode script.
More or less, that meant this. Someone at Konami (possibly even Kojima or one of his employees) wrote the script as a small computer program. Something simple, like:
10 IF PLAYER_CLICKS("LAMP") AND NUMBER_OF_TIMES_LAMP_CLICKED == 0, PRINT SENTENCE 7, NUMBER_OF_TIMES_LAMP_CLICKED++
20 ELSE IF PLAYER_CLICKS("LAMP") AND NUMBER_OF_TIMES_LAMP_CLICKED >0, PRINT SENTENCE 24
30 IF PLAYER_CLICKS("GUN") AND PICTURE_OF_ED_CLICKED == true PRINT SENTENCE 20
40 IF PLAYER_CLICKS("GUN") AND PICTURE_OF_POLICENAUTS_CLICKED == true PRINT SENTENCE 20
(Note: We never had a copy of the original scripting language, so this is just a made-up example.)
Of course, that's pretty verbose and not necessarily easy for a computer to parse, so it's compiled and the game has an on-board interpreter to map these script events to game code. Events like "player_clicks" or the game's logical "objects" (like the lamp, or the bounding box surrounding the lamp sprite) would be given their own numbers. The script's logical commands like "if" or "else" would have their own numbers assigned.
In other words, I was looking for the text pointers - in my example, numbers like 7, 24, 20. But those values could collide with other numbers that were not text pointers. The number 20 could also be what "if" compiled down to. In this case, 20 is repeated 3 times, but it's only a text pointer twice. Even though I knew where all the text was, I didn't know how the game knew to find it.
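The collision problem can be sketched with a toy bytecode. Everything here is invented for the example (the opcode numbers, the operand counts, the condition IDs); the real compiled format was exactly the thing we didn't know.

```python
# Entirely made-up opcode numbers; the real bytecode format is unknown here.
OP_IF, OP_ELSE, OP_PRINT = 20, 21, 30
OPERANDS = {OP_IF: 1, OP_ELSE: 0, OP_PRINT: 1}  # operands following each opcode

# Toy compiled script: IF c5 PRINT 7, ELSE PRINT 24, IF c6 PRINT 20, IF c8 PRINT 20
bytecode = [OP_IF, 5, OP_PRINT, 7,
            OP_ELSE, OP_PRINT, 24,
            OP_IF, 6, OP_PRINT, 20,
            OP_IF, 8, OP_PRINT, 20]


def naive_hits(code, value):
    """Every position holding `value`, with no idea what it means there."""
    return [i for i, b in enumerate(code) if b == value]


def real_pointers(code):
    """Walk instruction by instruction; only PRINT operands are text pointers.
    This only works once the instruction format is already known."""
    found, i = [], 0
    while i < len(code):
        op = code[i]
        if op == OP_PRINT:
            found.append((i + 1, code[i + 1]))
        i += 1 + OPERANDS[op]
    return found


print(naive_hits(bytecode, 20))
print([pos for pos, value in real_pointers(bytecode) if value == 20])
```

A blind search turns up every 20 in the file, opcodes included. Telling the pointers apart requires understanding the whole instruction stream, which is the reverse-engineering job nobody wanted.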
So I had a problem. We had no systematic way of finding - and thereby calculating - new values for and editing the text pointers. The closest Meow had gotten was to leave everything where it was and try somehow to cram the English in. But no dice.
That doesn't work because of something I called The White Blood problem.
In the alleyway, in the Prologue, Jonathan notes a man he's chasing has white blood. In Japanese, this is a three character sentence: (Shiro)(i) (chi). In English, that's a ten-character sentence not counting punctuation ("White blood..."). With punctuation and spaces, I essentially had 6 bytes to work with in Japanese (3 characters times 2 bytes), but 11 bytes in English - and this was already accounting for a "doubletext hack"! I'd need a really good compression scheme.
Even then, what if there were worse examples? What if an entire English sentence had to fit into the space of one Japanese Kanji character? Plus, a decompressor like that might take up space in memory of its own, or even slow down the game.
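The White Blood arithmetic, spelled out as a quick sketch. The byte sizes come from the description above (two bytes per Japanese character, one per English character in the best case); the variable names are just for the example.

```python
BYTES_PER_JAPANESE_CHAR = 2  # double-byte characters, per the article
BYTES_PER_ENGLISH_CHAR = 1   # best case: one byte per English character

japanese_chars = 3           # (shiro)(i)(chi)
english = "White blood"      # ten letters plus one space

available = japanese_chars * BYTES_PER_JAPANESE_CHAR
needed = len(english) * BYTES_PER_ENGLISH_CHAR
shortfall = needed - available

print(available, needed, shortfall)
print(f"would need to squeeze the text to {available / needed:.0%} of its size")
```

And that's for one short line, before counting the ellipsis.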
It was maddening.
Nothing I thought of worked, and the only options left were to what? Reverse engineer Konami's bytecode script? "Nearly impossible," commented a computer engineer I knew from SA. I was stuck here.
The answer came in a "Eureka" moment. In a Service Station on the New Jersey Turnpike, actually. My brother and I were on our way to Atlantic City for my 30th birthday. The problem was killing me, and I couldn't get my mind off it. I went to use the bathroom. All day, I was turning over stupid solutions in my head - mostly about compression and silly places I could put the text.
"What about putting the second half of all the sentences somewhere else? What about base64 encoding things? What about-"
"Look," I thought firmly (at the urinal). "You need new text pointers. You cannot get around this."
"Okay, well, if I'm not able to get the old text pointers, what can I do? Make wholly new text pointers? And even if I did, it doesn't matter because the game only knows about the old ones. And all I know is where they're pointing to-"
And really just like that - on my way from the urinal to the sink - it hit me.
All I knew was where the old pointers were pointing. So what if I put a new text pointer there? I had a vision in my head. It became something like converting:
(a) Sentence number one. (b) Sentence number two.
To:
(a) With something much l(b)arger like this. Sentence number two.
Just like before. Only now, I'd insert my own pointers right in the text.
(a->a) With something muc(b->c)h larger like this. [c] Sentence number two.
Yeah, but you screw up the first sentence, because there's a new text pointer now... ...or no, wait. What if you hijack the text reading routine to skip over the new pointers? Like put in a special code that says "text pointer, jump ahead 3 bytes."
I did a quick mental review. Yeah, wait, why wouldn't that work?
So after washing my hands, while my brother drove, I got on my Treo 650 and typed out the solution to Meow. This required writing it in the phone's notebook and copy/pasting it into the browser since it tended to choke on long replies typed into a textarea. I expected a ping-pong session, like with Marc, where he'd shoot down the idea.
He didn't. He liked the idea a lot, actually.
Holy shit, it could work.
I ended up naming this hack the "DATCH" for "Double Addressing with Text-Chaining Hack." It follows the game's original text pointer, and finds my new text pointer that's actually embedded into the text. Since these new pointers "break up" sentences, my hack was smart enough to "jump over" them if it wasn't using them for readdressing.
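Here's a rough Python model of the idea, using the two-sentence example from earlier. This is a sketch under stated assumptions, not the project's actual code: the marker byte 0xFF, the 0x00 terminator, the 2-byte little-endian address and the function names are all invented for illustration. The only parts taken from the description above are the 3-byte "jump ahead" pointer and the rule that the reader skips pointers it isn't using. It also assumes the old offsets are at least 3 bytes apart.

```python
ESC, END = 0xFF, 0x00  # hypothetical marker and terminator bytes
SLOT = 3               # ESC + 2-byte address: "text pointer, jump ahead 3 bytes"


def build(old_offsets, new_lines):
    """Lay out new_lines so each OLD offset holds a pointer to its new line."""
    reserved = {off + k for off in old_offsets for k in range(SLOT)}
    text_bytes = sum(len(line) + 1 for line in new_lines)
    size = max(text_bytes + len(reserved), max(old_offsets) + SLOT)
    mem = bytearray(size)
    pos, starts = 0, []
    for line in new_lines:
        first = True
        for byte in line.encode("ascii") + bytes([END]):
            while pos in reserved:  # step over bytes kept for pointers
                pos += 1
            if first:
                starts.append(pos)
                first = False
            mem[pos] = byte
            pos += 1
    for off, start in zip(old_offsets, starts):
        mem[off:off + SLOT] = bytes([ESC]) + start.to_bytes(2, "little")
    return bytes(mem)


def read_line(mem, old_ptr):
    """Follow the game's original pointer, then the embedded one, then read."""
    if mem[old_ptr] != ESC:
        raise ValueError("no embedded pointer at this offset")
    pos = int.from_bytes(mem[old_ptr + 1:old_ptr + SLOT], "little")
    out = bytearray()
    while mem[pos] != END:
        if mem[pos] == ESC:  # somebody else's pointer: jump over it
            pos += SLOT
            continue
        out.append(mem[pos])
        pos += 1
    return out.decode("ascii")


old_offsets = [0, 21]  # where the untouched game still thinks (a) and (b) live
lines = ["With something much larger like this.", "Sentence number two."]
mem = build(old_offsets, lines)
print(read_line(mem, 0))
print(read_line(mem, 21))
```

The game's own pointers never change. Each one now lands on a tiny signpost saying where the line really starts, and any signpost that happens to sit in the middle of another sentence simply gets stepped over.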
The Policenauts Translation Project was under way!
Then about a week later, I considered quitting.