Part 48: Tales of ROMhacking: Part 7
"Well," Marc asked me one night in November. "Want to become the first person to finish Policenauts in English?"
"Sure."
And that's how it went. He gave me the Act 7 script, I incorporated the VOX and the movie files, completed the game and - BONK. You know that sound Windows Vista makes when something crashes?
It turns out that pSX - my emulator of choice at the time - hard crashed in the end credits of Policenauts. I've previously talked about this in the thread, and since we never figured out what it was, I won't dwell on it, save that it was a frustrating footnote in the "alpha" stage of the game.
While "beta" is a useful software designation, insofar as you are "feature complete" (i.e. all the bells and whistles and extra features you want are in place, if not completely bug-free), well, "alpha" is a little more nebulous. For Policenauts, I defined alpha as "playable in English". It meant the gameplay scripts, the VOX, and the movies were in English. I didn't count one-offs like the motorcycle scene or the shooting trainer.
But when we were done with "alpha", I compiled a list of milestones to get to before we could call beta.
There were eighty-six.
More, really.
I knocked some down, and Marc put some up. You've been reading about them in the course of the thread. To be fair, I didn't lump anything together (the telops comprised a good 20-30 goals, individually), but I'd say near the end, we had translated or fixed nearly one hundred things. The only things that stuck out to me that I said no to were the video compression, writing a new separate installer for the patch, and ultimately, I had to give up on fixing the end credits for pSX - it would've required me redoing it in assembly from scratch, and it would've taken months.
When we got the list close to zero, months later and with tons of decompressor tricks, modifications, etc., I was happy and wanted to tell everyone.
But Artemio and Marc vetoed it, and they were right to do so. In fact, during this time, I was hanging out on RHDN and saw a Chrono Trigger fan sequel nearing completion. Apparently, the team put two years of work into it and one month before release, Square-Enix sent them a cease-and-desist.
Now, you can imagine that I'm not chomping at the bit for "Chrono Trigger 2: As Written By Some Dude" but I can certainly appreciate the amount of work put into it, and apparently, the fan project was actually turning out quite good. And though they're within their rights, for S-E to send them a C&D one month out from release? C'mon. They had two years to send that letter.
So we kept our mouths shut nearing the end of beta, with the plan to just release one day with little to no fanfare.
But when we got near there, with my list of features dropped to zero, Marc asked me for one last change. He wanted me to change a graphic, he said.
Okay fine. One more graphic. No huge deal. Graphics were a pain in the ass, because I had to manually set breakpoints, grab the original out of VRAM, modify in GIMP, recompress with LZO, overwrite the data in DATA.DPK, etc. etc. etc. But it was just one.
Then he sent me the e-mail. He wanted me to modify the Engrishy T-Gear99 sequence, in Act 2, when the computer wants to authenticate you (crests puzzle).
Except that was actually TEN graphics, since that was a whole sequence.
It was nine more over my limit.
In terms of software development, we were doing what's called an "ad-hoc" management process. There was no deadline, and no clear end goal. I was now getting worried: Marc and Artemio could flounder with changes constantly, if they wanted to. If that sounds paranoid, I'd seen it before and it's way, way too common in business projects as well. There's two unofficial terms for it:
- Development Hell: When your developers are working ceaselessly to change features back and forth.
That's overstating things a bit for the ROMhack, but it is really disheartening to have your feature list get into single digits, have the end in sight, and then have the goalposts moved out from under you. Marc and Artemio had been working on Policenauts for years. What was a couple more months to them?
The other term is:
- Death March: This is when the developers realize that, oh no, there really is no good ending in sight: These changes are going to come endlessly until the thing is cancelled or we're all laid off and they outsource the rest.
I've been on death marches. They are not fun. They're actually a little soul crushing.
I told Marc that I was really unhappy with doing ten more changes out of nowhere, and that we'd fixed so many things. We'd put our due diligence in, and it was time to really discuss an end to the project.
Marc had another suggestion: Take two weeks off. In all honesty, I had become a bit stubborn about working on Policenauts, so maybe he was right. Maybe I just needed a breather.
But...
Well, fuck it. I'd take two weeks, and THEN talk about the end. Fine. Mini-vacation.
I never got it, though.
Two days later, Artemio had some bad news. He'd burned the ISO to a CD-R and tried to play the ROMhack in his modded PSX.
It didn't work.
The game started and the woman's voice said "Taihen nagaraku..." (Thank you for your patience.) And then it looped and said it again. And again. And again.
Oh.
Okay, I said. It was probably the way we were building it or something. "Well, that's the VOX file. Can you try just using the Japanese one and seeing if the rest is okay?" I asked, making a mental note to talk to Scarboy about his VOX hack.
Alright, he did it.
Then the first movie (Jonathan talking about Beyond Coast) was messed up. The audio was skipping all over the place for some reason.
Huh.
Okay, well, that was Meow's old PolicePatch program that Artemio and I modded. ...That was concerning.
It got worse. After that movie, the game froze.
Oh no.
That was the "KONAMI presents" graphic. We'd modified it for a line break. That meant... the LZO decompressor was failing on hardware. Okay. Okay, this was bad.
Well, what about the DATCH? That was the gameplay hack I'd written. That had to work, right? I mean, it was fairly simple assembly. That didn't have enough moving parts to break on hardware, so it must work, surely?
It did not. Menus were jumbled and the game froze on selecting one.
Everything worked perfectly in emulators and failed catastrophically on hardware. That was really bad. I could guarantee the game worked in pSX 1.13 and ePSXe 1.7 - what about future versions of them? What if their hardware emulation got better and broke the hack?
We were eight months in. What could we have been doing wrong?
My heart sank with the worse news: Artemio was able to take the component Japanese files and rebuild the original game with no problems - so it wasn't the way we were recomposing the game with the hack. All major parts of the hack failed miserably. The whole point of the ROMhack was to make it work on hardware, so we could "guarantee" it would work in the emulator. But now the whole project was in danger of not being able to accomplish that.
We were dead in the water, and we didn't know why.
"There is no way," I posted. "That Scarboy, Meow, Artemio, and I independently missed separate things in the hack. It has to be one stupid thing."
But it didn't seem to be. And as the days went on, I started to realize I had no idea why things were fine in the emulator, but breaking in hardware.
"Look at it realistically," my boss at work said. "Your userbase is mostly going to be people on the emulator, right? So you'll stagger release. Have one on the emulator earlier, and then figure out the hardware version later."
Yeah, but... after working so hard for so long, I didn't want the end product to be ... well, shoddy. The end credits, I just couldn't do anything about to begin with. Saying that pSX could not correctly emulate the game was one thing, but now to tell people they couldn't play it on their PSPs and PS1s?
About two weeks in, I sighed and decided to start from square one. I got a clean copy from my physical Japanese Policenauts discs and decided to start with the first line of the game. I fortunately had a modded PSP that I'd never touched for some reason, and Marc mailed me his modded PS1. Artemio found a Playstation emulator that claimed 100% emulation (at the cost of emulating so slowly, as to be unplayable.) Known as XEBRA, its instructions were in Japanese, and its file timestamps were dated... 1995?!
I had the feeling that XEBRA was a hacked/leaked part of the actual Sony Playstation dev kit. It was slow, but it replicated all our bugs.
So now to do this hack properly, the way I should've been doing it from the beginning - by deploying to hardware every so often. I started with comparing the output of manually putting in a line of the VOX file ("Thank you for your patience") which seemed to work, and the output of Scarboy's script, which seemed not to.
The problem was actually pretty apparent then. The emulators were more forgiving of byte alignment, or lack thereof. It seemed like the original game's audio data was always byte-aligned (strictly speaking, aligned to 4-byte boundaries) - meaning all relevant data came in chunks of 4, and began at byte locations 0, 4, 8, or C.
Scarboy's script was more or less removing trailing zeroes. On hardware, this had the side effect of making the audio processing get "off-track" and play garbage. Basically, by not making sure our output data was byte aligned (bytes outputted in a multiple of four), it meant data was being grabbed in incorrect chunks. Turns out, this was happening in the MOVs, too.
So I modified Scarboy's and Meow's scripts to add 1-3 blank spaces after each subtitle in order to byte-align it and make sure the audio data around the subtitles would line up.
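As a rough illustration of that padding step (not the actual modified scripts), here is a minimal Python sketch. The function name `pad_to_four` and the choice of an ASCII space as the pad byte are assumptions for the example; the post only says that 1-3 blank spaces were added so each subtitle ended on a multiple of four bytes.

```python
def pad_to_four(data: bytes, pad: bytes = b" ") -> bytes:
    """Append 0-3 pad bytes so that len(result) is a multiple of 4.

    `pad` is assumed to be a single blank byte; the real scripts may
    have used a different filler.
    """
    remainder = len(data) % 4
    if remainder == 0:
        return data  # already aligned, nothing to add
    return data + pad * (4 - remainder)


if __name__ == "__main__":
    for subtitle in (b"a", b"abcd", b"abcde"):
        padded = pad_to_four(subtitle)
        print(len(subtitle), "->", len(padded))
```

The point is only that whatever follows the subtitle now starts at an offset ending in 0, 4, 8, or C, which is what the hardware seemed to expect.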
It fixed those things, but the LZO decompressor was still broken, as was the DATCH.
I didn't want to touch the decompressor - that was produced from a compiler. And it was fucking decompression! What chance did I have at fixing it?
So I decided not to think about it, and moved onto the DATCH.
I started really small. Whenever you loaded a savegame, the game displayed:
"View Summary Screen?"
This time byte-alignment couldn't have been the issue (though I tried anyway) as the original game did not use byte aligned data for the text. Hmmm. So I tried a very stripped down version of my hack that said: "If you encounter a 6e, skip bytes until you encounter another." I did use something like this in the DATCH to skip over filenames that the game had randomly shoved into the text sector for some reason.
So now, my input data was:
V[6e]i[6e]ew Summary Screen?
And my code was basically:
lb r2, 0x0000(r4) # Read the next byte into register 2
beq r2, 0x6e, 0x0010 # Jump ahead 10h bytes if the read byte was 6e
(Note: That is not proper assembly, I'd actually have to load 6e into another register with addiu and compare those.)
I should've gotten: "Vew Summary Screen?"
Instead I got: "Ven?"
It made no sense to me. Even a code snippet that small did not work how I thought it did in hardware. But I was following the opcode docs, there was barely any logic to be done... what the hell could I even possibly have been doing wrong? It's like it wasn't even reading the right byte except much later than it should have?
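For reference, the intended behavior of that stripped-down test can be sketched in Python. This is only a model of the logic, not the hack itself: the name `skip_between_markers` is made up, and the text is kept as a list of characters with the control code as an integer, because the game's real text encoding isn't described here (in plain ASCII, 0x6e would collide with the letter "n").

```python
MARKER = 0x6E  # the control code from the test above


def skip_between_markers(items):
    """Drop every marker, plus everything between a pair of markers."""
    out = []
    skipping = False
    for item in items:
        if item == MARKER:        # only the integer control code matches
            skipping = not skipping
            continue
        if not skipping:
            out.append(item)
    return "".join(out)


if __name__ == "__main__":
    data = ["V", MARKER, "i", MARKER] + list("ew Summary Screen?")
    print(skip_between_markers(data))
```

With the test input, the "i" between the two markers is dropped and the rest passes through untouched, which is what the hardware should have displayed.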
Scarboy posted:
So
slowbeef posted:
Theoretically
I futzed around for a bit, trying things randomly, and out of desperation, tried the following:
lb r2, 0x0000(r4) # Read the next byte into register 2
lb r2, 0x0000(r4) # Read the next byte into register 2 (again?????)
beq r2, 0x6e, 0x0010 # Jump ahead 10h bytes if the read byte was 6e
I read the same memory two times into the same variable. It was clumsy and weird. It was the programming equivalent of:
x = 5;
x = 5;
It made no sense. A fool's errand. A drunk's thoughts. Even in retrospect, I'm not entirely sure why I tried it. It could never, ever work.
Except it did.
Why??????????????????????????
Knowing that it worked wasn't enough. I didn't have a shot at the decompressor until I understood why reading memory twice worked for the DATCH.
Scarboy posted:
So you really
slowbeef posted:
Theoretically, this would
It couldn't have been reading it twice, really, I figured. I had stumbled on the solution and maybe it had to do with... ...timing? It was one of those things where the answer was floating around in my mind until...
Could it be...?
I remembered this one lecture I had in Computer Architecture, and I suddenly realized I was wrong about the doubletext hack - the very first thing that had gotten me started on this project; the hack that put longer English sentences in the smaller Japanese spacing.
There had happened to be just enough room for me to insert my code, because of a single nop.
Scarboy posted:
So you really got that nop for free?
I hadn't.
slowbeef posted:
Theoretically, this would work in a Playstation, not just emulators.
It wouldn't.
My original hack would never have worked.
I thought it over some more, asked Proteus and Null Set about it. They didn't have enough background in the hack for a really in-depth diagnosis of my specific problem, but agreed that my theory seemed sound.
I went to the Junker HQ boards.
"I think the problem is pipelining."
Pipelining: What All The Emulators Get Wrong
If you stop and think about it, at its base, maybe writing an emulator isn't so tough. You have opcodes that are to be run by the original processor, like "addiu" and "lb" and such. So what if you just have 32 variables for your registers, a big-ass array for memory, and an interpreter for the opcodes?
No sweat, right?
Did you ever notice though, especially with programs that emulate modern consoles, you need much better hardware than the original? I mean, a Playstation 1 clocks in at about 34 MHz, but it seems like you need a computer with about 1 GHz to run it with a normal framerate?
There's a lot of reasons for this: The PS1 is dedicated to the game, whereas your operating system is managing quite a few programs. And while the PS1 had a lot of dedicated hardware that it would directly pipe audio and graphics to, you've got maybe an extra step or two while your emulator marshals that data out.
But here's an interesting thing about computer processors: While their instructions are serial - that is, one must come after another - they can run different stages of instructions simultaneously. This is known as pipelining.
Here's what I mean:
The PS1 runs a MIPS processor. The MIPS has a 5-stage pipeline: In order to run a computer instruction, it must complete five steps, designated by the very easy to remember initialism: FIEMW. This stands for:
1. Fetch instructions
2. Instruction decode
3. Execute
4. Memory access
5. Write-back (the result gets written to its register)
Suppose two simple instructions: Let's say just adding some registers.
A. add r2, r0, r2 # This is r2 = r2 + 0
B. add r3, r4, r2 # This is r3 = r2 + r4
What MIPS does is this:
First, it Fetches the instructions for A. Done. Next, it Instruction decodes A. But here's the kicker: While it's decoding A, it can start Fetching instructions for B. If a third opcode (C) comes along, it'll execute A, while decoding B, while fetching C. Then it can read memory for A, execute B, decode C, and fetch for D when D comes along. A basic timeline looks like:
code:
A: FIEMW
B:  FIEMW
C:   FIEMW
D:    FIEMW
E:     FIEMW
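The timeline can also be generated, which makes it easy to ask "what stage is each instruction in at a given cycle?" The following is a small Python sketch of the idealized five-stage model described above; `timeline` and `stage_at` are hypothetical helper names, and stalls or hazards are not modeled at all.

```python
from typing import List, Optional

STAGES = "FIEMW"  # Fetch, Instruction decode, Execute, Memory, Write-back


def timeline(count: int) -> List[str]:
    """One row per instruction; instruction i starts i cycles after the first."""
    return [" " * i + STAGES for i in range(count)]


def stage_at(instruction: int, cycle: int) -> Optional[str]:
    """Stage an instruction occupies at a 0-based cycle, or None if not in flight."""
    position = cycle - instruction
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


if __name__ == "__main__":
    for row in timeline(5):
        print(row)
    # At cycle 3, the first instruction is in M while the second is in E.
    print(stage_at(0, 3), stage_at(1, 3))
```

That last line is the overlap that matters for the rest of this post: one instruction is still accessing memory while the one behind it is already executing.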
Okay, great, so what's the problem?
Well with adds, and other mathematical operations, there is no problem.
But notice that memory access happens AFTER execution. So here's the rub. My code was reading memory, and branching depending on whether or not I read a control code. Like so:
lb r2, 0x0000(r4) # Read the next byte into register 2
beq r2, 0x6e, 0x0010 # Jump ahead 10h bytes if the read byte was 6e
Okay, so that means:
1.
lb: Fetch instruction
beq: hasn't started
2.
lb: Instruction decode
beq: Fetch instruction
3.
lb: Execute
beq: Instruction decode
4.
lb: Memory Access
beq: Execute
Now, when lb "executes" keep in mind that the value in memory hasn't actually been loaded into the register. That's because for a memory operation, it happens in the 4th step, the memory access.
But memory access happens at the same time the branch executes. The branch depends on the memory read. So what happens?
The Playstation's MIPS processor's answer is that the branch happens first.
So I'm testing on a memory read that hadn't happened yet. My control codes were all read late, so I was getting nonsense values. My double-addressing didn't work, etc.
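To make the difference between the emulators and the hardware concrete, here is a toy Python model. It is a sketch under heavy assumptions: `run_with_load_delay` is a made-up name, `beq_imm` mirrors the not-quite-real compare-with-immediate from the snippets above, and the only hardware behavior modeled is that a loaded value isn't visible to the very next instruction.

```python
def run_with_load_delay(program, memory, load_delay):
    """Return True if the branch in `program` was taken, else False.

    load_delay=False behaves like a naive interpreter (loads land at once).
    load_delay=True makes a loaded value invisible to the next instruction.
    """
    regs = {"r2": 0, "r4": 0}
    pending = None  # (register, value) loaded by the previous instruction
    taken = False
    for op, *args in program:
        arriving, pending = pending, None
        if op == "lb":
            dest, base = args
            value = memory[regs[base]]
            if load_delay:
                pending = (dest, value)  # lands after the next instruction
            else:
                regs[dest] = value
        elif op == "beq_imm":
            reg, imm = args
            taken = regs[reg] == imm
        # "nop" does nothing
        if arriving is not None:
            regs[arriving[0]] = arriving[1]
    return taken


if __name__ == "__main__":
    memory = bytes([0x6E])
    naive = [("lb", "r2", "r4"), ("beq_imm", "r2", 0x6E)]
    doubled = [("lb", "r2", "r4")] + naive
    print(run_with_load_delay(naive, memory, load_delay=False))
    print(run_with_load_delay(naive, memory, load_delay=True))
    print(run_with_load_delay(doubled, memory, load_delay=True))
```

In this model the naive pair branches correctly without the delay and misses the control code with it, while the "read it twice" version works again, because the first load has landed by the time the branch runs.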
Well, if pipelining can cause this sort of problem, how do compilers - which are supposed to take our human readable code and turn it into assembly - solve it? One of two ways:
1. Re-arrange the code so that dependent instructions like that are not next to each other.
2. If that's not possible, you have to space them out with nops.
That's why there was a no-op available in my original doubletext hack. There was a branch dependent on a memory read, so the compiler put it there. I ignorantly assumed this was just a boon, or an erroneous thing, so I took advantage to insert my doubletext hack.
Since the emulators we tested on didn't emulate the pipeline, there was no issue.
Had I tried it in hardware, I'd have seen that it wouldn't have worked.
So I rewrote the DATCH hack to put nops in after memory reads.
And it fixed that problem.
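The "space them out with nops" fix is mechanical enough to sketch. This Python snippet is an illustration only, not the rewritten DATCH: `add_load_nops` is a hypothetical helper, it treats assembly as plain text lines, it only knows a handful of MIPS load mnemonics, and it makes no attempt to fix up branch offsets that the extra lines might shift.

```python
# A few MIPS load mnemonics; anything else is left alone.
LOADS = {"lb", "lbu", "lh", "lhu", "lw"}


def add_load_nops(lines):
    """Return a copy of `lines` with a nop inserted after every load."""
    out = []
    for line in lines:
        out.append(line)
        parts = line.split()
        if parts and parts[0] in LOADS:
            out.append("nop")  # gives the load time to land
    return out


if __name__ == "__main__":
    code = ["lb r2, 0x0000(r4)", "beq r2, r3, 0x0010"]
    for line in add_load_nops(code):
        print(line)
```

A smarter pass would only add the nop when the next instruction actually uses the loaded register, or would reorder instructions instead, which is what the compiler's two options above amount to.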
Last problem: The LZO decompressor. It turned out the output of Scarboy's compiler (which I believe is SPIM) had the same problem as my hand-written assembly. The compiler actually spat out instructions for a later processor than the MIPS 3000: It included opcodes like "Branch likely" - compiler optimizations that the Playstation's processor doesn't have.
That was okay, Scarboy turned that off and had his compiler spit out MIPS 3000-compatible instructions only. The problem was, it didn't do nops.
Now this was way tricky to fix. He couldn't figure out how to do it in the compiler, and the compiler spit out branch conditionals. This is a problem: Branches work relatively, whereas jumps work absolutely. To fix the memory reads, we needed nops, but...
A jump instruction will tell the processor: "Hey jump to this location in memory and execute instructions there."
Branches say: "Jump 4 lines ahead or six lines behind."
But shit. Adding in nops manually would screw up those branches. Something which said "Jump 4 lines ahead?" Adding a nop meant I'd have to manually change that branch to five lines. And even worse, I might later encounter a branch that said "Jump 36 lines behind" and I'd have to change that to 37 lines after the fact.
And keep in mind, I'd have to add the nops after every memory read, meaning many different branches were changing depending on whether or not they were now branching over nops.
Fuck.
It was way too involved for me to do manually and it wasn't easy to code around.
So I did something crazy, my excuse being: "Hey, it's a ROM 'hack' after all."
...
...
I wrote a different script. I took the LZO decompressor instructions and added a nop after every single line. Why? Because then I could just double all the branch offsets. Think about it. It works because you're literally doubling your code size.
You're right if you've thought about it and realized it makes LZO decompression in Policenauts roughly twice as slow as it should be. But since this only happened during gameplay (not, for example, the movie sections), well, it never seemed to affect the actual game output or performance.
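Here's why the doubling trick holds together, as a simplified Python model rather than the real script. Assumptions: instructions are `(op, arg)` tuples, a branch's `arg` is a displacement in "lines" counted from the branch itself (the way the post talks about it), and `pad_with_nops` and `branch_target` are made-up names. Real MIPS encodes branch offsets relative to the instruction after the branch, and there are delay slots to worry about, so an actual tool has more to account for.

```python
def pad_with_nops(program):
    """Insert a nop after every line and double every branch displacement."""
    out = []
    for op, arg in program:
        if op == "branch":
            out.append((op, arg * 2))  # old line k now lives at line 2k
        else:
            out.append((op, arg))
        out.append(("nop", None))
    return out


def branch_target(program, index):
    """Return the instruction that the branch at `index` lands on."""
    _, displacement = program[index]
    return program[index + displacement]


if __name__ == "__main__":
    original = [
        ("op", "a"),
        ("branch", 2),    # forward, lands on "c"
        ("op", "b"),
        ("op", "c"),
        ("branch", -4),   # backward, lands on "a"
        ("op", "d"),
    ]
    padded = pad_with_nops(original)
    for i, (op, _) in enumerate(original):
        if op == "branch":
            print(branch_target(original, i), branch_target(padded, 2 * i))
```

Because every original line moves from position k to position 2k, the distance between any branch and its target exactly doubles, so no branch needs to be patched individually.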
I wasn't done because MIPS had one last little fuck-you to me.
The MIPS Jump Delay
This is a pain-in-the-ass thing that gets into computer architecture stuff. It's a gotcha for handwritten assembly.
MIPS, as a rule, will always execute the instruction after a jump or a branch, regardless of whether the branch succeeds or fails. This means if you have a block of code like:
Line 1
Line 2
Line 3
Jump if something is true/false
Line 4
Line 4 will always be executed. This means your code should really look like:
Line 1
Line 2
Jump if something is true/false
Line 3
Line 4
Basically, you switch the jump with the line before it. If the instruction right after must get executed, then you want to make sure it's one you actually intended to have executed! (Of course, you can just be safe and use the nop).
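The two layouts above can be run through a toy interpreter to see the difference. This is a minimal Python sketch, assuming a made-up program format (`("line", name)` and `("jump", target_index)`) and a hypothetical function `run_with_delay_slot`; the one rule it models is that the instruction right after a jump always runs before control transfers.

```python
def run_with_delay_slot(program, take_jump):
    """Return the names of the lines executed, in order.

    The instruction immediately after a jump (the delay slot) always
    runs, whether or not the jump is taken.
    """
    executed = []
    pc = 0
    pending_target = None  # set by a taken jump, applied one line later
    while pc < len(program):
        instr = program[pc]
        jump_now, pending_target = pending_target, None
        if instr[0] == "jump":
            if take_jump:
                pending_target = instr[1]
        else:
            executed.append(instr[1])
        pc = jump_now if jump_now is not None else pc + 1
    return executed


if __name__ == "__main__":
    end = 5  # jumping here skips everything that is left
    naive = [("line", 1), ("line", 2), ("line", 3), ("jump", end), ("line", 4)]
    fixed = [("line", 1), ("line", 2), ("jump", end), ("line", 3), ("line", 4)]
    print(run_with_delay_slot(naive, take_jump=True))
    print(run_with_delay_slot(fixed, take_jump=True))
```

With the jump taken, the naive layout still runs Line 4, while the reordered layout runs Lines 1 through 3 and skips Line 4 as intended. When the jump isn't taken, both layouts run all four lines in order.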
My script had to account for this, but it ended up fixing the decompressor.
So our bug count to beta shot up again as I fixed real-deal hardware issues (and yes, Marc's ten graphical changes) and we were on our way to beta. Again.