Part 35: Disassembling (Part 2)
Let's Disassemble the Thread Code, Part 2
With a rudimentary understanding of assembly & disassembly under our belts, we can tackle Project Atrocity, an even longer and more complex piece of machine language.
This one presented a slight difficulty for our disassembler, as it loads itself into location $0800/2048. Recall that this is the start of BASIC programs. That means we need to relocate Atrocity in order to run the disassembler. To keep things consistent with the other programs, I loaded it from disk and moved it up to location 49152 (using PEEKs and POKEs).
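As a rough illustration of that relocation, here is a small Python model of the move. It is only a sketch: the names `relocate` and `copy_block` are made up for this example, the memory contents are stand-in data rather than the real program, and the length is simply read off the listing (49152 through the RTS at 49320).

```python
# Model of the C64's 64 KB address space (all zeros to start with).
memory = bytearray(65536)

OLD_START = 2048    # $0800, where the program originally loads
NEW_START = 49152   # $C000, where it gets moved for the disassembler
OFFSET = NEW_START - OLD_START

def relocate(address):
    """Translate an address in the original copy to the relocated copy."""
    return address + OFFSET

def copy_block(mem, src, dst, length):
    """Same idea as FOR I=0 TO length-1: POKE dst+I, PEEK(src+I): NEXT."""
    for i in range(length):
        mem[dst + i] = mem[src + i]

# Length taken from the listing: 49152 through the RTS at 49320, inclusive.
PROGRAM_LENGTH = 49320 - NEW_START + 1

# Stand-in data (NOT the real program) so the copy has something to move.
memory[OLD_START:OLD_START + PROGRAM_LENGTH] = bytes(range(PROGRAM_LENGTH))
copy_block(memory, OLD_START, NEW_START, PROGRAM_LENGTH)

print(OFFSET)           # distance between the two copies
print(relocate(2090))   # an address used later in the thread
print(relocate(2138))   # the original target of the JMP
```

The same offset converts any address in the original copy to its place in the relocated one, which is handy when switching between the two later on.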
We'll look at it in segments, as it's rather long.
start address(decimal) ? 49152
(hex=c000)
49152 brk
49153 ? 12
49154 php
49155 asl
49156 brk
49157 ? 158
49158 jsr 12338
49161 rol 50 ,x
49163 brk
49164 brk
49165 brk
49166 ? 243
49167 brk
Already you're probably wondering ... why does this supposed machine language look like it has garbage at the start? This is entered correctly and copied exactly; there's nothing wrong with it. The thing is, that's not machine language.
This is a line of BASIC code, and trying to read BASIC code as if it were machine language yields nonsense. So even though it was entered using MLX, it's a line of BASIC. The only line in the program is 'SYS 2062', which calls the actual machine language. (I didn't get that by using the disassembler; I got it by loading the program and typing 'LIST'.) This was alluded to much earlier in the thread: just because something is entered using MLX doesn't mean it is necessarily machine language.
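To see that those bytes really are BASIC, they can be decoded by hand. The sketch below rebuilds the bytes from the disassembly above (the opcode values for php, asl, jsr and rol come from a standard 6502 opcode table, not from the thread) and reads them as a stored BASIC line: a two-byte pointer to the next line, a two-byte line number, then tokens and text ending in a zero byte. The function name and the one-entry token table are made up for this example.

```python
# Bytes 49153-49165 of the relocated copy, read back from the disassembly:
# php = $08, asl = $0A, jsr 12338 = $20 $32 $30, rol 50,x = $36 $32, brk = $00.
raw = bytes([12, 8, 10, 0, 158, 0x20, 0x32, 0x30, 0x36, 0x32, 0, 0, 0])

TOKENS = {158: "SYS"}   # only the one BASIC token this line needs

def decode_basic_line(data):
    """Decode one stored BASIC line: link pointer, line number, text."""
    link = data[0] | (data[1] << 8)          # address of the next line
    line_number = data[2] | (data[3] << 8)
    parts = []
    for byte in data[4:]:
        if byte == 0:                        # zero byte ends the line
            break
        parts.append(TOKENS.get(byte, chr(byte)))
    return link, line_number, "".join(parts)

link, line_number, text = decode_basic_line(raw)
print(link, line_number, text)
```

Read this way, the 'jsr 12338' and 'rol 50,x' are just the characters " 2062" following the SYS token.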
The actual assembly code begins here (which we could also determine by adding our offset to 2062, where the original code started execution):
49168 sta 54274
49171 sta 54277
49174 lda # 1
49176 sta 54275
49179 lda # 15
49181 sta 54296
49184 lda # 64
49186 sta 54273
49189 lda # 128
49191 sta 54278
49194 sta 54290
49197 lda # 255
49199 sta 54286
49202 sta 54287
49205 sta 54272

49208 lda # 147 ; clear screen character
49210 jsr 65490 ; output char (to channel, default is screen)

49213 lda # 0 ; stash a few values for easy access
49215 sta 251 ; in locations 251-254
49217 lda # 4
49219 sta 252
49221 lda # 0
49223 sta 253
49225 lda # 216
49227 sta 254
49229 ldy # 0

49231 lda 54299 ; random value from OSC3
49234 sta 54273 ; play sounds?
49237 lda # 65
49239 sta 54276
I'm not totally certain, but I'm reasonably sure the statements after the random value is grabbed are playing the actual sounds, using another voice.
49242 jsr 65508 ; get keypress (KERNAL)
49245 bne 49310

49247 lda 54299
49250 and # 15
49252 sta 53280

49255 sta ( 253 ),y
49257 lda # 214
49259 sta ( 251 ),y
This is the step that required zero-page addresses. The operand in parentheses names a zero-page location, and the two bytes stored there (low byte first) form a 16-bit address (often called a 'word address' in C64 context). So locations 253 and 251 each hold a 16-bit pointer, and we're storing a random value (0-15) through one and the value 214 through the other. If you look back at the initialization of those locations, 251-252 holds $0400/1024, the start of video memory. The other pointer, in 253-254, holds $D800/55296, which is color memory (one entry per character cell). So we're writing a character to the screen, with a random color for that cell. As it happens, the Y register is 0, so it does not affect the result. (If Y did have a value, it would be added to $0400 or $D800.)
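The address calculation behind 'sta ( 253 ),y' can be sketched in a few lines of Python. This is a model of the addressing mode only, not an emulator; the function name is made up for this example, and the memory is preloaded with the four values the program stores in 251-254.

```python
memory = bytearray(65536)

# The values stashed by the program: 251-252 = $0400, 253-254 = $D800.
memory[251], memory[252] = 0, 4
memory[253], memory[254] = 0, 216

def indirect_indexed(mem, zero_page, y):
    """Effective address for an '(zp),y' operand: 16-bit pointer plus Y."""
    pointer = mem[zero_page] | (mem[zero_page + 1] << 8)   # low byte first
    return (pointer + y) & 0xFFFF

screen_address = indirect_indexed(memory, 251, 0)
color_address = indirect_indexed(memory, 253, 0)
print(screen_address, color_address)

# Mimic the two stores: a color (0-15) to color memory, 214 to the screen.
memory[color_address] = 7          # stand-in for the random color
memory[screen_address] = 214
```

With Y at 0 the effective address is just the pointer itself, which is why the program can step through the screen by incrementing the pointer bytes.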
49261 inc 251
49263 bne 49267
49265 inc 252
49267 inc 253
49269 bne 49273
49271 inc 254

49273 lda # 232
49275 cmp 251
49277 bne 49242
49279 lda # 7
49281 cmp 252
49283 bne 49242

49285 lda # 0
49287 sta 251
49289 lda # 4
49291 sta 252
49293 lda # 0
49295 sta 253
49297 lda # 216
49299 sta 254

49301 lda 54299
49304 sta 54273
49307 jmp---> 2138
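You may have noticed that the JMP above still points at 2138, an address in the original location, while all the branch statements apparently did update to the new location. How did that happen? As it turns out, branches are relative. They don't store their target as 'BNE 49152'; they store it as 'BNE +24' or 'BNE -13'. It's an offset from the address of the instruction that follows the branch, which is where the PC points once the branch has been fetched. Jumps, on the other hand, are absolute. They store the full address of the destination, so they take more space: three bytes instead of two. (They are not slower, though: on the 6502 a JMP takes 3 cycles, the same as a taken branch that stays within a page.)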
Another consequence of the relative nature of branches is that they are limited in range. The offset is a single signed byte, so the target must lie within -128 to +127 bytes of the instruction following the branch. When writing assembly, this is something you either have to look out for (and use a branch to a jump, if necessary) or have handled automatically by a good assembler. And like many things in assembly language, this is not the only approach. This distinction between jump and branch is a hallmark of the Motorola and similar instruction sets.
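The arithmetic for relative branches can be sketched as follows. The function names are made up for this example; the addresses are the two branches from the listing, at 49245 (to 49310) and 49277 (back to 49242), and 47104 is the distance between the original and relocated copies.

```python
def branch_offset_byte(branch_address, target):
    """Offset byte stored after a branch opcode (branches are 2 bytes long)."""
    offset = target - (branch_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError("target out of branch range")
    return offset & 0xFF               # two's complement, one byte

def branch_target(branch_address, offset_byte):
    """Where a branch at branch_address lands, given its stored offset byte."""
    offset = offset_byte - 256 if offset_byte >= 128 else offset_byte
    return branch_address + 2 + offset

forward = branch_offset_byte(49245, 49310)    # bne 49310
backward = branch_offset_byte(49277, 49242)   # bne 49242
print(forward, backward)

# The same stored bytes work at the original location, 47104 bytes lower.
OFFSET = 47104
print(branch_target(49245 - OFFSET, forward))
print(branch_target(49277 - OFFSET, backward))
```

Because only the distance is stored, the bytes are identical in both copies, and the disassembler shows the right targets wherever the code happens to sit.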
49310 lda # 64
49312 sta 54276
49315 lda # 147
49317 jsr 65490
49320 rts
--------------------
This routine uses RTS to exit, which ensures that we aren't relying on empty space following us, and also allows this to be called as a subroutine (say, if it were a loading screen or something like that). The disassembler actually adds the dashed line, to make it clearer when a subroutine has finished.
Prior to exiting, we turn off the sound (I'm presuming) and output a character that clears the screen, using the same KERNAL routine as at the start of this little program.
So, now that we know what's going on in this code, why not do some hacking?
Obviously we could read this in, modify it, and create our own version using an assembler but I'd like to mess with it in-memory.
One easy thing we can do is fix that absolute JMP to make it work with our relocation. Recall that the first byte of an instruction is the opcode and the following bytes are the operand. So all we need to do is change the operand. The JMP instruction is at 49307, which means the operand starts at 49308. The only hitch is that we can't poke 16-bit values; we need to put the desired operand in byte form, low byte first. The desired target is 49242. Divided by 256, we get 192 with a remainder of 90. (Alternatively, we can use hexadecimal: 49242 = $C05A, $C0 = 192 and $5A = 90.)
Therefore, we can get this working in the relocated spot by using these statements:
poke 49308, 90:poke 49309, 192
sys 49168
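The byte split used in those pokes can be checked with a short Python sketch. The helper names are made up for this example, and the JMP opcode value ($4C) comes from a standard 6502 opcode table rather than from the thread.

```python
def split_address(address):
    """Split a 16-bit address into (low byte, high byte) for two pokes."""
    high, low = divmod(address, 256)
    return low, high

def join_address(low, high):
    """Rebuild the 16-bit address from its two bytes."""
    return low + 256 * high

memory = bytearray(65536)

# The JMP as it sits after relocation: opcode, then the OLD target 2138.
memory[49307] = 0x4C                          # JMP absolute
memory[49308], memory[49309] = split_address(2138)

# Equivalent of: poke 49308, 90 : poke 49309, 192
low, high = split_address(49242)
memory[49308] = low
memory[49309] = high

print(low, high)
print(join_address(memory[49308], memory[49309]))
```

The low byte goes at the lower address, which is why 90 is poked first.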
Here's something a bit more complex. That random number generator works by using the 'noise' setting of the oscillator. But what if that oscillator is set to do something else?
Referring to a memory map, the location that controls the oscillator is 54290. This is oscillator 3 (the only one that the 'random' sampler works with). That gets set in 'line' 49194 above. It's set from the accumulator using the #128 that was loaded a few lines prior.
So we'd like to change that, but without necessarily disrupting the value stored in 54278. We can't simply insert a line, but by looking at the code, a solution presents itself. The initial value in 54273 gets changed to something random just a short while later. We can probably get away with using something else there.
So, we'll shift things a line up, change the initialization value for 54273 and 54278, and then put a new value to save in 54290.
To make things easier to test, I did this with the code in its original location. So 49194=2090, and the other values precede it by a few lines.
for i=2085to2087:pokei,peek(i+2):next i
poke 2081,128
poke 2088,169
poke 2089,32
This is the finished product (adjusted for the disassembler):
49181 sta 54296
49184 lda # 128
49186 sta 54273
49189 sta 54278
49192 lda # 32
49194 sta 54290
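The patch can be replayed against a model of memory to confirm that it produces the listing above. This is a sketch under a few assumptions: the opcode bytes (169 for LDA immediate, 141 for STA absolute) come from a standard 6502 opcode table, the original bytes at 2080-2092 are rebuilt from the earlier disassembly, and the tiny `decode` helper is made up for this example and only understands those two opcodes.

```python
memory = bytearray(65536)

LDA_IMM, STA_ABS = 169, 141

def sta(address):
    """Bytes for STA absolute: opcode, low byte, high byte."""
    return [STA_ABS, address % 256, address // 256]

# Original code at 2080-2092 (49184-49196 once relocated).
original = [LDA_IMM, 64] + sta(54273) + [LDA_IMM, 128] + sta(54278) + sta(54290)
memory[2080:2080 + len(original)] = bytes(original)

# for i=2085to2087:pokei,peek(i+2):next i
for i in range(2085, 2088):
    memory[i] = memory[i + 2]
memory[2081] = 128    # poke 2081,128
memory[2088] = 169    # poke 2088,169
memory[2089] = 32     # poke 2089,32

def decode(mem, start, end):
    """Minimal decoder for the two opcodes used in this stretch of code."""
    out, pc = [], start
    while pc < end:
        if mem[pc] == LDA_IMM:
            out.append((pc, "lda #", mem[pc + 1]))
            pc += 2
        elif mem[pc] == STA_ABS:
            out.append((pc, "sta", mem[pc + 1] + 256 * mem[pc + 2]))
            pc += 3
        else:
            raise ValueError("unexpected opcode at %d" % pc)
    return out

patched = decode(memory, 2080, 2093)
for address, mnemonic, operand in patched:
    print(address + 47104, mnemonic, operand)   # shown at relocated addresses
```

The copy loop runs from low addresses to high, so each byte is read before anything overwrites it.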
One other thing isn't really a hack, but an observation on how this program could be tweaked. I'm not going to do it with PEEKs and POKEs, as implementing it would require altering the code too much, and the effect might not even be observable.
Because we're using indirect indexed addressing for the video memory, the loop could execute faster by incrementing the Y register instead of locations 251 & 253. This is because to increment a value in memory, the processor actually has to read memory, modify the value, and store it back when done. By using registers those steps aren't required. Now this does make things feel 'ugly'. We're no longer doing a simple step through the values, and the actual address used, instead of being stored in 251-252, is a combination with the Y register, and kind-of shared with the color memory's address.
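To show that the two approaches visit the same cells, here is a Python sketch that walks the 1000 screen locations both ways. It is not a translation of the actual code; the function names are made up, and the Y version assumes the pointer's low byte starts at 0, which holds for $0400 and $D800. For reference, a standard 6502 timing table gives 5 cycles for INC on a zero-page location and 2 cycles for INY.

```python
def pointer_walk(start, count):
    """Current approach: Y stays 0, the 16-bit pointer is incremented."""
    low, high = start % 256, start // 256
    visited = []
    for _ in range(count):
        visited.append(high * 256 + low)   # effective address with Y = 0
        low = (low + 1) % 256              # inc 251
        if low == 0:
            high += 1                      # inc 252 on wrap-around
    return visited

def y_walk(start, count):
    """Alternative: pointer low byte fixed, Y counts 0-255 per page."""
    low, high = start % 256, start // 256
    y = 0
    visited = []
    for _ in range(count):
        visited.append(high * 256 + low + y)   # effective address with Y
        y = (y + 1) % 256                      # iny
        if y == 0:
            high += 1                          # bump the high byte per page
    return visited

screen_a = pointer_walk(1024, 1000)
screen_b = y_walk(1024, 1000)
print(screen_a == screen_b)
print(screen_a[0], screen_a[-1])
```

In the Y version a single INY serves both the screen and color pointers, which is where the saving would come from, at the cost of the end-of-screen test becoming less direct.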
It's a pointless optimization here, but it's an example of the sort of thing an assembly language programmer might do. Of course there are circumstances where they might also need those extra cycles; for all I know, changing this to use the Y register as an actual index might end up making it look very different. One of the primary advantages of programming in assembly is that you can calculate precisely how long each segment takes to execute*.
*This is less true of modern processors, where caches, pipelining, and out-of-order execution make the timing of a given instruction vary from run to run.
One interesting additional note on disassemblers: in the recent battle between Apple and the US FBI over the security of a cell phone, the topic of disassemblers came up. Noted barely-sane person John McAfee made the claim that one could apply a disassembler to Apple's code and simply 'bypass' the check on the passcode to unlock the phone. While this exemplifies a deep misunderstanding of the particular issue, I hope it can be seen that it's not implausible for some cases. Some of the anti-piracy techniques of the C64 era could quite easily be defeated by inserting a judicious JMP or a bit-setting instruction.