Part 35: Disassembling (Part 2)
Let's Disassemble the Thread Code, Part 2
With a rudimentary understanding of assembly & disassembly under our belt, we can tackle Project Atrocity, an even longer and more complex piece of machine language.
This one presented a slight difficulty for our disassembler, because it loads itself at location $0800/2048. Recall that this is where BASIC programs live, and the disassembler is itself a BASIC program occupying that space. That means we need to relocate Atrocity in order to run the disassembler. To keep things consistent with the other programs, I loaded it from disk and moved it up to location 49152 (using PEEKs and POKEs), so every address in the relocated copy is 47104 higher than in the original.
We'll look at it in segments, as it's somewhat longish.
start address(decimal) ? 49152 (hex=c000)
49152 brk
49153 ? 12
49154 php
49155 asl
49156 brk
49157 ? 158
49158 jsr 12338
49161 rol 50 ,x
49163 brk
49164 brk
49165 brk
49166 ? 243
49167 brk
You're probably already wondering why this supposed machine language looks like it has garbage at the start. It was entered correctly and copied exactly; there's nothing wrong with it. The thing is, it's not machine language.
What we're looking at is a line of BASIC code, and reading BASIC as if it were machine language yields nonsense. So even though it was entered using MLX, it's a line of BASIC. The only line in the program is '10 SYS 2062', which calls the actual machine language. (I didn't get that from the disassembler; I got it by loading the program and typing 'LIST'.) The '? 158' byte is BASIC's token for SYS, and the bytes shown as 'jsr 12338' and 'rol 50 ,x' are just the characters ' 2062'. This was alluded to much earlier in the thread: just because something is entered using MLX doesn't mean it is necessarily machine language.
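To see how those first fourteen bytes turn into that line, here is a minimal Python sketch of a BASIC line lister. It assumes the bytes start at 2048, as in the original load address, and that each line is stored as a two-byte link to the next line, a two-byte line number, then tokens and text ending in a 0 byte. The TOKENS table is a hypothetical, deliberately tiny stand-in that holds only the SYS token. A real lister would need BASIC's full keyword list.

```python
# Bytes from the listing above (49152-49165), i.e. the original 2048-2061.
STUB = bytes([0, 12, 8, 10, 0, 158, 32, 50, 48, 54, 50, 0, 0, 0])

# Hypothetical, deliberately tiny token table: only the keyword this stub uses.
TOKENS = {158: "SYS"}


def list_basic(mem: bytes, base: int) -> list[str]:
    """Walk tokenized BASIC lines; mem[0] is the byte at address `base`."""
    lines = []
    addr = base + 1                              # BASIC text starts one byte past 2048
    while True:
        off = addr - base
        link = mem[off] | (mem[off + 1] << 8)    # link to next line, low byte first
        if link == 0:                            # a zero link marks the end of the program
            break
        number = mem[off + 2] | (mem[off + 3] << 8)
        text = ""
        i = off + 4
        while mem[i] != 0:                       # each line ends with a 0 byte
            text += TOKENS.get(mem[i], chr(mem[i]))
            i += 1
        lines.append(f"{number} {text}")
        addr = link
    return lines


print(list_basic(STUB, 2048))                    # ['10 SYS 2062']
```

The line's terminating 0 and the zero end-of-program link account for the three BRKs at 49163-49165. The stub is exactly 14 bytes long, so 2062 is the first address after it.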
The actual assembly code begins here (which we could also determine by adding our offset to 2062, where the original code started execution):
49168 sta 54274 ; voice 1 pulse width, low byte
49171 sta 54277 ; voice 1 attack/decay
49174 lda # 1
49176 sta 54275 ; voice 1 pulse width, high nybble
49179 lda # 15
49181 sta 54296 ; volume 15 (maximum)
49184 lda # 64
49186 sta 54273 ; voice 1 frequency, high byte
49189 lda # 128
49191 sta 54278 ; voice 1 sustain/release
49194 sta 54290 ; voice 3 control: noise waveform
49197 lda # 255
49199 sta 54286 ; voice 3 frequency, low byte
49202 sta 54287 ; voice 3 frequency, high byte
49205 sta 54272 ; voice 1 frequency, low byte
I won't go into the details of each instruction, but there are a lot of LDA (Load Accumulator) and STA (Store Accumulator) instructions writing to locations that belong to the SID sound chip (54272-54300, $D400-$D41C). This is all set-up for making the noises: the desired values are loaded into the accumulator, then stored into the sound registers.
Here we see a call to a KERNAL routine, CHROUT (65490/$FFD2), which outputs a character. Character 147 is the 'clear screen' control character, and outputting it (even from machine language) clears the screen.
49208 lda # 147 ; clear screen character
49210 jsr 65490 ; output char (to channel, default is screen)
Previously I mentioned that you need two bytes to form a 16-bit address. But if you stick to 8-bit addresses (locations 0-255, known as 'zero page'), you save time and space, and can even do a few things that aren't possible with the full address space. Only a few zero-page locations are free for programmers to use, and 251-254 are those locations. The values being stashed here will be used a bit later.
49213 lda # 0 ; stash a few values for easy access
49215 sta 251 ; in locations 251-254
49217 lda # 4
49219 sta 252
49221 lda # 0
49223 sta 253
49225 lda # 216
49227 sta 254
49229 ldy # 0
The first instruction here is important, and I only know what it does because I've seen the technique mentioned. It reads a random value from oscillator 3 (location 54299, OSC3). Instead of being listened to, voice 3 can be used as a random number generator when its waveform is set to 'noise'. The set-up code above did exactly that by storing 128 in 54290 and 255 in both voice 3 frequency registers (54286-54287), so the value changes as fast as possible. This is a lot faster (and gives better randomness) than using the built-in RND function.
49231 lda 54299 ; random value from OSC3
49234 sta 54273 ; voice 1 frequency, high byte
49237 lda # 65
49239 sta 54276 ; voice 1 control: pulse waveform + gate on
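The statements after the random value is grabbed play the actual sound, using another voice. The random byte becomes voice 1's pitch (54273 is its frequency high byte). Writing 65 to 54276 then selects the pulse waveform and sets the gate bit, which starts the note.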
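The numbers written to a voice's control register (54276 for voice 1, 54290 for voice 3) are bit flags, so each value can be read as a combination of settings. Here is a small Python sketch. The bit assignments follow the SID register map, and the function name is just for illustration.

```python
# SID voice control register bits, highest first.
CONTROL_BITS = [
    (128, "noise"),
    (64, "pulse"),
    (32, "sawtooth"),
    (16, "triangle"),
    (8, "test"),
    (4, "ring mod"),
    (2, "sync"),
    (1, "gate"),      # 1 starts a note (attack), 0 releases it
]


def decode_control(value: int) -> list[str]:
    """Name every bit that is set in a control-register value."""
    return [name for bit, name in CONTROL_BITS if value & bit]


for value in (65, 64, 128):
    print(value, decode_control(value))
# 65 ['pulse', 'gate']
# 64 ['pulse']
# 128 ['noise']
```

So 65 starts a pulse-wave note, and 128 puts voice 3 on noise without gating it.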
In OMG GOON RUSH this same KERNAL routine (GETIN, 65508/$FFE4) was used to check whether a key was pressed. It returns 0 when no key is waiting. This time it's followed by 'BNE' (branch if not equal, i.e. branch if the result is not zero), so we jump out only when a key was pressed and continue straight on otherwise.
49242 jsr 65508 ; get keypress (KERNAL)
49245 bne 49310
We grab another random number and use it to set the screen border color (53280). The AND instruction masks off the upper four bits, restricting the value to the range 0-15. It's one of those arithmetic/logical operations that only the accumulator can do.
49247 lda 54299
49250 and # 15
49252 sta 53280
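Here is a rough Python model of why reading OSC3 works as a random source, and what AND #15 does to the result. It is a simplification, not an emulation. It assumes the commonly documented 23-bit shift register with feedback from bits 22 and 17. It reports the top 8 bits, whereas the real chip combines eight bits spread across the register, and it ignores how fast voice 3 is actually clocked. The class and function names are hypothetical.

```python
class NoiseLFSR:
    """Rough model of the SID noise generator's 23-bit shift register."""

    def __init__(self, seed: int = 0x7FFFF8):
        self.state = seed & 0x7FFFFF
        if self.state == 0:
            raise ValueError("an all-zero register would never change")

    def step(self) -> None:
        # New bit = bit 22 XOR bit 17, shifted in at the bottom.
        bit = ((self.state >> 22) ^ (self.state >> 17)) & 1
        self.state = ((self.state << 1) | bit) & 0x7FFFFF

    def read(self) -> int:
        # Simplified "LDA 54299": the top 8 bits of the register.
        return (self.state >> 15) & 0xFF


def lda_osc3(noise: NoiseLFSR, steps: int = 8) -> int:
    """Let the register run a few steps between reads, then sample it."""
    for _ in range(steps):
        noise.step()
    return noise.read()


noise = NoiseLFSR()
pitches = [lda_osc3(noise) for _ in range(5)]        # STA 54273: any value 0-255
borders = [lda_osc3(noise) & 15 for _ in range(5)]   # AND #15: only 0-15 survive
print(pitches, borders)
```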
Here we have the first use of a parenthesized address. Parentheses mean indirect addressing, and the ',y' after them adds a twist. Indirect means that the address we provide holds the actual address where we want to store our value (if you know what a pointer is, think of that). Adding an index register means its value is used as an offset. In this mode, known as 'indirect indexed', the value in Y is added to the address fetched from zero page. This form is only available with the Y register. The X register has a different mode, written ( zp ,x) and called 'indexed indirect', which adds X to the zero-page location before the address is fetched.
49255 sta ( 253 ),y ; random color (still in A) into color memory
49257 lda # 214
49259 sta ( 251 ),y ; character 214 into screen memory
This is the thing that required zero-page addresses. The value that gets looked up is actually a 16-bit address, stored low byte first across two consecutive locations. So 251-252 and 253-254 each hold a 16-bit address, and we're storing a random value (0-15) through one and the value 214 through the other. Looking back at the initialization of those locations, 251-252 holds $0400, the start of screen memory. The other pair holds $D800/55296, the start of color memory, which holds one color per character cell. So we're writing a character to the screen, with a random color for that cell. As it happens, the Y register is 0, so it does not affect the result. (If Y did hold a value, it would be added to $0400 or $D800.)
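The effective-address calculation for STA ( 251 ),y can be spelled out in a few lines of Python. This is a sketch: memory is modelled as a 64K bytearray, and the helper names are hypothetical. The pointer's high byte comes from the next zero-page location, and Y is added only after the 16-bit address has been assembled.

```python
def read_pointer(mem: bytearray, zp: int) -> int:
    """16-bit address stored low byte first at zero-page location zp."""
    return mem[zp] | (mem[(zp + 1) & 0xFF] << 8)   # high byte stays in page zero


def indirect_indexed(mem: bytearray, zp: int, y: int) -> int:
    """Effective address of a ( zp ),y operand."""
    return (read_pointer(mem, zp) + y) & 0xFFFF


mem = bytearray(65536)
mem[251], mem[252] = 0, 4      # $0400: screen memory
mem[253], mem[254] = 0, 216    # $D800: color memory

print(indirect_indexed(mem, 251, 0))    # 1024
print(indirect_indexed(mem, 253, 0))    # 55296
print(indirect_indexed(mem, 251, 40))   # 1064: one 40-column row further down
```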
Here we INC (increment) those pointers, because we want to step through the whole of screen memory. The branches handle the carry: the high byte is incremented only if the low byte rolled over to 0.
49261 inc 251
49263 bne 49267
49265 inc 252
49267 inc 253
49269 bne 49273
49271 inc 254
Next we use CMP (compare) to see whether our loop is done. The loop ends when the screen pointer reaches $07E8 (2024), just past the last of the screen's 1000 (40 × 25) character cells. If it hasn't, we branch back to the keypress check at 49242, until the screen is filled with random colors and our chosen character. Because this is all done in machine language, which is to say very quickly, the screen looks a bit twitchy as it refreshes around it.
49273 lda # 232
49275 cmp 251
49277 bne 49242
49279 lda # 7
49281 cmp 252
49283 bne 49242
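Putting the pointer increments and the end test together, we can simulate the loop's bookkeeping in Python to check where it stops. This sketch leaves out the keyboard check and the sound. The colors come from a caller-supplied function, a hypothetical stand-in for the OSC3 read. Everything else follows the listing.

```python
from typing import Callable


def inc16(mem: bytearray, zp: int) -> None:
    """INC zp / BNE skip / INC zp+1: bump a little-endian pointer."""
    mem[zp] = (mem[zp] + 1) & 0xFF
    if mem[zp] == 0:                       # BNE falls through only on rollover
        mem[zp + 1] = (mem[zp + 1] + 1) & 0xFF


def pointer(mem: bytearray, zp: int) -> int:
    return mem[zp] | (mem[zp + 1] << 8)


def fill_screen(mem: bytearray, next_color: Callable[[], int]) -> int:
    """Run one pass of the fill loop; return how many cells were written."""
    mem[251], mem[252] = 0, 4              # screen pointer = $0400
    mem[253], mem[254] = 0, 216            # color pointer  = $D800
    cells = 0
    while True:
        mem[pointer(mem, 253)] = next_color() & 15   # STA ( 253 ),y with Y = 0
        mem[pointer(mem, 251)] = 214                 # LDA #214 / STA ( 251 ),y
        cells += 1
        inc16(mem, 251)
        inc16(mem, 253)
        # LDA #232 / CMP 251 / BNE ..., then LDA #7 / CMP 252 / BNE ...
        if mem[251] == 232 and mem[252] == 7:
            return cells


mem = bytearray(65536)
print(fill_screen(mem, lambda: 5))                        # 1000
print(hex(pointer(mem, 251)), hex(pointer(mem, 253)))     # 0x7e8 0xdbe8
```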
When the loop is done, we reset those pointers to the start of screen and color memory. Then we can go around and do the whole thing again.
49285 lda # 0
49287 sta 251
49289 lda # 4
49291 sta 252
49293 lda # 0
49295 sta 253
49297 lda # 216
49299 sta 254
And here, as we get set to restart our loop, we notice a problem. This JMP instruction still says to jump to 2138, the original address! It didn't change just because I moved the code. Adjusted for the new location, it should be 49242.
49301 lda 54299
49304 sta 54273
49307 jmp---> 2138
Of course, you may have noticed that all the branch instructions apparently did update to the new location. How did that happen? As it turns out, branches are relative. They don't store their target as 'BNE 49152'; they store it as 'BNE +24' or 'BNE -13', a signed one-byte offset from the address of the next instruction (where the PC points once the two-byte branch has been read). Jumps, on the other hand, are absolute: they store the full 16-bit address of the destination. Consequently a JMP takes three bytes instead of two, although it isn't slower: at three cycles it matches a taken branch.
Another consequence of the relative nature of branches is that their range is limited. The offset is a single signed byte, so a branch can reach at most 128 bytes back or 127 bytes forward from the instruction after it. When writing assembly, this is something you either have to look out for (and use a branch to a jump, if necessary) or have handled automatically by a good assembler. And like many things in assembly language, this is not the only approach; the distinction between jump and branch is a hallmark of Motorola's instruction sets and similar ones, such as the 6502's.
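Here is a small Python sketch of how a branch operand is worked out. The function names are mine, not the disassembler's. The offset is measured from the address just after the two-byte branch, and it must fit in a signed byte. Only the difference between addresses is stored, so the same operand byte is correct at 2173 (the original location) and at 49277. That is why the relocated branches still worked.

```python
def branch_operand(branch_addr: int, target: int) -> int:
    """Encode a 6502 relative branch as its one-byte operand."""
    offset = target - (branch_addr + 2)       # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"{target} is out of branch range from {branch_addr}")
    return offset & 0xFF                      # two's complement byte


def branch_target(branch_addr: int, operand: int) -> int:
    """Decode the operand back into an absolute target."""
    offset = operand - 256 if operand >= 128 else operand
    return branch_addr + 2 + offset


RELOCATION = 49152 - 2048                     # 47104

operand = branch_operand(49277, 49242)        # the BNE back to the key check
print(operand)                                # 219, i.e. -37
print(branch_target(49277 - RELOCATION, operand))   # 2138, in the original copy

try:
    branch_operand(49152, 49500)              # 346 bytes away: too far for a branch
except ValueError as err:
    print(err)
```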
This is where the loop finally exits. You may have noticed that at the end of the other programs we looked at, there was a BRK, or break instruction. None of those programs actually included it; it's just the next byte in memory, which I left in the listing. As you may have guessed (or looked up in the chart), BRK is opcode 0, and unused memory often contains zeros. So as long as nothing else was in memory, a routine that runs past its end will hit the BRK and stop, because the C64's default BRK handler returns you to BASIC.
49310 lda # 64
49312 sta 54276
49315 lda # 147
49317 jsr 65490
49320 rts
--------------------
This routine uses RTS to exit, which means we aren't relying on empty space following us. It also allows the code to be called as a subroutine (say, if it were a loading screen or something like that). The disassembler adds the dashed line to make it clearer where a subroutine has finished.
Prior to exiting, we turn off the sound. Writing 64 to 54276 keeps voice 1's pulse waveform but clears its gate bit, which releases the note. Then we output the clear-screen character, using the same KERNAL routine as at the start of this little program.
So, now that we know what's going on in this code, why not do some hacking?
Obviously we could read this in, modify it, and create our own version using an assembler, but I'd like to mess with it in memory.
One easy thing we can do is fix that absolute JMP so it works with our relocation. Recall that the first byte of an instruction is the opcode and the following bytes are the operand, so all we need to do is change the operand. The JMP instruction is at 49307, which means the operand starts at 49308. The only hitch is that we can't POKE a 16-bit value; we need to put the operand in byte form. The desired target is 49242. Divided by 256, that gives 192 with a remainder of 90. (Alternatively, in hexadecimal: 49242 = $C05A, $C0 = 192 and $5A = 90.) The 6502 stores addresses low byte first, so 90 goes into 49308 and 192 into 49309.
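Here is the same byte-splitting arithmetic as a Python sketch. The helper names are hypothetical. The JMP absolute opcode (76, $4C) and the low-byte-first order are standard 6502.

```python
def split_word(value: int) -> tuple[int, int]:
    """Return (low, high): the same as value % 256 and value // 256."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("not a 16-bit value")
    return value & 0xFF, value >> 8


def jmp_target(code: bytes) -> int:
    """Decode a 3-byte JMP absolute instruction."""
    if code[0] != 0x4C:
        raise ValueError("not JMP absolute")
    return code[1] | (code[2] << 8)           # operand is stored low byte first


def poke_word(addr: int, value: int) -> str:
    """BASIC statements that store a 16-bit value at addr and addr+1."""
    low, high = split_word(value)
    return f"poke {addr},{low}:poke {addr + 1},{high}"


print(split_word(49242))                      # (90, 192)
print(jmp_target(bytes([0x4C, 90, 8])))       # 2138: the stale JMP at 49307
print(poke_word(49308, 49242))                # poke 49308,90:poke 49309,192
```

The relocation distance, 47104 ($B800), is a whole number of 256-byte pages, so the low byte was already 90. Strictly, only the POKE to 49309 is needed.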
Therefore, we can get this working in the relocated spot by using these statements:
poke 49308, 90:poke 49309, 192
sys 49168
We have to call it using SYS, since it's no longer a BASIC program (I've loaded the disassembler into that space).
Here's something a bit more complex. That random number generator works by using the 'noise' setting of the oscillator. But what if that oscillator is set to do something else?
Referring to a memory map, the location that sets the oscillator's waveform is 54290, the control register for voice 3 (the only voice whose output can be read at 54299, our 'random' source). It gets set in 'line' 49194 above, from the accumulator, using the #128 that was loaded a few lines earlier.
So we'd like to change that without disrupting the value stored in 54278. We can't simply insert an instruction, but looking at the code, a solution presents itself. The initial value in 54273 gets replaced with something random a short while later, so we can probably get away with storing something else there.
So we'll shift one instruction up, change the initialization value for 54273 and 54278, and then load a new value to store in 54290.
To make things easier to test, I did this with the code in its original location. So 49194 = 2090, and the other addresses precede it by a few bytes.
The first line shifts the instruction at 2087 (the STA 54278) up to 2085, overwriting the LDA #128. Next we change the operand of the previous LDA (the one at 2080) to 128, so both 54273 and 54278 receive 128. Finally, we poke in a new instruction at 2088. The opcode for LDA (immediate) is 169, and our new operand is 32.
for i=2085to2087:pokei,peek(i+2):next i
poke 2081,128
poke 2088,169
poke 2089,32
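To double-check those four statements, here is a Python sketch that applies them to the original bytes at 2080-2092. The byte values are my own hand-assembly of the listing (LDA immediate = 169/$A9, STA absolute = 141/$8D, operands low byte first). Treat them as an assumption rather than a dump of the real program.

```python
BASE = 2080
ORIGINAL = bytes([
    0xA9, 0x40,         # 2080 LDA #64
    0x8D, 0x01, 0xD4,   # 2082 STA 54273
    0xA9, 0x80,         # 2085 LDA #128
    0x8D, 0x06, 0xD4,   # 2087 STA 54278
    0x8D, 0x12, 0xD4,   # 2090 STA 54290
])


def apply_patch(code: bytes) -> bytes:
    mem = bytearray(code)

    def peek(addr: int) -> int:
        return mem[addr - BASE]

    def poke(addr: int, value: int) -> None:
        mem[addr - BASE] = value

    for i in range(2085, 2088):   # for i=2085to2087:pokei,peek(i+2):next i
        poke(i, peek(i + 2))
    poke(2081, 128)               # LDA #64 becomes LDA #128
    poke(2088, 169)               # new LDA immediate opcode...
    poke(2089, 32)                # ...with operand 32
    return bytes(mem)


patched = apply_patch(ORIGINAL)
print(patched.hex(" "))
# a9 80 8d 01 d4 8d 06 d4 a9 20 8d 12 d4
```

Read back as instructions, that is LDA #128, STA 54273, STA 54278, LDA #32, STA 54290.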
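This is the finished product (adjusted for the disassembler):
49181 sta 54296
49184 lda # 128
49186 sta 54273
49189 sta 54278
49192 lda # 32
49194 sta 54290
Running this will produce a non-noisy pattern of output; the randomness is gone, because 32 selects the sawtooth waveform, so reading OSC3 returns a repeating ramp instead of noise. Other waveforms can be tried by altering that operand. The waveform values are 16 (triangle), 32 (sawtooth), 64 (pulse) and 128 (noise), so try POKE 2089,16, for example. (Smaller powers of 2 control the gate, sync, ring-modulation and test bits rather than selecting a waveform.)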
One other thing isn't really a hack, but an observation on how this program could be tweaked. I'm not going to do it with PEEKs and POKEs, as implementing it would require altering the code too much, and the effect might not even be observable.
Because we're using indirect indexed addressing for the video memory, the loop could execute faster by incrementing the Y register instead of locations 251 and 253. To increment a value in memory, the processor has to read it, modify it, and write it back (INC on a zero-page location takes 5 cycles). Incrementing a register skips those steps (INY takes 2). And since Y is added to both pointers, one INY does the work of two INCs. Now, this does make things feel 'ugly'. We're no longer simply stepping through the values: the actual address is no longer just what's stored in 251-252 but that plus Y, and Y is shared with the color memory's address.
It's a pointless optimization here, but it's an example of the sort of thing an assembly language programmer might do. Of course there are circumstances where they really do need those extra cycles; for all I know, changing this to use the Y register as an actual index might make it look very different. One of the primary advantages of programming in assembly is that you can calculate precisely how long each segment takes to execute*.
*This is less true of modern processors, which reorder and overlap instructions on the fly and whose timing depends on caches.
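As an example of that kind of calculation, here is a Python sketch comparing just the pointer-advancing part of one screen fill (1000 cells): the code as written versus the hypothetical INY version. It uses the standard 6502 cycle counts: INC zero page 5, INY 2, and BNE 3 when taken within a page, 2 when not. It assumes the INY version still bumps both high bytes whenever Y wraps to 0, and it ignores the rest of the loop.

```python
INC_ZP = 5          # read-modify-write on a zero-page location
INY = 2
BNE_TAKEN = 3       # same page; add 1 if the branch crosses a page
BNE_NOT_TAKEN = 2


def advance_cycles(cells: int, use_y: bool) -> int:
    """Cycles spent moving the pointers forward over `cells` iterations."""
    low = 0
    total = 0
    for _ in range(cells):
        low = (low + 1) & 0xFF
        wrapped = low == 0
        if use_y:
            # INY / BNE; on wrap, fall through to INC 252 and INC 254
            total += INY + (BNE_NOT_TAKEN + 2 * INC_ZP if wrapped else BNE_TAKEN)
        else:
            # INC lo / BNE / (INC hi), once for 251-252 and once for 253-254
            total += 2 * (INC_ZP + (BNE_NOT_TAKEN + INC_ZP if wrapped else BNE_TAKEN))
    return total


print(advance_cycles(1000, use_y=False))   # 16024
print(advance_cycles(1000, use_y=True))    # 5027
```

That is about 11,000 cycles saved per pass, roughly a hundredth of a second on a machine running at about 1 MHz.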
One more interesting note on disassemblers: in the recent battle between Apple and the US FBI over the security of a cell phone, the topic of disassemblers came up. Noted barely-sane person John McAfee claimed that one could apply a disassembler to Apple's code and simply 'bypass' the passcode check to unlock the phone. While this shows a deep misunderstanding of that particular issue, I hope it's clear that the idea isn't implausible in some cases. Some of the anti-piracy techniques of the time could quite easily be defeated by inserting a judicious JMP or a bit-setting instruction.