Compute!'s Gazette Part #35 - Disassembling (Part 2)

Part 35: Disassembling (Part 2)

Let's Disassemble the Thread Code, Part 2

With a rudimentary understanding of assembly & disassembly under our belts, we can tackle Project Atrocity, an even longer and more complex piece of machine language.

This one presented a slight difficulty for our disassembler, as it loads itself into location $0800/2048. Recall that this is where BASIC programs live (the program text proper starts one byte later, at $0801/2049). That means we need to relocate Atrocity in order to run the disassembler. To keep things consistent with the other programs, I loaded it from disk and moved it up to location 49152 (using PEEKs and POKEs), an offset of 47104 bytes.

We'll look at it in segments, as it's somewhat longish.

code:

start address(decimal)
? 49152 (hex=c000)
 49152  brk
 49153   ?      12
 49154  php
 49155  asl
 49156  brk
 49157   ?      158
 49158  jsr 12338
 49161  rol  50 ,x
 49163  brk
 49164  brk
 49165  brk
 49166   ?      243
 49167  brk

Already you're probably wondering ... why does this supposed machine language look like it has garbage at the start? This is entered correctly and copied exactly; there's nothing wrong with it. The thing is, that's not machine language.

This is a line of BASIC code. Trying to read BASIC code as if it were machine language yields nonsense. So even though it was entered using MLX, it's a line of BASIC. The only line in the program is 'SYS 2062', which calls the actual machine language. (I didn't get that by using the disassembler; I got it by loading the program and typing 'LIST'.) This was alluded to much earlier in the thread: just because something is entered using MLX doesn't mean it is necessarily machine language.
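To make the "it's really BASIC" point concrete, here is a rough Python sketch that reads the same bytes the way the BASIC interpreter would. The byte values are an assumption on my part, inferred from the disassembly above (brk = 0, php = 8, asl = 10, jsr = 32, rol zp,x = 54); the function name and the one-entry token table are made up for illustration.

```python
# Bytes from 49152 onward, inferred from the disassembly listing
# (an assumption: each mnemonic was mapped back to its opcode value).
raw = bytes([0, 12, 8, 10, 0, 158, 32, 50, 48, 54, 50, 0, 0, 0])

TOKENS = {158: "SYS"}  # only the one keyword token this line needs


def decode_basic_line(data, start=1):
    """Decode one tokenized BASIC line; start=1 skips the leading zero byte."""
    link = data[start] | (data[start + 1] << 8)        # address of the next line
    number = data[start + 2] | (data[start + 3] << 8)  # BASIC line number
    text = ""
    pos = start + 4
    while data[pos] != 0:                              # a zero byte ends the line
        byte = data[pos]
        text += TOKENS.get(byte, chr(byte))            # keyword token or plain character
        pos += 1
    return link, number, text


link, number, text = decode_basic_line(raw)
print(link, number, text)
```

Read this way, the "garbage" turns into a link pointer, a line number, and the SYS statement that LIST shows.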

The actual assembly code begins here (which we could also determine by adding our offset to 2062, where the original code started execution):

code:

 
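 49168  sta 54274        ; voice 1 pulse width, low byte
 49171  sta 54277        ; voice 1 attack/decay
 49174  lda   # 1
 49176  sta 54275        ; voice 1 pulse width, high byte
 49179  lda   # 15
 49181  sta 54296        ; master volume (15 = maximum)
 49184  lda   # 64
 49186  sta 54273        ; voice 1 frequency, high byte
 49189  lda   # 128
 49191  sta 54278        ; voice 1 sustain/release
 49194  sta 54290        ; voice 3 control (128 = noise waveform)
 49197  lda   # 255
 49199  sta 54286        ; voice 3 frequency, low byte
 49202  sta 54287        ; voice 3 frequency, high byte
 49205  sta 54272        ; voice 1 frequency, low byte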

I won't go into the details of each statement, but we have a lot of LDA (Load the Accumulator) and STA (Store the Accumulator) instructions targeting locations associated with sound. This is all basically setup for making the noises: the desired values are placed into the accumulator, then stored into memory.

code:

 49208  lda   # 147    	; clear screen character
 49210  jsr 65490      	; output char (to channel, default is screen)

Here we see a call to a KERNAL routine, one that outputs a character. This is the 'clear screen' control character; outputting it (even in machine language) will clear the screen.

code:

 49213  lda   # 0      ; stash a few values for easy access
 49215  sta   251      ; in locations 251-254
 49217  lda   # 4
 49219  sta   252
 49221  lda   # 0
 49223  sta   253
 49225  lda   # 216
 49227  sta   254
 49229  ldy   # 0

Previously I mentioned that you need two 8-bit registers to form a 16-bit address. But if you stick with only 8-bit addresses, you can save time, space, and even do a few things that aren't allowed when using the whole address space. There are just a few free memory locations available for programmers in this part of memory (known as 'zero page'), and 251-254 are among them. The values being stashed here will be used a bit later.

code:

 49231  lda 54299        ; random value from OSC3
 49234  sta 54273       ; play sounds?
 49237  lda   # 65
 49239  sta 54276

The first statement here is important, and thanks to this technique having been mentioned I know what it does. It's loading a random value from one of the sound oscillators. Instead of playing a sound, the oscillator can be used as a random number generator when its waveform is set to 'noise'. This is actually a lot faster than the built-in random function, and gives better randomness.

I'm not totally certain, but I'm reasonably sure the statements after the random value is grabbed are playing the actual sounds, using another voice.

code:

 49242  jsr 65508           ; get keypress (KERNAL)
 49245  bne    49310

In OMG GOON RUSH this same KERNAL routine was used to check if a key was pressed. This time it's a 'BNE', or 'branch if not equal', meaning that we only jump out when the key was pressed, and continue straight on otherwise.

code:

 49247  lda 54299
 49250  and   # 15
 49252  sta 53280

We grab another random number and use it to set the screen border color. The 'AND' instruction masks off the upper four bits, restricting the value to the range 0-15. It's one of those arithmetic/logical operations that only the Accumulator can do.
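As a quick illustration of the masking, here is a small Python sketch. The function name and the sample bytes are made up; only the AND with 15 comes from the listing.

```python
def border_color(random_byte):
    """Keep only the low four bits, as 'and # 15' does."""
    return random_byte & 0b00001111


samples = [0, 15, 16, 147, 255]          # arbitrary example bytes
masked = [border_color(b) for b in samples]
print(masked)
```

Whatever byte comes out of the oscillator, the result always lands on one of the sixteen color codes.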

code:

 49255  sta ( 253 ),y
 49257  lda   # 214
 49259  sta ( 251 ),y

Here we have the first use of a parenthesized address. That means indirect mode, and since it's followed by the Y Register, there's a bit more to it. Indirect mode means that the value stored at the address we provide is itself the address where we want to store our value (if you know what a pointer is, think of that). The index register supplies an offset: the value in Y is added to the stored address. This particular mode is only usable with the Y register, and is known as 'indirect indexed'.
This is the thing that required zero-page addresses. The value that gets looked up is a 16-bit address (often called a 'word address' in C64 context), stored low byte first. So locations 251-252 and 253-254 each hold a 16-bit address, and we're writing a random value (0-15) through one and the value 214 through the other. If you look back at the initialization of those locations, 251-252 holds $0400/1024, the start of video memory. The other pair holds $D800/55296, which is color memory (one entry per character cell). So we're writing a character to the screen, with a random color for that cell. As it happens, the Y register is 0, so it does not affect the result. (If Y did have a value, it would be added to $0400 or $D800.)
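If the pointer analogy helps, indirect indexed addressing can be modelled in a few lines of Python. This is only a sketch: the 64K bytearray stands in for C64 memory, the helper name is made up, and the color value 7 is an arbitrary stand-in for the random number.

```python
memory = bytearray(65536)  # stand-in for the C64's 64K address space


def store_indirect_y(mem, zp_addr, y, value):
    """Model of 'sta (zp),y': fetch a 16-bit pointer from zero page, add Y, store."""
    base = mem[zp_addr] | (mem[zp_addr + 1] << 8)  # low byte first, then high byte
    target = (base + y) & 0xFFFF
    mem[target] = value
    return target


# Same initialization as the listing: 251-252 -> $0400, 253-254 -> $D800
memory[251], memory[252] = 0, 4
memory[253], memory[254] = 0, 216

y = 0
color = 7  # arbitrary stand-in for the masked random value
color_cell = store_indirect_y(memory, 253, y, color)
screen_cell = store_indirect_y(memory, 251, y, 214)
print(hex(screen_cell), hex(color_cell))
```

With Y at 0 the stores land exactly on the two base addresses; a non-zero Y would shift both by the same amount.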

code:

 49261  inc   251
 49263  bne    49267
 49265  inc   252
 49267  inc   253
 49269  bne    49273
 49271  inc   254

Here we are doing some 'INC' of those addresses, because we want to cycle through the whole of video memory. The branch statements are used for carries (the high byte is incremented only if the low byte rolled over to 0).

code:

 49273  lda   # 232
 49275  cmp   251
 49277  bne    49242
 49279  lda   # 7
 49281  cmp   252
 49283  bne    49242

Next we check against a value using 'CMP', or compare, to see if our loop is done. If not, we branch back until the screen is filled with random colors and our desired character. The fact that this is all being done in machine language, which is to say very quickly, is what makes it look a bit twitchy as the screen refreshes around it.
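The increment-with-carry and the end-of-loop test can be checked with a short Python model. This is a sketch under a few assumptions: the keypress check is left out, the actual stores are replaced by a counter, and the names are mine.

```python
def inc16(zp, addr):
    """Model of 'inc lo / bne skip / inc hi': 16-bit increment with carry."""
    zp[addr] = (zp[addr] + 1) & 0xFF
    if zp[addr] == 0:                      # low byte rolled over, so carry into high byte
        zp[addr + 1] = (zp[addr + 1] + 1) & 0xFF


zero_page = bytearray(256)
zero_page[251], zero_page[252] = 0, 4      # screen pointer, $0400
zero_page[253], zero_page[254] = 0, 216    # color pointer, $D800

cells = 0
while True:
    cells += 1                             # one character and one color written
    inc16(zero_page, 251)
    inc16(zero_page, 253)
    # 'lda # 232 / cmp 251' then 'lda # 7 / cmp 252': stop at $07E8
    if zero_page[251] == 232 and zero_page[252] == 7:
        break

print(cells)
```

The pointer stops at $07E8, which is $0400 plus 1000, i.e. one past the last of the 40 x 25 screen cells.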

code:

 49285  lda   # 0
 49287  sta   251
 49289  lda   # 4
 49291  sta   252
 49293  lda   # 0
 49295  sta   253
 49297  lda   # 216
 49299  sta   254

When the loop is done, we need to reinitialize those addresses. Then we can go around and do the whole thing again.

code:

 49301  lda 54299
 49304  sta 54273
 49307  jmp---> 2138

And here as we get set to restart our loop, we notice a problem. This JMP instruction is still saying to jump down to 2138, the original address! It didn't change just because I moved the code around. Adjusted for the new location, it should be 49242.

Of course you may have noticed that all the branch statements apparently did update to the new location. How did that happen? As it turns out, branches are relative. They don't store their target as 'BNE 49152'; they store it as 'BNE +24' or 'BNE -13', an offset from the current address in the PC. Jumps, on the other hand, are absolute. They store the full address of the destination, and consequently take more space (three bytes instead of two).
Another consequence of the relative nature of branches is that they are limited in range: the offset is a single signed byte, so you can't branch more than about 128 bytes away from your current location. When writing assembly, this is something you either have to look out for (and use a branch to a jump, if necessary) or have it handled automatically by a good assembler. And like many things in assembly language, this is not the only approach. This distinction between jump and branch is a hallmark of the Motorola and similar instruction sets.
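Here is a rough Python sketch of why the branches survived the move and the JMP didn't. The helper names are made up; the addresses come from the listings, and the offset is simply the distance between the new and old load addresses.

```python
OFFSET = 49152 - 2048  # how far the program was moved


def relocate(addr):
    """Translate an address in the original program to its relocated position."""
    return addr + OFFSET


def branch_offset_byte(instr_addr, target):
    """Offset byte stored after a branch opcode; it is relative to the next instruction."""
    delta = target - (instr_addr + 2)      # a branch instruction is 2 bytes long
    if not -128 <= delta <= 127:
        raise ValueError("target out of branch range")
    return delta & 0xFF                    # two's complement, one byte


# The absolute JMP target has to be patched by hand:
print(relocate(2138))

# The 'bne 49242' at 49277 stores the same byte at either location:
moved = branch_offset_byte(49277, 49242)
original = branch_offset_byte(49277 - OFFSET, 49242 - OFFSET)
print(moved, original)
```

Because the offset byte only encodes a distance, it is identical before and after relocation, whereas the JMP operand is a literal address and goes stale.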

code:

  49310  lda   # 64
  49312  sta 54276       ; voice 1 control: pulse waveform, gate bit off
  49315  lda   # 147     ; clear screen character
  49317  jsr 65490       ; output char (KERNAL)
  49320  rts
 --------------------

This is where the loop finally exits. You'll note that past the end of the other programs we looked at, there was a BRK, or break instruction. That isn't something that any of those programs actually included; it's just the next line that I've left in. As you may have guessed (or looked up in the chart), the BRK is opcode 0, and that's what empty memory locations are initialized to. So as long as nothing was previously in memory, a routine that runs past the end will simply exit when it hits the break.
This routine uses RTS to exit, which ensures that we aren't relying on empty space following us, and also allows this to be called as a subroutine (say, if it were a loading screen or something like that). The disassembler actually adds the dashed line, to make it clearer when a subroutine has finished.
Prior to exiting, we turn off the sound (I'm presuming) and output a character that clears the screen, using the same KERNAL routine as at the start of this little program.

So, now that we know what's going on in this code, why not do some hacking?

Obviously we could read this in, modify it, and create our own version using an assembler, but I'd like to mess with it in-memory.

One easy thing we can do is fix that absolute JMP to make it work with our relocation. Recall that the first byte of an instruction is the opcode, and the following bytes are the operands. So all we need to do is change the operand. The JMP instruction is at 49307, which means the operand starts at 49308. The only hitch is that we can't poke 16-bit values; we need to split the desired operand into two bytes, stored low byte first. The desired target is 49242. Divided by 256, we get 192 with a remainder of 90. (Alternately, we can use hexadecimal: 49242 = $C05A, $C0 = 192 and $5A = 90.)

Therefore, we can get this working in the relocated spot by using these statements:

code:

poke 49308, 90:poke 49309, 192
sys 49168

We have to call it using 'SYS' since it's no longer a BASIC program (I've loaded the disassembler into that space).
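The byte split behind those two POKEs can be double-checked with a little Python. This is a sketch: the bytearray stands in for C64 memory, the helper name is made up, and the pre-patch operand bytes are derived from 2138 = $085A.

```python
def split_address(addr):
    """Return (low byte, high byte) of a 16-bit address."""
    return addr & 0xFF, addr >> 8


memory = bytearray(65536)                    # stand-in for C64 memory
memory[49307] = 0x4C                         # JMP absolute opcode
memory[49308], memory[49309] = 0x5A, 0x08    # stale operand: 2138 = $085A

lo, hi = split_address(49242)
memory[49308] = lo                           # poke 49308, 90
memory[49309] = hi                           # poke 49309, 192

target = memory[49308] | (memory[49309] << 8)
print(lo, hi, target)
```

Note that only the high byte actually changes here, since the move was by a whole number of 256-byte pages.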

Here's something a bit more complex. That random number generator works by using the 'noise' setting of the oscillator. But what if that oscillator is set to do something else?

Referring to a memory map, the location that controls the oscillator is 54290. This is oscillator 3 (the only one that the 'random' sampler works with). That gets set in 'line' 49194 above. It's set from the accumulator using the #128 that was loaded a few lines prior.
So we'd like to change that, but without necessarily disrupting the value stored in 54278. We can't simply insert a line, but by looking at the code, a solution presents itself. The initial value in 54273 gets changed to something random just a short while later. We can probably get away with using something else there.

So, we'll shift things a line up, change the initialization value for 54273 and 54278, and then put a new value to save in 54290.

To make things easier to test, I did this with the code in its original location. So 49194=2090, and the other values precede it by a few lines.

code:

for i=2085to2087:pokei,peek(i+2):next i
poke 2081,128
poke 2088,169
poke 2089,32

The first line there shifts the instruction at 2087 up to 2085; that's the STA 54278. Next we change the previous LDA operand to 128. Finally, we poke in a new instruction. Looking up the opcode for LDA (immediate), it's 169. Our new value is 32.

This is the finished product (adjusted for the disassembler):

code:

 49181  sta 54296
 49184  lda   # 128
 49186  sta 54273
 49189  sta 54278
 49192  lda   # 32
 49194  sta 54290

Running this will produce a non-noisy pattern of output; the randomness is gone. Other values for the oscillator can be experimented with by altering that operand. Try POKE 2089, 16 or any other value that's a power of 2.
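For anyone who wants to check the patch without a C64 handy, the POKE sequence can be replayed against a byte array in Python. This is a sketch: the starting bytes are my reconstruction from the disassembly (169 = LDA immediate, 141 = STA absolute, operands low byte first), and peek/poke are stand-ins for the BASIC commands.

```python
memory = bytearray(65536)

# Original bytes at 2080-2092, reconstructed from the listing:
# lda #64 / sta 54273 / lda #128 / sta 54278 / sta 54290
original = [169, 64, 141, 1, 212, 169, 128, 141, 6, 212, 141, 18, 212]
memory[2080:2080 + len(original)] = bytes(original)


def peek(addr):
    return memory[addr]


def poke(addr, value):
    memory[addr] = value


for i in range(2085, 2088):      # for i=2085to2087:pokei,peek(i+2):next i
    poke(i, peek(i + 2))
poke(2081, 128)                  # lda #64 becomes lda #128
poke(2088, 169)                  # new LDA immediate opcode
poke(2089, 32)                   # its operand: the new oscillator setting

patched = list(memory[2080:2093])
print(patched)
```

The resulting bytes read as lda #128 / sta 54273 / sta 54278 / lda #32 / sta 54290, the same sequence as the patched listing.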

One other thing isn't really a hack, but an observation on how to tweak this program. I'm not going to do it with pokes & peeks, as implementing it would require altering the code too much, and the effect might not even be observable.
Because we're using indirect indexed addressing for the video memory, the loop could execute faster by incrementing the Y register instead of locations 251 & 253. This is because to increment a value in memory, the processor actually has to read memory, modify the value, and store it back when done. By using registers those steps aren't required. Now this does make things feel 'ugly'. We're no longer doing a simple step through the values, and the actual address used, instead of being stored in 251-252, is a combination with the Y register, and kind-of shared with the color memory's address.
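To show what stepping with Y might look like, here is a Python sketch of the address sequence. It's my own guess at one way to structure it, not something from the original program: Y counts up, and only the pointer's high byte is bumped each time Y wraps.

```python
def addresses_with_y_index(base, end):
    """Walk from base up to (not including) end using a page pointer plus Y."""
    visited = []
    page = base                      # the 16-bit pointer kept in zero page
    while True:
        for y in range(256):         # INY, wrapping back to 0 after 255
            addr = page + y          # what '(zp),y' would resolve to
            if addr == end:
                return visited
            visited.append(addr)
        page += 256                  # bump only the pointer's high byte


screen = addresses_with_y_index(0x0400, 0x07E8)
print(len(screen), hex(screen[0]), hex(screen[-1]))
```

The same cells get visited in the same order; the difference is that the per-cell work is a register increment rather than a read-modify-write on memory. The end test would also need rethinking, which is part of the 'ugliness'.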

It's a pointless optimization here, but it's an example of the sort of thing an assembly language programmer might do. Of course there are circumstances where they might also need those extra cycles; for all I know, changing this to use the Y register as an actual index might end up making it look very different. One of the primary advantages of programming in assembly is that you can calculate precisely how long each segment takes to execute*.

*This is less true of modern processors, which can optimize their code on-the-fly for increased performance.

One more interesting note on the disassembler: in the recent battle over the security of a cell phone between Apple and the US FBI, the topic of disassemblers came up. Noted barely-sane person John McAfee made the claim that one could apply a disassembler to Apple's code, and simply 'bypass' the check on the passcode to unlock the phone. While this exemplifies a deep misunderstanding of the particular issue, I hope it can be seen that it's not implausible for some cases. Some of the anti-piracy techniques of the time could quite easily be defeated by inserting a judicious JMP or a bit setting instruction.

The Let's Play Archive

Compute!'s Gazette

by Chokes McGee
