Part 34: Disassembling (Part 1)
Disassembly for fun and
Giving a complete explanation of Assembly is definitely beyond the scope of this thread, but I am hoping that as I delve into the secrets of other people's code with the Disassembler, at least some essence of how it works will be conveyed. That said, there will be a brief overview here, which you can skip and come back to if you just want to see the code taken apart.
Machine language is actually fairly simple, as it only consists of a handful of commands, with a few variations. That doesn't mean it's easy to understand what it is doing; arguably the opposite is true. A fair amount of effort must be expended cross-checking reference material, address listings, and the like, to decipher what the code is accomplishing. But when you know that with enough effort you can figure it out, it's pretty fun to pull back that veil from the inside.
To get one thing out of the way from the start: We'll see a number of values that are limited to 0-255, and also be talking about 'bytes'. The C64 uses an 8-bit processor (the MOS 6510, a variant of the 6502), which means that one 8-bit byte is the natural unit of storage. While a few things are coded as individual bits, most values are accessed and processed one byte at a time.
To further explain how machine language and disassembling work, we need to consider how the code is stored in memory. When interpreted as machine language, the contents of memory consist of a list of instructions. Under normal operation, the processor steps through the instructions one at a time, in sequence.
The first byte of the instruction is called the opcode. Depending on what the opcode is, the next one or two bytes will contain the operand, if one is required. Every value from 0 to 255 corresponds to a potential opcode. I say 'potential' because only a certain set of opcodes is specified to work. Technically every opcode will make the processor do something, even if for some of the undocumented ones that something is to jam (crash). There are even a few with unexpected effects that let you get away with doing two things at once if you're lucky. You really need a physical machine to explore what happens in those cases, though.
Here's a reference chart that shows all the (documented) opcodes: http://e-tradition.net/bytes/6502/6...uction_set.html
Once the opcode is read, the processor will operate on that, plus any operands. The next value in memory will be interpreted as the next opcode in sequence. You could start interpreting the numbers in memory as machine language from any location, but it will only make sense if those values actually form a meaningful sequence.
It's not difficult, then, to understand how the disassembler program works. It saves us the effort of looking up the opcodes in the chart, grabs the operands, and does some formatting to make it easy to read.
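To make that concrete, here's a minimal sketch of the idea in Python. This is not the disassembler used in this thread, just an illustration: the opcode table is a tiny hand-picked subset of the documented opcodes, and the names `OPCODES` and `disassemble` are made up for the example.

```python
# Tiny subset of the 6502 opcode table: opcode -> (mnemonic, operand bytes, format).
# A real disassembler would list every documented opcode; this is illustrative only.
OPCODES = {
    0x00: ("brk", 0, "{m}"),
    0x60: ("rts", 0, "{m}"),
    0xA9: ("lda", 1, "{m} # {v}"),   # immediate mode
    0xA0: ("ldy", 1, "{m} # {v}"),   # immediate mode
    0xEE: ("inc", 2, "{m} {v}"),     # absolute mode
    0x20: ("jsr", 2, "{m} {v}"),     # absolute mode
    0x4C: ("jmp", 2, "{m} {v}"),     # absolute mode
}


def disassemble(code, start):
    """Return one 'address mnemonic operand' line per instruction in code."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode not in OPCODES:
            # Not in our small table: show the raw byte and move on.
            lines.append(f"{start + i} ??? {opcode}")
            i += 1
            continue
        mnemonic, size, fmt = OPCODES[opcode]
        operand = code[i + 1:i + 1 + size]
        # Two-byte operands are stored low byte first.
        value = sum(b << (8 * n) for n, b in enumerate(operand))
        lines.append(f"{start + i} " + fmt.format(m=mnemonic, v=value))
        i += 1 + size
    return lines


# Note the 0x20 here is an operand byte, not a JSR, because of where it sits.
for line in disassemble([0xA9, 1, 0xEE, 0x20, 0xD0, 0x60], 49152):
    print(line)
```

The same byte (0x20) is both the JSR opcode and part of an address here; which one it is depends entirely on where you start reading.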
Even without a full range of 256 values, the instruction list may seem like a lot, but it boils down to just a few different actions. You can:
- move values around (load to a register, store a register to memory, transfer between registers, or push/pull)
- change where the program will go next (branches and jumps)
- make simple changes to some values (increment, set/clear bits)
- do more complex logic and math operations
...and if you really want, do nothing at all (NOP, called a 'no op').
If you look at that chart you'll see that in at least some cases, the types of commands are grouped together. Probably the most noticeable is that similar commands tend to show up in the same column; what's different about them is something called the 'mode' which is how the operand is interpreted. Detailing the modes isn't all that important, but I'll go over them for the commands we see below.
The other important thing to understand about the processor is the registers at the heart of the machine. With one exception (the Program Counter, which is 16 bits wide so it can hold a full address), these are all 8-bit registers, and most of the machine language instructions affect them in some way. There are six registers, only three of which are normally interacted with.
The Accumulator (sometimes referred to as ACC) is the primary register for the processor. It's the only one that math and logical operations can be used with, and as such sees the most usage.
The X and Y Registers (also called Index Registers) can be used for temporary storage of values, but since they can't perform all the operations of the accumulator, their primary use is as loop counters. Some of the operand modes automatically add the X or Y register as the offset to an address.
That's it for the 'general' registers. Here are the other three:
The Program Counter (PC) stores the address of the next instruction to execute in memory. The PC is not manipulated directly, but can be modified as a result of an instruction like a jump.
The Status Register (SR) contains a series of single bit flags to indicate various conditions. Typically this is for testing the result of logical or arithmetic operations, but there are other bits located here as well. The SR is not used as a whole; instead, the individual bits are set, cleared, or checked by various instructions.
The Stack Register (more commonly called the Stack Pointer) is used for temporary storage of values. Values put on the stack are stored in memory; what the Stack Register actually holds is the location where the top of the stack can be found. On the 6502 the stack is confined to a single 256-byte page of memory ($0100-$01FF), so only a limited number of values can be saved before it wraps around. Using a stack allows for code to be recursive (call back into itself), or for some other operation that needs to temporarily save the current state. In contrast to modern usage ('push/pop'), the 6502 uses the terms 'push' and 'pull' for the stack. Only the Accumulator and Status Register can be put on the Stack directly; JSR also uses it behind the scenes to remember where to return to.
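If the stack is hard to picture, here's a rough Python model of it. On the 6502 the stack lives in page 1 of memory ($0100-$01FF) and the register holds an 8-bit offset into that page. The class name `Stack6502` is invented for this sketch, and the starting value of $FF is an assumption about how the machine is set up, not something the processor guarantees.

```python
class Stack6502:
    """Rough model of the 6502 stack: an 8-bit pointer into page 1."""

    def __init__(self):
        self.memory = [0] * 65536
        self.sp = 0xFF  # assumed starting value; set up by start-up code

    def push(self, value):
        # Store at $0100 + SP, then move the pointer down (wrapping at 8 bits).
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        # Move the pointer back up (wrapping at 8 bits), then read the value.
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]


stack = Stack6502()
stack.push(6)
stack.push(8)
print(stack.pull(), stack.pull())  # last in, first out
```

Because the pointer is only 8 bits, pushing more than 256 values doesn't fail; it silently wraps and overwrites the oldest entries.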
That's hopefully enough of an overview to get started. Now let's make Johnny Five very sad:
Our first look is this little gem from Fred M. Sloniker:
code:
10 data238,32,208,238,33,208,76,0,192
20 forzz=0to8:readzx:poke49152+zz,zx:next
30 printchr$(147):sys49152
*Note that in VICE, using the 'Open' command to create a disk on-the-fly seems to clear this memory, so be sure to have your disks ready if you try this in an emulator.
As there's only one line of data statements, we can expect the resultant code to be quite short, and indeed it is.
code:
start address(decimal)
? 49152 (hex=c000)
49152 inc 53280
49155 inc 53281
49158 jmp---> 49152
49161 brk
49162 brk
... (rest is brk statements)
As it happens, these two locations control the screen's border color (53280) and background color (53281). So what we're doing is very rapidly cycling through them. The effect you see on the screen doesn't look like any consistent screen color, however. What's happening is that the machine language is executing so quickly that the colors are changing while the video chip is still in the process of redrawing the screen. That's why this effect only works properly when using a tight machine language loop.
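For a rough sense of "so quickly", here's a back-of-the-envelope calculation in Python. It assumes a PAL machine (63 processor cycles per raster line, 312 lines per frame) and uses the documented timings for the two instructions in the loop; it ignores cycles the video chip steals from the processor and any interrupts, so treat the result as a ballpark only. The function names are made up for the example.

```python
CYCLES_INC_ABSOLUTE = 6   # inc with a full 16-bit address
CYCLES_JMP_ABSOLUTE = 3   # jmp to a full 16-bit address
CYCLES_PER_LINE_PAL = 63  # assumption: PAL machine
LINES_PER_FRAME_PAL = 312


def loop_cycles():
    """Cycles for one pass: inc 53280, inc 53281, jmp 49152."""
    return 2 * CYCLES_INC_ABSOLUTE + CYCLES_JMP_ABSOLUTE


def passes_per_line():
    """Approximate passes through the loop per raster line."""
    return CYCLES_PER_LINE_PAL / loop_cycles()


def passes_per_frame():
    """Approximate passes through the loop per full screen redraw."""
    return CYCLES_PER_LINE_PAL * LINES_PER_FRAME_PAL / loop_cycles()


print(loop_cycles(), passes_per_line(), passes_per_frame())
```

Under those assumptions both colors change roughly four times during every single raster line, which is why you get stripes instead of a solid color.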
Here's the equivalent in BASIC, if you want to compare. This is merely an attempt to imitate the technique used; there are faster ways to do the same thing in BASIC, but any method is still going to be noticeably slower.
code:
10 poke 53280,(peek(53280)+1 and 255)
20 poke 53281,(peek(53281)+1 and 255)
30 goto 10
The next example is a little longer:
code:
10 rem omg goon rush
20 for i=0 to 19:read a:poke 49152+i,a:next i:sys 49152
30 data 169,6,160,8,32,30,171,238,134,2,32,228,255,240,241,169,154,76,210,255
code:
start address(decimal)
? 49152 (hex=c000)
49152 lda # 6
49154 ldy # 8 ; pointer to where the string is
49156 jsr 43806 ; string out (BASIC routine)
49159 inc 646 ; change cursor color
49162 jsr 65508 ; get char
49165 beq 49152
49167 lda # 154 ; light blue
49169 jmp---> 65490 ; output char
49172 brk
I should stress that the comments I add tend to be guesswork, with a few other things figured out by looking at other reference material. That's part of the fun of disassembly -- you get to try and reconstruct how the original assembly might have been written.
The first two instructions are 'LDA' and 'LDY', which stand for a 'load' of the Accumulator and the Y register respectively. The # means we're using immediate mode, which means that those literal values are being loaded to the registers. That sets the Accumulator to 6, and the Y register to 8.
The next statement is a 'JSR'. This is also a jump, like the 'JMP' instruction, but in this case it means 'Jump to Subroutine'. At some point the routine we're jumping to will return, and we'll come back right here, proceeding with the next statement in sequence.
The address we jump to is 43806. I happen to know that that is somewhere in the BASIC interpreter. After looking it up, this turns out to be an 'output a string' routine. Jumping into BASIC subroutines can save you space in your own code, though it is a sort of uncharted territory if you don't have the machine already mapped out (as has been done multiple times at this point).
Without knowing much about that string routine, we can make a good guess at what those values loaded into the Y register and Accumulator are. Almost any subroutine needs some sort of set-up, and if we're going to output a string, the routine needs to be told which string we want to print. That means those two registers are likely being used to supply an address, and knowing what this program does and the values contained, that interpretation makes sense.
The first thing to note is that because this is an 8-bit machine with a 16-bit address space, we need two registers if we want to access all of memory. Since I know what this code does, I can recognize that the address we're loading takes the Y register as the high byte: $0806, in other words. ($ indicates hexadecimal -- with the value split across registers it's a lot easier to read it as such).
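The arithmetic for gluing two 8-bit values into one address is simple enough to sketch in Python. The function names here are made up for illustration.

```python
def address_from_bytes(low, high):
    """Combine a low byte and a high byte into a 16-bit address."""
    return (high << 8) | low


def bytes_from_address(address):
    """Split a 16-bit address back into (low, high) bytes."""
    return address & 0xFF, address >> 8


# Accumulator = 6 (low byte), Y register = 8 (high byte)
addr = address_from_bytes(6, 8)
print(addr, hex(addr))
```

The same rule explains the operands in the DATA statement: 30,171 is 43806, and 0,192 is 49152, with the low byte always stored first.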
Looking at the code snippet, we didn't apparently poke anything into location $0806 (2054 decimal). How did the string that does get printed get there, and how did the programmer know it would be there (or alternately, how did I know that's what it should mean)?
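As it turns out, when a new BASIC program is written, it will be stored starting at location $0801 by default (the byte at $0800 is always zero). And so that first REM statement is going to be stashed right around that point. Each stored line begins with a two-byte pointer to the next line and a two-byte line number, and the REM keyword itself is squeezed into a single byte, which puts whatever follows REM on the first line at $0806. Now, BASIC programs are not all stored as the literal typed-in characters; the representation compresses keywords and drops a few things here and there. But REM statements can be anything, so the text of that statement must be stored somewhere as a literal string.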
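Here's a small Python sketch of how that first line would sit in memory, assuming the usual layout: the program begins at $0801 (just past a zero byte at $0800), each line has a two-byte link to the next line and a two-byte line number, and REM is stored as the single token byte $8F. The helper name `tokenized_rem_line` is invented, and plain ASCII capitals stand in for the character codes of the typed letters.

```python
BASIC_START = 0x0801  # assumed default start of BASIC program text
REM_TOKEN = 0x8F      # one-byte token that replaces the keyword REM


def tokenized_rem_line(line_number, text, address=BASIC_START):
    """Build the in-memory bytes for a line consisting of REM plus text."""
    body = bytes([REM_TOKEN]) + text.encode("ascii") + b"\x00"
    next_line = address + 4 + len(body)  # link pointer to the following line
    header = bytes([
        next_line & 0xFF, next_line >> 8,      # link, low byte first
        line_number & 0xFF, line_number >> 8,  # line number, low byte first
    ])
    return header + body


line = tokenized_rem_line(10, " OMG GOON RUSH")
text_address = BASIC_START + 4 + 1  # skip link, line number, and REM token
print(hex(text_address), line[5:-1])
```

The zero byte that ends the line also conveniently ends the string, which is how a print routine that stops at a zero would know where to quit.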
So that subroutine will output whatever string is located at the REM statement (or alternately, whatever data is located at $0806, interpreted as a string whether it is one or not).
Continuing on, we see an INC statement, this time to location 646. That cycles the cursor (text printing) color.
Next we have another JSR, up to a real high place in memory (65508). This is one of a number of routines known as KERNAL routines. Unlike jumping into the BASIC interpreter, using the KERNAL routines was encouraged and was indeed the recommended approach to doing I/O in machine language. These routines were fully documented and explained in the C64 Programmer's Reference Manual; this one checks to see if a key has been pressed.
After that, we see a 'BEQ' statement. Most statements starting with 'B' are a branch, which is similar to a JMP. Branches test a particular bit in the Status Register and only jump when that condition is true. In this case, it's checking for 'equal', which really means the last result was zero. The get-char routine hands back zero when no key is waiting, so if there had been a keypress then the result would not be equal, and we'd fall through and exit. But if no keys were pressed, we loop around.
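One detail the disassembler hides: a branch doesn't store the address it goes to. Its single operand byte is a signed offset, counted from the instruction after the branch. A quick Python sketch of the calculation (the function name is made up) shows how the 241 in the DATA statement becomes 49152:

```python
def branch_target(branch_address, operand):
    """Work out where a 6502 branch lands from its one-byte operand."""
    # Operand values of 128 and up count as negative (two's complement).
    offset = operand - 256 if operand >= 128 else operand
    # The offset is applied to the address of the next instruction,
    # and a branch is always two bytes long.
    return branch_address + 2 + offset


# The beq at 49165 has the operand byte 241, which is -15.
print(branch_target(49165, 241))
```

That also means a branch can only reach about 128 bytes in either direction, which is why longer hops use JMP.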
This looked like another fairly short loop of code. But it takes advantage of the built-in routines to accomplish most of its work. That also makes it slightly slower than the previous one (outputting text is always slow), but it's still faster than the BASIC would be.
Incidentally, we can also look at the disassembly of those subroutines we're jumping into. I'm not going to analyze them, but here's the BASIC one. You can see that it involves effectively looping through (the string) and also calling more subroutines. The 'RTS' statement is a 'Return from Subroutine' that is required after a JSR was used to arrive at this piece of code.
code:
43806 jsr 46215
43809 jsr 46758
43812 tax
43813 ldy # 0
43815 inx
43816 dex
43817 beq 43751
43819 lda ( 34 ),y
43821 jsr 43847
43824 iny
43825 cmp # 13
43827 bne 43816
43829 jsr 43749
43832 jmp---> 43816
43835 lda 19
43837 beq 43842
43839 lda # 32
43841 bit 7593
43844 bit 16297
43847 jsr 57612
43850 and # 255
43852 rts
--------------------
43853 lda 17
43855 beq 43874
43857 bmi 43863
43859 ldy # 255
43861 bne 43867
43863 lda 63
43865 ldy 64
43867 sta 57
43869 sty 58
43871 jmp---> 44808
43874 lda 19
43876 beq 43883
43878 ldx # 24
43880 jmp---> 42039
type c for 43883
? -1
Next time we'll get to Project Atrocity, which is long enough to let us mess around with it, and also will let us show off a few other features of the disassembler.