Solaris Part #11 - Tech Post 3: The Quadrant Scanner

Let's turn our eyes back to the map screen.

Last time, we discussed how to get a 48-pixel-wide sprite out of an Atari display. That technique is used to display the score, but also to display the SCANNER graphic, and the time-to-jump timer under the map itself. These also all line up. It turns out that Solaris is using a single subroutine for "display me some centered 48-pixelness, please" and just calls out to it with different graphics pointers as needed.

Today, we will discuss how the rest of the map is drawn.

This map is a 6x8 grid of symbols, with a graph-paper sort of effect over them. Also, some of the edges are special and have gaps in them.

The Symbols

In principle, we've solved this already: this is just like the 48-pixel sprite trick, but you don't have to actually do any trickery with it.

That is, instead of interleaving the sprites 010101...

You can just set them up to be 0 0 0 1 1 1. The empty spaces give you time to actually load the next values. Then you just alter the graphics as needed.

That does require a bit of care, but this mostly means keeping your timing precise. Examining the drawing code shows that there are occasional instructions that do nothing, or that are slower than they "need" to be (that is, there are other instructions you could write that do the same thing faster). Those are to make sure that graphics aren't overwritten too soon. One of the Atari developers at the time described the process of working out how to fit in the work of the display between the timing of the graphics updates as being like solving an acrostic. I've dabbled in 2600 coding, and I think that's a pretty fair comparison.

The Grid

The grid is, if anything, even simpler. Except for the north and south exits, everything about the playfield grid is symmetrical. The horizontal lines and the thick lines on the left and right edges are a straightforward symmetric playfield with no hacking done at all. The north and south exits are a fairly trivial hack where you rewrite the middlemost part of the mirrored playfield at a specific cycle count after horizontal sync, getting you two playfield pixels worth of asymmetry.

The thin vertical lines, on the other hand, are not the playfield. They are the missiles. It turns out that missiles are replicated alongside player replication; this means, all else being equal, when replicated players fire their missiles, they fire in synchrony with each other in both time and space. It's primitive but cheap and effective. Here, we're already replicating images for each unit, so placing the missile graphic between each copy means that you get evenly spaced lines as well. "Closely spaced" is 16 pixels apart in each case, and each map icon is 8 pixels wide, so if you magnify the screen and count, you can see that there are 4 pixels of empty space on either side of a map icon.

The Tile Engine

So far, everything's been very clean and simple. The tricky remaining question is this: how do we figure out which graphics to draw? This world is built out of 8x8 tiles, but we need to manage six of those, a line at a time, and keep the grid coherent too. 76 cycles per scanline is less tight timing than we saw doing the sprite trick, but it's not exactly luxurious. There's also the issue of the map changing in realtime and thus needing to be kept in RAM. At 128 bytes of RAM we can't be profligate there, either.

However, looking at the code run, it appears that there isn't anything unusually tricky going on here. One simply has to be careful, not clever.

There are 12 different kinds of things that can be in a sector, and we've now seen all of them: Nothing, you, a Federation planet, a Corridor, a Zylon planet, a Blockader, a generic attack group, Kogalon Star Pirates, a Flagship task force, a Cobra Fleet, a wormhole, or a starfield. Round that up to 16 and you have four bits per sector and 48 sectors per quadrant. 24 bytes can be loaded into RAM to represent the current state of the quadrant, and we can update that as we need to, which is only going to be every few dozen frames. Map update logic is banished to the vertical blanking period, completely out of scope of our investigations.
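The four-bits-per-sector arithmetic can be sketched in a few lines of Python. This is only an illustration of the packing idea: the function names are made up here, and the choice of putting the first sector of each pair in the high nibble is an assumption, not something confirmed from the Solaris ROM.

```python
SECTORS_PER_QUADRANT = 48  # from the text: 48 sectors, 4 bits each

def pack_quadrant(codes):
    """Pack 48 four-bit sector codes into 24 bytes, two sectors per byte."""
    if len(codes) != SECTORS_PER_QUADRANT:
        raise ValueError("a quadrant has exactly 48 sectors")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each sector code must fit in four bits")
    packed = bytearray()
    for i in range(0, SECTORS_PER_QUADRANT, 2):
        # Assumption: the first sector of each pair lives in the high nibble.
        packed.append((codes[i] << 4) | codes[i + 1])
    return bytes(packed)

def unpack_quadrant(packed):
    """Recover the 48 sector codes from the 24 packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble
        codes.append(byte & 0x0F)  # low nibble
    return codes
```

Twelve kinds of sector only use codes 0 through 11, so four of the sixteen possible values go unused in this scheme.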

At some point before the drawing of a new row of symbols, the three bytes corresponding to that row are consulted and turned into a series of addresses mathematically, stored in memory locations $92-$9D. The 16 possible values each correspond to 8 bytes in the $F100-$F17F range, so we have a one-stop shop for map graphics.
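As a rough model of that pointer computation, here is a Python sketch. The base address, the 8 bytes per shape, and the $92-$9D pointer range come from the text above; the nibble order within each byte and the left-to-right order of the pointers in RAM are assumptions made for illustration. The low-byte-first layout is simply how the 6502 stores 16-bit addresses.

```python
SHAPE_TABLE_BASE = 0xF100  # start of the map graphics in ROM
BYTES_PER_SHAPE = 8        # each sector type owns 8 bytes of graphics
POINTER_BASE = 0x92        # first byte of the pointer block in RAM

def row_pointers(row_bytes):
    """Turn the three packed bytes of one map row into six ROM addresses."""
    pointers = []
    for byte in row_bytes:
        # Assumption: high nibble is the leftmost sector of the pair.
        for code in (byte >> 4, byte & 0x0F):
            pointers.append(SHAPE_TABLE_BASE + code * BYTES_PER_SHAPE)
    return pointers

def store_pointers(pointers):
    """Lay the pointers out as low byte, high byte pairs starting at $92."""
    ram = {}
    for i, pointer in enumerate(pointers):
        ram[POINTER_BASE + 2 * i] = pointer & 0xFF    # low byte first
        ram[POINTER_BASE + 2 * i + 1] = pointer >> 8  # then the high byte
    return ram
```

Six pointers at two bytes each fill exactly $92 through $9D, and the highest possible code (15) puts its last graphics byte at $F17F, which matches the range quoted above.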

That takes a pretty significant amount of time, but it has over one full scanline to do that work between rows (the divider line basically draws itself once you write the playfield graphic into place). The first row isn't as lucky but it doesn't have to be; we've got like eight blank lines of prep time between the end of SCANNER and the start of the map grid.

A Brief Digression: Addressing Modes, aka, "How does data get from places to other places?"

We haven't actually talked about how data moves around in the 6502. CPUs do all their work in special chunks of logic called registers. These can be read and updated at, for all practical purposes, the speed of light. Values stored within them can influence what address is read or written in memory, and they are usually what hold the data read from or written to memory. Registers also usually are the only things that can truly have math operations done to them. RISC chips like ARM or MIPS have anywhere from sixteen to thirty-odd general-purpose registers, and the vast bulk of those can be used for almost any purpose. It is only systemic convention that makes certain registers mean certain things.

The 6502 has a similar focus on register operations, but it only has one register that can actually participate in mathematical operations. That one register is the one that basically does all the work. Such a register is traditionally called an accumulator, and the 6502 is thus an accumulator-based architecture.

(For comparison, the x86 series started out as an "extended accumulator" architecture; it had quite a few registers and most could do math, but many registers were nevertheless uniquely privileged to be part of other instructions. With the advent of the 386, these restrictions and privileges evaporated and the architecture became another general-purpose register system.)

In addition to the accumulator (or the A register), the 6502 has the X and Y index registers. These can read and write memory, but you can't do real math on them and you normally use them to help work out where in memory you plan on reading or writing. There are a bunch of ways the registers interact with instructions to get an address. Here are the important ones from a 2600 standpoint:

LDA #$92. This loads a constant value in without checking RAM at all. It costs 6 pixels of time. (Because the value is right there, this is called immediate mode.)
LDA $92. Load a value out of RAM. This is what we saw in the 48-pixel sprite trick, more or less, and it costs nine pixels of time. Commanding the graphics registers ends up looking like this too.
LDA $F192. Load a value out of a fixed location in ROM. The extra time it takes to read an address twice as large means this costs twelve pixels of time.
LDA $F100,Y. Start with the value $F100, then add whatever is in the Y register to that, then load the byte from that result into the accumulator. This is the absolute indexed mode; we are looking up an entry in a table that exists at a fixed (i.e. absolute) location in the ROM. It still only costs twelve pixels unless adding the value of Y alters the high byte of the address. If that happens, it takes another 3 pixels to carry the 1 and get the address right. 2600 programmers will generally make sure that this never happens.
LDA ($92),Y. This reads the 16-bit value in $92 and $93 as an address, adds the contents of Y to it, and then loads the byte from the resulting address. This mode is called indirect indexed and it's basically just like the previous mode except that the location of the table is stored in RAM instead of being part of the instruction. It costs 15 pixels of time to do this, with a possible extra three pixels for carrying the 1 that no Atari programmer will ever permit. If you're used to programming languages intended for humans, this instruction is handed an array and reads a value out of it.
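The pixel costs above all fall out of one conversion: the 2600 draws three pixels for every CPU cycle. A small Python sketch of that arithmetic follows; the cycle counts are the standard 6502 timings for LDA, while the function and table names are invented for this example.

```python
PIXELS_PER_CYCLE = 3  # three pixels of beam travel per CPU cycle

# Standard 6502 cycle counts for LDA in each mode discussed above.
LDA_CYCLES = {
    "immediate": 2,     # LDA #$92
    "zero page": 3,     # LDA $92
    "absolute": 4,      # LDA $F192
    "absolute,Y": 4,    # LDA $F100,Y
    "(indirect),Y": 5,  # LDA ($92),Y
}
INDEXED_MODES = ("absolute,Y", "(indirect),Y")

def crosses_page(base, index):
    """True if adding the index changes the high byte of the address."""
    return (base & 0xFF00) != ((base + index) & 0xFF00)

def lda_pixels(mode, base=0, index=0):
    """Pixels of screen time an LDA costs, including any page-cross penalty."""
    cycles = LDA_CYCLES[mode]
    if mode in INDEXED_MODES and crosses_page(base, index):
        cycles += 1  # one extra cycle to carry the 1 into the high byte
    return cycles * PIXELS_PER_CYCLE
```

For instance, `lda_pixels("(indirect),Y", base=0xF178, index=7)` stays inside the page and costs 15 pixels, while a table starting at $F1FC would spill into the next page with the same index and cost 18.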

A Digression Ends, Returning Us To The Matter Of Loading Graphics Data

There turns out to be enough time in each scanline to use the indirect indexed mode to load all the graphics, which means that six pointers (2 bytes each, so 12 bytes total) hold the locations of the tables to use for each sector in the row.

There isn't enough time to do that and still respect the drawing timing constraints, though. Solaris solves this by loading one of the graphics during HBLANK instead of mid-line and just stuffing it into RAM where it will take nine pixels of time to load instead of fifteen. This takes more time, total, to actually do, but with 228 pixels worth of time per line, it turns out this isn't that cramped.
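The trade-off can be put in numbers with a quick Python sketch. It assumes the stash is a single zero-page store (STA to RAM, three cycles on the 6502); the exact instructions Solaris uses are not reproduced here, so treat this as a model of the idea rather than a trace of the kernel.

```python
PIXELS_PER_CYCLE = 3
SCANLINE_CYCLES = 76  # one full scanline of CPU time

LDA_INDIRECT_Y = 5    # LDA ($92),Y with no page crossing
STA_ZERO_PAGE = 3     # assumed: stash the byte in RAM with one store
LDA_ZERO_PAGE = 3     # reload it from RAM in mid-line

def direct_cost():
    """Load the graphic straight from ROM in the middle of the line."""
    return {"hblank": 0, "midline": LDA_INDIRECT_Y * PIXELS_PER_CYCLE}

def buffered_cost():
    """Fetch during HBLANK, stash in RAM, reload cheaply in mid-line."""
    return {
        "hblank": (LDA_INDIRECT_Y + STA_ZERO_PAGE) * PIXELS_PER_CYCLE,
        "midline": LDA_ZERO_PAGE * PIXELS_PER_CYCLE,
    }

def total(cost):
    """Total pixels of time spent on one graphic."""
    return cost["hblank"] + cost["midline"]
```

Under these assumptions the buffered route spends 33 pixels in total against 15 for the direct load, but only 9 of them land in the timing-critical part of the line.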

That means the total memory cost for drawing the map is:

24 bytes of map contents, at half a byte per sector.
12 bytes of pointers, computed before the row start from the map contents.
128 bytes of shape table data, in the ROM where we aren't really space-constrained at all.

This is cool because we can then go look up the shape table information in the ROM by following the pointer in RAM. Cooler yet, Stella's debugger remembers if a memory load was then stored to the graphics registers and alters the disassembly to show the values as graphics:

There's the Corridor, the Blockader, and the generic Attack Group, right there. Sharp-eyed readers, however, may notice something odd about these graphics: they are upside-down.

This, it turns out, is a speed hack.

You see, any time you do anything that alters a value, a bunch of status flags are set. These record information like whether the operation overflowed, or whether the result was negative, or whether it was zero. To compare if two values are equal, for instance, you subtract them and then check to see if the zero flag was set. This is so common that there's actually a set of commands CMP, CPX, and CPY whose job is to do that subtraction (and with any register, not just the accumulator!) but not trash any registers while doing that.

So, if you want to loop eight times, you could write something like this:

code:

        LDY #$00                 ; Put zero in the Y register
loop:   ;; Do stuff here...

        INY                      ; Increment the Y register
        CPY #$08                 ; Compare it to 8
        BNE loop                 ; If they aren't equal, back to loop

That'll run the loop 8 times with each value of 0 through 7 living in the Y register. But if you do the loop backwards, you don't need the CPY instruction because every operation that alters a register does an implicit compare against zero as part of its work:

code:

        LDY #$07                 ; Put the LAST offset into the Y register
loop:   ;; Do stuff here...

        DEY                      ; Decrement the Y register, compare to 0
        BPL loop                 ; If Y is NONNEGATIVE, back to loop

That saves you six pixels of time on each iteration, and that is a very nice thing to have. The only price you're paying here is that the graphics look weird if you try to read them directly out of a disassembly instead of off your own sprite sheets, so that's not really a cost at all.
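To see why counting down forces the graphics to be stored upside-down, here is a small Python model. The glyph is a made-up 8-row shape, not real Solaris data, and the function names are hypothetical.

```python
def store_for_countdown(rows_top_to_bottom):
    """Store a shape bottom row first, as it would sit in the ROM table."""
    return list(reversed(rows_top_to_bottom))

def draw_counting_down(table):
    """Read the table the way the DEY / BPL loop does: last offset first."""
    drawn = []
    y = len(table) - 1        # LDY #$07 for an 8-row shape
    while y >= 0:             # BPL: keep going while Y is non-negative
        drawn.append(table[y])
        y -= 1                # DEY
    return drawn

# A hypothetical 8-row glyph (an upward-pointing arrow), top row first.
GLYPH = [
    0b00011000,
    0b00111100,
    0b01111110,
    0b11111111,
    0b00011000,
    0b00011000,
    0b00011000,
    0b00011000,
]
```

The first scanline drawn uses offset 7, so the top row of the picture has to be the last byte of the table. Reading `store_for_countdown(GLYPH)` from offset 0 upward, as a disassembler would, shows the arrow pointing down.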

Conclusion

There are a lot of balls in the air in this system, but this is actually a pretty straightforward display kernel, in the end. I'm a relatively proficient 6502 assembly language programmer (albeit more for the Commodore 64 than the Atari) and the only part of the code here that looked at all out of place was the no-ops it needed to keep the display stable at appropriate times.

Refreshingly sane, really.

NEXT TIME: We study the status window, and "refreshingly sane" becomes a thing of the past.

The Let's Play Archive

Solaris

by ManxomeBromide
