Chapter 10: VERA FX Reference

Author: MooingLemur, based on documentation written by JeffreyH

This is preliminary documentation and the specification can still change at any point.

Introduction

This is a reference for the VERA FX features. It is meant to be a complement to the tutorial, currently found here.

The FX Update mainly adds “helpers” inside of VERA that can be used by the CPU. There is no “magic button” that allows you to do 3D graphics, for example. It mainly helps with certain CPU time-consuming tasks, most notably the ones in the (deep) inner loop of a game or graphics engine. The FX Update therefore does not fundamentally change the architecture or nature of VERA; it extends and improves it.

In other words: the CPU is still the orchestrator of all that is done, but it is relieved of certain operations that it is not (very) good at or does not have direct access to.

The FX Update extends addressing modes; it does not add or extend renderers.

Usage

DCSEL

VERA is mapped as 32 8-bit registers in the memory space of the Commander X16, starting at address $9F20 and ending at $9F3F. Many of these are (fully) used, but some bits remain unused. The DCSEL field in register $9F25 (also called CTRL) has been extended to 6 bits so that the 4 registers $9F29-$9F2C can take on additional meanings.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F25 | CTRL | 7 = Reset; 6:1 = DCSEL; 0 = ADDRSEL |

The FX features use DCSEL values 2, 3, 4, 5, and 6. This effectively gives FX 20 8-bit registers. Note that 15 of these registers are write-only, 2 of them are read-only, and 3 are both readable and writable.

Important: unless DCSEL values of 2-6 are used, the behavior of VERA is exactly the same as it was before the FX update. This ensures that the FX update is backwards compatible with traditional non-FX uses of VERA.

Addr1 Mode

When DCSEL=2, the main FX configuration register becomes available (FX_CTRL/$9F29), which is both readable and writable. The 2 lower bits are the Addr1 Mode bits, which change how and when ADDR1 is updated. This puts the FX helpers in a certain “role”.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |

| Addr1 Mode | Description |
|------------|-------------|
| 0 | Traditional VERA behavior |
| 1 | Line draw helper |
| 2 | Polygon filler helper |
| 3 | Affine helper |

By default, Addr1 Mode is set to 0 (=00b), which is the normal and already-known behavior of ADDR1.
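As a minimal sketch of the above, the following selects DCSEL=2 and switches Addr1 Mode to the line draw helper while leaving the other FX_CTRL bits untouched. It relies on FX_CTRL being readable as well as writable. The symbol names are assumptions chosen to match the register names in this chapter, and the syntax is ca65-style.

    VERA_CTRL    = $9F25
    VERA_FX_CTRL = $9F29          ; FX_CTRL when DCSEL=2

        lda #(2 << 1)             ; DCSEL=2 (bits 6:1), ADDRSEL=0
        sta VERA_CTRL
        lda VERA_FX_CTRL          ; read the current configuration
        and #%11111100            ; clear Addr1 Mode (bits 1:0)
        ora #%00000001            ; Addr1 Mode = 1: line draw helper
        sta VERA_FX_CTRL

Writing %00 into the same two bits returns ADDR1 to its traditional behavior.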

Line draw helper

When Addr1 Mode is set to 1 (=01b) the line draw helper is enabled.

Setting up the line draw helper

| Addr | Name | Bit fields |
|------|------|------------|
| $9F22 | ADDRx_H (x=ADDRSEL) | 7:4 = Address Increment; 3 = DECR; 2 = Nibble Increment; 1 = Nibble Address; 0 = VRAM Address (16) |

Figure 1, The 8 octants

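| Octant | 8-bit ADDR1 increment | 8-bit ADDR0 increment | 4-bit ADDR1 increment | 4-bit ADDR0 increment |
|--------|-----------------------|-----------------------|-----------------------|-----------------------|
| 0 | +1 | -320 | +0.5 | -160 |
| 1 | -320 | +1 | -160 | +0.5 |
| 2 | -320 | -1 | -160 | -0.5 |
| 3 | -1 | -320 | -0.5 | -160 |
| 4 | -1 | +320 | -0.5 | +160 |
| 5 | +320 | -1 | +160 | -0.5 |
| 6 | +320 | +1 | +160 | +0.5 |
| 7 | +1 | +320 | +0.5 | +160 |

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_X_INCR_L (DCSEL=3) (Write only) | 7:0 = X Increment (-2:-9) (signed) |
| $9F2A | FX_X_INCR_H (DCSEL=3) (Write only) | 7 = X Incr. 32x; 6:2 = X Increment (5:1) (signed); 1 = X Incr. (0); 0 = X Incr. (-1) |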

Note: Of the two incrementers, the line draw helper uses only the X incrementer. However, depending on the octant you are drawing in, this incrementer represents either x or y pixel increments. The “X” should therefore not be taken literally here; it just means the first of the two incrementers.

Note: There is no need to set the higher bits of the X position, since the FX X position (accumulator) is only used to track the fractional (subpixel) part of the line draw.
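A rough sketch of a line draw setup for octant 0 in 8-bit mode on a 320-pixel-wide bitmap, using the increments from the octant table and a slope of 0.5. Several things here are assumptions not stated in this section: the Address Increment codes (%0001 for 1 byte, %1110 for 320 bytes) come from the general VERA register reference, the start address and color are arbitrary, and each pixel is assumed to be plotted by writing to DATA1.

    VERA_ADDRx_L = $9F20
    VERA_ADDRx_M = $9F21
    VERA_ADDRx_H = $9F22
    VERA_DATA1   = $9F24
    VERA_CTRL    = $9F25
    VERA_FX_CTRL = $9F29          ; DCSEL=2
    FX_X_INCR_L  = $9F29          ; DCSEL=3
    FX_X_INCR_H  = $9F2A          ; DCSEL=3

        lda #(2 << 1)             ; DCSEL=2, ADDRSEL=0
        sta VERA_CTRL
        lda #%00000001            ; Addr1 Mode = 1 (line draw), 8-bit mode
        sta VERA_FX_CTRL
        lda #%11101000            ; ADDR0 increment -320 (code %1110, DECR=1)
        sta VERA_ADDRx_H
        lda #(2 << 1) | 1         ; DCSEL=2, ADDRSEL=1
        sta VERA_CTRL
        lda #%00010000            ; ADDR1 increment +1 (code %0001), address bit 16 = 0
        sta VERA_ADDRx_H
        lda #$7D                  ; ADDR1 = $07D00: x=0, y=100 (hypothetical start pixel)
        sta VERA_ADDRx_M
        stz VERA_ADDRx_L
        lda #(3 << 1)             ; DCSEL=3
        sta VERA_CTRL
        stz FX_X_INCR_L           ; increment bits -2:-9 = 0
        lda #%00000001            ; increment bit -1 set: slope = 0.5
        sta FX_X_INCR_H
        lda #1                    ; hypothetical color index
        ldx #100                  ; number of pixels to plot
    next_pixel:
        sta VERA_DATA1            ; plot one pixel; VERA steps ADDR1 along the line
        dex
        bne next_pixel

For another octant, only the two ADDRx_H increment values should need to change, following the octant table.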

Polygon filler helper

When Addr1 Mode is set to 2 (=10b) the polygon filler helper is enabled.

Setting up the polygon filler helper

Assuming a 320 pixel-wide screen:

* Set ADDR0 to the address of the y-position of the top point of the triangle and x=0 (so on the left of the screen). Set its increment to +320 (for 8-bit mode) or +160 (for 4-bit mode).
  * Note: ADDR0 is used as “base address” for calculating ADDR1 for each horizontal line of the triangle. ADDR0 should therefore start at the top of the triangle and increment exactly one line each time.
  * There is no need to set ADDR1. This is done by VERA.
* Calculate your slopes (dx/dy) for both the left and right point. Unlike the line draw helper, these slopes can be negative and can exceed 1.0. They are not dependent on octant, but cover the whole 180 degrees downwards. Below is an illustration of some (not-to-scale) examples of increments:

Figure 2, examples of Bresenham's slope increment values

* Set ADDR1 increment to +1 (for 8-bit mode) or +0.5 (for 4-bit mode).
  * ADDR1 increment can also be +4 if you use 32-bit cache writes (explained later).
* Set your left slope into the two “X increment” registers and your right slope into the two “Y increment” registers (DCSEL=3, see below).
  * Important: They should be set to half the increment (or decrement) per horizontal line! This is because the polygon filler increments in two steps per line.
  * Note that increment registers are 15-bit signed fixed-point numbers:
    * 6 bits for the integer pixel increment
    * 9 bits for the fractional (subpixel) increment
    * 1 additional bit that indicates the actual value should be multiplied by 32

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_X_INCR_L (DCSEL=3) (Write only) | 7:0 = X Increment (-2:-9) (signed) |
| $9F2A | FX_X_INCR_H (DCSEL=3) (Write only) | 7 = X Incr. 32x; 6:1 = X Increment (5:0) (signed); 0 = X Incr. (-1) |
| $9F2B | FX_Y_INCR_L (DCSEL=3) (Write only) | 7:0 = Y/X2 Increment (-2:-9) (signed) |
| $9F2C | FX_Y_INCR_H (DCSEL=3) (Write only) | 7 = Y/X2 Incr. 32x; 6:1 = Y/X2 Increment (5:0) (signed); 0 = Y/X2 Incr. (-1) |

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_X_POS_L (DCSEL=4) (Write only) | 7:0 = X Position (7:0) |
| $9F2A | FX_X_POS_H (DCSEL=4) (Write only) | 7 = X Pos. (-9); 6:3 = -; 2:0 = X Position (10:8) |
| $9F2B | FX_Y_POS_L (DCSEL=4) (Write only) | 7:0 = Y/X2 Position (7:0) |
| $9F2C | FX_Y_POS_H (DCSEL=4) (Write only) | 7 = Y/X2 Pos. (-9); 6:3 = -; 2:0 = Y/X2 Position (10:8) |
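To illustrate the half-slope encoding of the increment registers, here is a sketch that loads a left edge slope of -1.0 and a right edge slope of +1.5 pixels per line. Halved, these are -0.5 and +0.75. As 15-bit fixed-point values (the half-slope multiplied by 512, in two's complement) they become $7F00 and $0180, with the 32x bit left clear. The slope values and symbol names are illustrative assumptions.

    VERA_CTRL   = $9F25
    FX_X_INCR_L = $9F29           ; all four increment registers need DCSEL=3
    FX_X_INCR_H = $9F2A
    FX_Y_INCR_L = $9F2B
    FX_Y_INCR_H = $9F2C
    LEFT_HALF   = $7F00           ; -0.5  * 512 = -256 as a 15-bit value
    RIGHT_HALF  = $0180           ; +0.75 * 512 =  384

        lda #(3 << 1)             ; DCSEL=3
        sta VERA_CTRL
        lda #<LEFT_HALF           ; increment bits -2:-9
        sta FX_X_INCR_L
        lda #>LEFT_HALF           ; integer bits and bit -1; bit 7 (32x) is 0
        sta FX_X_INCR_H
        lda #<RIGHT_HALF
        sta FX_Y_INCR_L
        lda #>RIGHT_HALF
        sta FX_Y_INCR_H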

Steps that are needed for filling a triangle part with lines:

* Read from DATA1
  * This will not return any useful data but will do two things in the background:
    * Increment/decrement the X1 and X2 positions by their corresponding increment values.
    * Set ADDR1 to ADDR0 + X1

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2B | FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=0) (Read only) | 7 = Fill Len >= 16; 6:5 = X Position (1:0); 4:1 = Fill Len (3:0); 0 = 0 |
| $9F2B | FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=1, 2-bit Polygon=0) (Read only) | 7 = Fill Len >= 8; 6:5 = X Position (1:0); 4 = X Pos. (2); 3:1 = Fill Len (2:0); 0 = 0 |

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2C | FX_POLY_FILL_H (DCSEL=5) (Read only) | 7:1 = Fill Len (9:3); 0 = 0 |

Important: when the two highest bits of Fill Len (bits 8 and 9) are both 1, it means there is a negative fill length. The line should not be drawn!
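The following sketch shows one way to assemble the 10-bit fill length from the two read-only registers in 8-bit mode (4-bit Mode=0), after the DATA1 read described above, and to skip the line when the length is negative. The zero page location `fill_len` and the label name are hypothetical.

    VERA_CTRL      = $9F25
    FX_POLY_FILL_L = $9F2B        ; DCSEL=5
    FX_POLY_FILL_H = $9F2C        ; DCSEL=5
    fill_len       = $22          ; hypothetical 16-bit zero page variable

        lda #(5 << 1)             ; DCSEL=5
        sta VERA_CTRL
        stz fill_len              ; default to a length of 0
        stz fill_len+1
        lda FX_POLY_FILL_H        ; Fill Len (9:3) in bits 7:1
        cmp #%11000000
        bcs skip_line             ; bits 9 and 8 both set: negative, do not draw
        asl a                     ; shift Fill Len bit 9 into the high byte
        rol fill_len+1
        asl a                     ; shift Fill Len bit 8 into the high byte
        rol fill_len+1
        sta fill_len              ; A now holds Fill Len bits 7:3, low 3 bits 0
        lda FX_POLY_FILL_L        ; Fill Len (3:0) in bits 4:1
        lsr a
        and #%00000111            ; keep Fill Len bits 2:0
        ora fill_len
        sta fill_len              ; fill_len = complete 10-bit length
    skip_line:

Because both registers keep bit 0 at 0, the raw register values can presumably also be used directly as offsets into a table of 16-bit entries instead of being decoded like this.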

There is also a 2-bit polygon mode, which is better explained in the tutorial.

Affine helper

When Addr1 Mode is set to 3 (=11b) the affine (transformation) helper is enabled.

When reading from ADDR1 in this mode, the affine helper reads tile data from a special tile area defined by two new FX registers:

* FX_TILEBASE points to a set of 8x8 tiles in either 4-bit or 8-bit depth. FX can support up to 256 tile definitions, and the tile area can overlap the traditional layer tile bases.
* FX_MAPBASE points to a square-shaped tile map, one byte per tile. This tile map has no attribute bytes, unlike the traditional layer 0/1 tile maps.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2A | FX_TILEBASE (DCSEL=2) (Write only) | 7:2 = FX Tile Base Address (16:11); 1 = Affine Clip Enable; 0 = 2-bit Polygon |
| $9F2B | FX_MAPBASE (DCSEL=2) (Write only) | 7:2 = FX Map Base Address (16:11); 1:0 = Map Size |

| Map Size | Dimensions |
|----------|------------|
| 0 | 2×2 |
| 1 | 8×8 |
| 2 | 32×32 |
| 3 | 128×128 |

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |
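As an illustration of the FX_TILEBASE and FX_MAPBASE encodings, the sketch below places the tile data at $10000 and a 32×32 tile map at $0F800. Both addresses are hypothetical. Since only address bits 16:11 are stored, each base has to be aligned to 2 KB. Shifting the address right by 9 lines bits 16:11 up with register bits 7:2.

    VERA_CTRL   = $9F25
    FX_TILEBASE = $9F2A           ; DCSEL=2
    FX_MAPBASE  = $9F2B           ; DCSEL=2
    TILE_ADDR   = $10000          ; hypothetical tile data address (2 KB aligned)
    MAP_ADDR    = $0F800          ; hypothetical tile map address (2 KB aligned)

        lda #(2 << 1)             ; DCSEL=2
        sta VERA_CTRL
        lda #((TILE_ADDR >> 9) & %11111100)     ; Affine Clip Enable = 0, 2-bit Polygon = 0
        sta FX_TILEBASE
        lda #((MAP_ADDR >> 9) & %11111100) | 2  ; Map Size = 2 (32×32)
        sta FX_MAPBASE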

When using the affine helper, the X and Y position registers (DCSEL=4) are used to set ADDR1 to the source pixel indirectly in the aforementioned tile map, while the X and Y increments determine the step after each read of ADDR1.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_X_POS_L (DCSEL=4) (Write only) | 7:0 = X Position (7:0) |
| $9F2A | FX_X_POS_H (DCSEL=4) (Write only) | 7 = X Pos. (-9); 6:3 = -; 2:0 = X Position (10:8) |
| $9F2B | FX_Y_POS_L (DCSEL=4) (Write only) | 7:0 = Y/X2 Position (7:0) |
| $9F2C | FX_Y_POS_H (DCSEL=4) (Write only) | 7 = Y/X2 Pos. (-9); 6:3 = -; 2:0 = Y/X2 Position (10:8) |

The affine helper supports the full range of X and Y increment values, including negative values.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_X_INCR_L (DCSEL=3) (Write only) | 7:0 = X Increment (-2:-9) (signed) |
| $9F2A | FX_X_INCR_H (DCSEL=3) (Write only) | 7 = X Incr. 32x; 6:1 = X Increment (5:0) (signed); 0 = X Incr. (-1) |
| $9F2B | FX_Y_INCR_L (DCSEL=3) (Write only) | 7:0 = Y/X2 Increment (-2:-9) (signed) |
| $9F2C | FX_Y_INCR_H (DCSEL=3) (Write only) | 7 = Y/X2 Incr. 32x; 6:1 = Y/X2 Increment (5:0) (signed); 0 = Y/X2 Incr. (-1) |

32-bit cache

When the CPU reads a byte via DATA0 or DATA1, and “cache fill enable” is set, the value read will be copied into an indexed location inside the 32-bit cache.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |

In 8-bit mode, a byte is cached, but in 4-bit mode, a nibble is cached instead. Afterwards, by default, the index into the cache is incremented, and loops back around to 0 after the last index. The index can be set explicitly via the FX_MULT register. 8-bit mode uses bits 3:2 and ranges from 0-3. 4-bit mode uses bits 3:1 and ranges from 0-7.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2C | FX_MULT (DCSEL=2) (Write only) | 7 = Reset Accum.; 6 = Accumulate; 5 = Subtract Enable; 4 = Multiplier Enable; 3:2 = Cache Byte Index; 1 = Cache Nibble Index; 0 = Two-byte Cache Incr. Mode |

Alternatively, the cache index can cycle between two adjacent bytes: 0, 1, and back to 0; or 2, 3, and back to 2. This option only has effect in 8-bit mode.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2C | FX_MULT (DCSEL=2) (Write only) | 7 = Reset Accum.; 6 = Accumulate; 5 = Subtract Enable; 4 = Multiplier Enable; 3:2 = Cache Byte Index; 1 = Cache Nibble Index; 0 = Two-byte Cache Incr. Mode |
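A short sketch of setting the cache index explicitly in 8-bit mode, here to byte 2, so that the next cached read lands in that byte of the cache. Because FX_MULT is write-only, the whole register is rewritten, and this example simply leaves every other bit at 0. The symbol names are assumptions.

    VERA_CTRL = $9F25
    FX_MULT   = $9F2C             ; DCSEL=2

        lda #(2 << 1)             ; DCSEL=2
        sta VERA_CTRL
        lda #%00001000            ; Cache Byte Index = 2 (bits 3:2)
        sta FX_MULT

Setting bit 0 as well would select the two-byte cycling described above (2, 3, and back to 2).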

Setting the cache data directly

Instead of filling the cache by reading from DATA0 or DATA1, the cache data can also be set directly by writing to the FX_CACHE* registers. Setting the cache directly does not affect the cache index.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CACHE_L (DCSEL=6) (Write only) | 7:0 = Cache (7:0) / Multiplicand (7:0) (signed) |
| $9F2A | FX_CACHE_M (DCSEL=6) (Write only) | 7:0 = Cache (15:8) / Multiplicand (15:8) (signed) |
| $9F2B | FX_CACHE_H (DCSEL=6) (Write only) | 7:0 = Cache (23:16) / Multiplier (7:0) (signed) |
| $9F2C | FX_CACHE_U (DCSEL=6) (Write only) | 7:0 = Cache (31:24) / Multiplier (15:8) (signed) |

Writing the cache to VRAM

If “Cache Write Enable” is set, the cache contents are written to VRAM when writing to DATA0 or DATA1. The primary use is to write all or part of the 32-bit cache to the 4-byte-aligned region of memory at the current address.

Which parts are written is controlled by the value written to DATA0 or DATA1. The value is treated as a nibble mask, where a 0 bit writes the corresponding nibble and a 1 bit masks it from being written. In other words, writing a 0 will flush the entire 32-bit cache. Writing #%00001111 masks the first and second bytes, so only the third and fourth bytes of the cache are written, to the third and fourth memory locations in the 4-byte-aligned region.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |
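As a sketch of the cache write mechanism, the code below fills the first 320 bytes of VRAM (one line of a 320-pixel-wide 8-bit bitmap) with a single color, four bytes per write. The color value, the start address $00000 and the symbol names are assumptions, and the Address Increment code %0011 for a step of 4 bytes comes from the general VERA register reference.

    VERA_ADDRx_L = $9F20
    VERA_ADDRx_M = $9F21
    VERA_ADDRx_H = $9F22
    VERA_DATA0   = $9F23
    VERA_CTRL    = $9F25
    VERA_FX_CTRL = $9F29          ; DCSEL=2
    FX_CACHE_L   = $9F29          ; DCSEL=6
    FX_CACHE_M   = $9F2A
    FX_CACHE_H   = $9F2B
    FX_CACHE_U   = $9F2C
    COLOR        = $05            ; hypothetical color index

        lda #(6 << 1)             ; DCSEL=6, ADDRSEL=0
        sta VERA_CTRL
        lda #COLOR                ; put the same color in all four cache bytes
        sta FX_CACHE_L
        sta FX_CACHE_M
        sta FX_CACHE_H
        sta FX_CACHE_U
        lda #(2 << 1)             ; DCSEL=2, ADDRSEL=0
        sta VERA_CTRL
        lda #%01000000            ; Cache Write Enable, Addr1 Mode = 0
        sta VERA_FX_CTRL
        stz VERA_ADDRx_L          ; ADDR0 = $00000 (4-byte aligned)
        stz VERA_ADDRx_M
        lda #%00110000            ; ADDR0 increment +4
        sta VERA_ADDRx_H
        ldx #80                   ; 80 writes * 4 bytes = 320 bytes
    fill_next:
        stz VERA_DATA0            ; mask of 0: write all four cache bytes
        dex
        bne fill_next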

Transparency writes

Transparent writes, when enabled, also apply to cache writes. If enabled, zero bytes (or zero nibbles in 4-bit mode) in the cache, which are treated as transparent pixels, are not written.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |

When “one-byte cache cycling” is turned on and DATA0 or DATA1 is written to, the byte at the current cache index is written to VRAM. When “Cache write enable” is set as well, the byte is duplicated 4 times when writing to VRAM.

Usually, the cache index is only incremented by reading from DATA0 or DATA1 when cache filling is enabled. However, it can also be incremented by reading from DATA0 in polygon mode when cache filling is not enabled and “one-byte cache cycling” is enabled.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |

Multiplier and accumulator

The 32-bit cache also doubles as an input to the hardware multiplier when Multiplier Enable is set.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2C | FX_MULT (DCSEL=2) (Write only) | 7 = Reset Accum.; 6 = Accumulate; 5 = Subtract Enable; 4 = Multiplier Enable; 3:2 = Cache Byte Index; 1 = Cache Nibble Index; 0 = Two-byte Cache Incr. Mode |

To do a single multiplication, put the two 16-bit inputs into the two halves of the 32-bit cache.

    lda #(2 << 1)
    sta VERA_CTRL        ; $9F25
    stz VERA_FX_CTRL     ; $9F29 (mainly to reset Addr1 Mode to 0)
    lda #%00010000       ; Multiplier Enable
    sta VERA_FX_MULT     ; $9F2C
    lda #(6 << 1)
    sta VERA_CTRL        ; $9F25
    lda #<69
    sta VERA_FX_CACHE_L  ; $9F29
    lda #>69
    sta VERA_FX_CACHE_M  ; $9F2A
    lda #<420
    sta VERA_FX_CACHE_H  ; $9F2B
    lda #>420
    sta VERA_FX_CACHE_U  ; $9F2C

The accumulator can be used to accumulate the sum of several multiplications. Before doing this single multiplication, ensure that the accumulator is reset to zero; otherwise its value will be added to the output before it is written. There are two methods to do this. The first is to write a 1 into bit 7 of FX_MULT ($9F2C, DCSEL=2). The other, more convenient method is to read FX_ACCUM_RESET (the same register location as VERA_FX_CACHE_L).

    lda FX_ACCUM_RESET   ; $9F29 (DCSEL=6)

To obtain the result of the multiplication, it must first be written to VRAM. This is done via the cache write mechanism. Usually the cache itself is written to VRAM if “Cache Write Enable” is set. However, if the “Multiplier Enable” bit is also set, the multiplier result is written to VRAM instead.

    ; Set the ADDR0 pointer to $00000 and write our multiplication result there
    lda #(2 << 1)
    sta VERA_CTRL        ; $9F25
    lda #%01000000       ; Cache Write Enable
    sta VERA_FX_CTRL     ; $9F29
    stz VERA_ADDRx_L     ; $9F20 (ADDR0)
    stz VERA_ADDRx_M     ; $9F21
    stz VERA_ADDRx_H     ; $9F22 ; no increment
    stz VERA_DATA0       ; $9F23 ; multiply and write out result
    lda #%00010000       ; Increment 1
    sta VERA_ADDRx_H     ; $9F22 ; so we can read out the result
    lda VERA_DATA0
    sta $0400
    lda VERA_DATA0
    sta $0401
    lda VERA_DATA0
    sta $0402
    lda VERA_DATA0
    sta $0403

Note: the VERA works by pre-fetching the contents from VRAM whenever the address pointer is changed or incremented. This happens even when the address increment is 0. Due to this behavior, it is possible to have stale data latched in one of the two data ports if the underlying VRAM is changed via the other data port. This example avoids this scenario by only using ADDR0/DATA0. This potential gotcha was not introduced by the FX update, but rather has always been how VERA behaves.

Accumulation

One can also trigger the multiplication and add it to (or subtract it from) the multiplication accumulator by triggering “accumulate” in one of two ways. We could write a 1 into bit 6 of FX_MULT ($9F2C, DCSEL=2), but more conveniently, we can read FX_ACCUM (the same register location as VERA_FX_CACHE_M).

    lda FX_ACCUM         ; $9F2A (DCSEL=6)

Once the accumulation is triggered, the result of the operation is stored back into the accumulator.

The default accumulation operation is (multiply then) add. This can be switched to subtraction by setting the Subtract Enable bit in FX_MULT.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F2C | FX_MULT (DCSEL=2) (Write only) | 7 = Reset Accum.; 6 = Accumulate; 5 = Subtract Enable; 4 = Multiplier Enable; 3:2 = Cache Byte Index; 1 = Cache Nibble Index; 0 = Two-byte Cache Incr. Mode |

If the multiplication accumulator has a nonzero value, any multiplications carried out via a VRAM Cache write will be offset by the value of the accumulator (either added to or subtracted from the accumulator), but they will not change the value of the accumulator.
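Putting the pieces together, here is a sketch of a two-term sum of products, (3 × 5) + (7 × 2). The first product is moved into the accumulator by reading FX_ACCUM. The second is left in the cache, so a later cache write to VRAM should produce the second product offset by the accumulator. The operand values are arbitrary, and the example assumes that Multiplier Enable has to be set before accumulating.

    VERA_CTRL       = $9F25
    VERA_FX_CTRL    = $9F29       ; DCSEL=2
    VERA_FX_MULT    = $9F2C       ; DCSEL=2
    VERA_FX_CACHE_L = $9F29       ; DCSEL=6 (write)
    VERA_FX_CACHE_M = $9F2A
    VERA_FX_CACHE_H = $9F2B
    VERA_FX_CACHE_U = $9F2C
    FX_ACCUM_RESET  = $9F29       ; DCSEL=6 (read)
    FX_ACCUM        = $9F2A       ; DCSEL=6 (read)

        lda #(2 << 1)             ; DCSEL=2
        sta VERA_CTRL
        stz VERA_FX_CTRL          ; Addr1 Mode = 0, cache writes off for now
        lda #%00010000            ; Multiplier Enable, add (Subtract Enable = 0)
        sta VERA_FX_MULT
        lda #(6 << 1)             ; DCSEL=6
        sta VERA_CTRL
        lda FX_ACCUM_RESET        ; accumulator = 0
        lda #3
        sta VERA_FX_CACHE_L
        stz VERA_FX_CACHE_M       ; multiplicand = 3
        lda #5
        sta VERA_FX_CACHE_H
        stz VERA_FX_CACHE_U       ; multiplier = 5
        lda FX_ACCUM              ; accumulator = accumulator + 3 * 5
        lda #7
        sta VERA_FX_CACHE_L       ; multiplicand = 7 (upper byte is still 0)
        lda #2
        sta VERA_FX_CACHE_H       ; multiplier = 2 (upper byte is still 0)
        ; A cache write to VRAM, done as in the single multiplication example,
        ; would now write 7 * 2 offset by the accumulated 3 * 5.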

16-bit hop

There is a special address increment mode that can be used to read pairs of bytes via ADDR1.

| Addr | Name | Bit fields |
|------|------|------------|
| $9F29 | FX_CTRL (DCSEL=2) | 7 = Transp. Writes; 6 = Cache Write Enable; 5 = Cache Fill Enable; 4 = One-byte Cache Cycling; 3 = 16-bit Hop; 2 = 4-bit Mode; 1:0 = Addr1 Mode |

In this mode, setting ADDR1’s increment to +4 will result in alternating increments of +1 and +3. Setting it to +320 will result in alternating increments of +1 and +319. All other increment values, including negative increments, lack this special hop property.

After this bit is set, writing to ADDRx_L resets the hop alignment such that the first increment is +1.

This mode is useful for reading out a series of 16-bit values after a series of multiplications.
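A sketch of such a readout, assuming four 16-bit results were written at 4-byte intervals starting at VRAM address $00000, and that they should be copied to a hypothetical buffer at $0400. ADDRx_L is written last, after the 16-bit Hop bit is set, so that the first increment is +1. The Address Increment code %0011 for a step of 4 bytes comes from the general VERA register reference.

    VERA_ADDRx_L = $9F20
    VERA_ADDRx_M = $9F21
    VERA_ADDRx_H = $9F22
    VERA_DATA1   = $9F24
    VERA_CTRL    = $9F25
    VERA_FX_CTRL = $9F29          ; DCSEL=2
    results      = $0400          ; hypothetical 8-byte destination buffer

        lda #(2 << 1) | 1         ; DCSEL=2, ADDRSEL=1
        sta VERA_CTRL
        lda #%00001000            ; 16-bit Hop, Addr1 Mode = 0
        sta VERA_FX_CTRL
        lda #%00110000            ; ADDR1 increment +4, address bit 16 = 0
        sta VERA_ADDRx_H
        stz VERA_ADDRx_M
        stz VERA_ADDRx_L          ; written last: resets the hop alignment
        ldx #0
    next_value:
        lda VERA_DATA1            ; low byte, then ADDR1 advances by 1
        sta results,x
        inx
        lda VERA_DATA1            ; high byte, then ADDR1 advances by 3
        sta results,x
        inx
        cpx #8                    ; 4 values * 2 bytes
        bne next_value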

For a more detailed explanation of chained math operations, see the tutorial.