Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

A 16-bit ISA for an Effects-Native Toy Machine — v0.4

Status

This is v0.4 of the ISA. It supersedes v0.3.

Major changes from v0.3 -> v0.4

  • Several new operations (LDF, STF, ALLOCF, ADJSP, TCALL)
  • Handler frames carry installer’s v6
  • Continuation reclamation
  • CONT_REC memory discontinuity

Major changes from v0.2.1 -> v0.3:

  • Bit operations added (AND, OR, XOR, SHL, SHR)
  • Typed pointer arithmetic added (INC, DEC)
  • Multiple-install disambiguation specified (install-ID mechanism)
  • Continuation in-memory layout pinned (no longer implementation-defined)
  • Handler CAM specified as LRU write-back cache (concrete cost model)
  • Register-class conventions ratified (no callee-saved registers)

Decisions deferred to a future revision are flagged at the end.

1. Design parameters

Instruction width16 bits, fixed
Data word width16 bits
Address space64 KB word-addressed
Registers8 × (3-bit tag, 16-bit value) = 19-bit slots
Handler CAM8 entries, LRU write-back to memory
Effect table256 entries (16 families × 16 ops), runtime-modifiable
HeapWord-addressed, with per-object headers
Data stackFor CALL/RET; separate from heap

2. Tag namespace

3-bit tags, fixed assignment:

TagNameMeaning
0INT16-bit signed integer
1PTRPointer to a heap object (with header at offset -1)
2CONTContinuation handle (heap pointer, distinguished for GC and for RESU typechecking)
3CLOSUREClosure record (heap pointer, distinguished for CALL typechecking)
4UNITThe unique unit value; value bits ignored
5BOOL1 or 0 in value bits
6RESERVEDFuture primitive
7RESERVEDFuture primitive

All user-defined sum types are heap-allocated (PTR-tagged); variant discriminator lives in the object’s header.

3. Effect namespace

Effects are addressed as (family : 4 bits, op : 4 bits), giving 256 distinct operations.

The effect table is a 256-entry table in memory at a known fixed address, mapping (family, op) to handler-installation-info.

Family 0 is reserved (privileged control plane). Operations:

(family=0, op=N)Operation
(0, 0)Install handler for (family, op) at PC offset
(0, 1)Uninstall handler for (family, op)
(0, 2)Save effect table state
(0, 3)Restore effect table state
(0, 4)Query if (family, op) is handled
(0, 5..15)Reserved

Family 0xF is reserved (hardware-originated). NoC packet arrivals, timer ticks, device events.

Families 1–0xE are available to user programs.

4. Handler CAM

The CAM is an LRU-managed write-back cache for the handler stack.

OperationCost (cycles)
CAM hit1
CAM miss, spill-entry inspection12 per entry (4 words × 3 cycles per word)

INST writes to the MRU slot. If full, the LRU entry is written back to a spill region in memory. PERF searches CAM first, then walks the spill on miss. A spill hit promotes the entry to CAM MRU.

Multiple-install disambiguation:

Each handler frame carries a monotonic install-ID, assigned at INST time. When PERF finds multiple matches for the same (family, op) — whether in CAM, in spill, or split across both — the match with the highest install-ID wins.

This delivers correct effect-shadowing semantics: nested handlers shadow outer handlers regardless of cache state. The CAM is purely a performance optimization; correctness depends only on install-ID ordering.

Implementation cost: one 16-bit field per frame. Lookup cost: in the common case (one match), no change; in the multi-match case, walk the small candidate set and pick max(install_id).

UNIN semantics: Uninstalls the most-recently-installed handler for (family, op). If the CAM contains it, remove and refill from spill if available. If only in spill, scan and remove. Re-establishes the previous shadowing.

v6 behavior: Each HandlerFrame (both in CAM and spill) now stores the installer’s v6 tag and value, captured at INST time. PERF and any device-injected effect restore v6 to the captured installer’s frame before transferring control to the handler’s PC.

Why this matters: handler clause bodies are compiled inside the installer’s lexical scope and reference frame slots relative to the installer’s v6. The performer’s v6 is the wrong base; without this restoration, handlers that wrote to frame slots via v6 were corrupting the performer’s frame.

5. Heap object layout

Per-object header at offset -1 from each PTR/CONT/CLOSURE-tagged value:

  15        8 7      4 3      0
 ┌──────────┬────────┬────────┐
 │variant(8)│size(4) │layout(4)│
 └──────────┴────────┴────────┘
  • layout (4 bits): RECORD, ARRAY, CLOSURE_REC, CONT_REC, SUM, STRING, others reserved.
  • size (4 bits): Number of payload words after the header. 15 is the escape value meaning “consult next word for actual size.”
  • variant (8 bits): For SUM layout, the variant discriminator. For CONT_REC, the used/unused flag. Repurposed per layout otherwise.

ALLOC initializes header from immediates. SETV rewrites only the variant byte.

6. Continuation in-memory layout

A CONT_REC heap object has the following payload after its header:

Word  Field                  Notes
----  ---------------------  --------
  0   saved PC               16-bit code address
  1   saved SP               stack pointer at capture time
  2   prompt SP              stack pointer of handler frame (delimits captured slice)
  3   saved v6 value         (16 bits)
  4   saved v6 tag           (3 bits + reserved)
  5   saved v0 value
  6   saved v0 tag
  7   saved v1 value
  8   saved v1 tag
  9   saved v2 value
 10   saved v2 tag
 11   saved v3 value
 12   saved v3 tag
 13   saved v4 value
 14   saved v4 tag
 15   saved v5 value
 16   saved v5 tag
 17   saved v7 value
 18   saved v7 tag

Word Field Notes


0 actual size = 19 size-escape word; size field in header is 15 1 saved PC 16-bit code address 2 saved SP stack pointer at capture time 3 prompt SP stack pointer of handler frame (delimits captured slice) 4 saved v6 value (16 bits) 5 saved v6 tag (3 bits + reserved) 6 saved v0 value 7 saved v0 tag 8 saved v1 value 9 saved v1 tag 10 saved v2 value 11 saved v2 tag 12 saved v3 value 13 saved v3 tag 14 saved v4 value 15 saved v4 tag 16 saved v5 value 17 saved v5 tag 18 saved v7 value 19 saved v7 tag

20 payload words including the size-escape word“ or “19 saved-state words plus one size-escape word. The header’s 4-bit size field is set to 15 (the escape value); the first payload word holds the actual size (19 or 20 depending on count convention)

Two-word-per-register saves are chosen to ease future expansion to wider data words. Permitted implementations may use packed encoding if they remain semantically equivalent to the spec’d layout — but the layout above is canonical for tooling (debuggers, migration, serialization).

The header’s variant byte holds a used/unused flag. RESU sets used = 1 atomically (with respect to preemption boundaries); a second RESU on a used CONT traps.

The CONT_REC region is disjoint from the general heap; ALLOC/ALLOCF will not allocate into it.

7. Instruction encoding

Five forms:

                  15 14 13 12 11  10 9 8  7 6 5  4 3 2  1 0
Form R3:          ┌─────────────┬───────┬──────┬──────┬─────┐
                  │  opcode (5) │vd (3) │vs (3)│vt (3)│ ─── │
Form RR5:         ├─────────────┼───────┼──────┼──────┴─────┤
                  │  opcode (5) │vd (3) │vs (3)│  imm5      │
Form RI8:         ├─────────────┼───────┼──────┴────────────┤
                  │  opcode (5) │vd (3) │      imm8         │
Form J11:         ├─────────────┴───────┴───────────────────┤
                  │  opcode (5) │           imm11           │
Form E:           ├─────────────┬─────────┬─────────┬───────┤
                  │  opcode (5) │family(4)│ op (4)  │hpc(3) │ (for INST)
                  ├─────────────┼─────────┼─────────┼───────┤
                  │  opcode (5) │family(4)│ op (4)  │vr (3) │ (for PERF, UNIN)
                  └─────────────┴─────────┴─────────┴───────┘

8. Instruction set

OpMnemonicFormSemantics
0x00LI vd, imm8RI8vd ← (INT, sign-ext(imm8))
0x01LIH vd, imm8RI8vd.value ← (vd.value & 0xFF) | (imm8 << 8); tag preserved
0x02ADD vd, vs, vtR3vd ← (INT, vs.value + vt.value); both operands must be INT
0x03SUB vd, vs, vtR3vd ← (INT, vs.value - vt.value); both operands must be INT
0x04BZ vs, off8RI8if vs.value == 0 then pc ← pc + sign-ext(off8)
0x05JMP off11J11pc ← pc + sign-ext(off11)
0x06LD vd, vs, off5RR5vd ← (INT, mem[vs.value + off5]); vs must be PTR, CONT, or CLOSURE; caller must SETTAG if non-INT result is intended
0x07ST vd, vs, off5RR5mem[vs.value + off5] ← vd.value; vs must be PTR; tag of vd discarded
0x08ALLOC vd, layout4, n4(custom)bump-allocate (n+1) words including header; vd ← (PTR, hp+1); hp += n+1; size field 15 is the escape value (consult next word for actual size)
0x09MOV vd, vsR3vd ← vs (copies both tag and value)
0x0ACALL vf, vaR3vf must be CLOSURE; push 3 words (return PC, v6.value, v6.tag) onto data stack; v6 ← vf; v0 ← va; jump to mem[vf.value] (code pointer at offset 0 of closure record)
0x0BRET vaRI8pop 3 words from data stack (saved v6.tag, saved v6.value, return PC); restore v6; v0 ← va; jump to return PC
0x0CMATCH vd, vs, tbl5RR5dispatch on vs (see §9); jump table follows at pc + tbl5
0x0DINST family4, op4, hpc3Einstall handler at pc + sign-ext(hpc3); assign monotonic install-ID; capture installer’s v6 (tag and value) into the handler frame
0x0EUNIN family4, op4Euninstall most-recently-installed handler for (family, op); if CAM-resident, remove and refill from spill if available; if only in spill, scan and remove
0x0FPERF family4, op4, vrEperform effect with argument in vr; dispatch and continuation capture per §10
0x10RESU vk, vvR3vk must be CONT-tagged with “unused” variant byte; restore saved register state and SP from the CONT_REC; v0 ← vv; mark CONT_REC used; return its slot to the cont free pool; jump to the saved PC
0x11SETV vd, variant8RI8set variant byte of heap header at mem[vd.value - 1]; vd must be PTR
0x12SETTAG vd, tag3(custom)assert tag on vd; debug build: trap on mismatch; release build: rewrite tag without check
0x13HALTJ11stop the core until a remote PERF arrives (idle state)
0x14AND vd, vs, vtR3vd ← (INT, vs.value & vt.value); both operands must be INT
0x15OR vd, vs, vtR3vd ← (INT, vs.value | vt.value); both operands must be INT
0x16XOR vd, vs, vtR3vd ← (INT, vs.value ^ vt.value); both operands must be INT
0x17SHL vd, vs, vtR3vd ← (INT, vs.value << (vt.value & 0xF)); both operands must be INT; shift count masked to 4 bits
0x18SHR vd, vs, vtR3vd ← (INT, vs.value >> (vt.value & 0xF)) (logical); both operands must be INT; shift count masked to 4 bits
0x19INC vd, vs, imm4(custom)vd ← (PTR, vs.value + imm4); vs must be PTR; result is PTR
0x1ADEC vd, vs, imm4(custom)vd ← (PTR, vs.value - imm4); vs must be PTR; result is PTR
0x1BLDF vd, imm8RI8vd ← (INT, mem[v6.value + imm8]); v6 must be PTR; caller SETTAGs if non-INT result intended. Wide-immediate frame-local load
0x1CSTF vd, imm8RI8mem[v6.value + imm8] ← vd.value; v6 must be PTR; tag of vd discarded. Wide-immediate frame-local store
0x1DALLOCF imm11J11v6 ← (PTR, bump-allocate(layout=RECORD, n=imm11)); result lands in v6 implicitly. Wide-immediate heap allocation for frames exceeding ALLOC’s 4-bit n
0x1EADJSP imm11J11sp ← sp + sign-ext(imm11); v6 ← (PTR, sp). Signed 11-bit displacement in words; used for stack-frame prologue/epilogue. RET is unchanged, so the epilogue must restore sp to its pre-prologue value before RET
0x1FTCALL vf, vaR3vf must be CLOSURE; v6 ← (PTR, mem[vf.value + 1]); v0 ← va; pc ← mem[vf.value]. No push of return record — the callee’s eventual RET pops the caller’s caller’s record, splicing the current frame out of the call chain. Used for indirect tail-call elimination

9. MATCH semantics

MATCH vd, vs, tbl5:

  • If vs.tag is PTR: load the header word from mem[vs.value - 1], extract the variant byte (bits [15:8]), and use it as the index into the jump table at pc + tbl5.
  • If vs.tag is a primitive (INT, BOOL, UNIT, etc.): use the tag value itself as the index.

vd is reserved for future use (a “scratch” output register for the loaded header word, if useful).

The jump table is a sequence of 16-bit code-relative offsets immediately following the MATCH instruction, at pc + tbl5. The CPU adds the selected offset to pc (post-MATCH) and jumps.

Tables longer than the variant range are permitted; entries beyond the actual variant count are unreachable. Missing entries trap.

10. PERF semantics

PERF family, op, vr:

  1. CAM search. Look for an entry matching (family, op). If multiple, pick the one with the highest install-ID.
  2. Spill search if CAM misses. Walk the spill region for matches; among CAM and spill matches, pick highest install-ID.
  3. Effect table fallback. If no installed handler is found, consult effect_table[(family, op)]:
    • Local(pc): behave as if a CAM entry pointed at this PC.
    • Remote(dst, dev_op): build a NoC packet and emit; do NOT capture a continuation; continue at next instruction.
    • Unhandled: trap.

On local dispatch (CAM hit, spill hit, or effect table Local):

  1. Allocate a CONT_REC on the heap; capture saved registers (v0-v5, v6, v7 — values and tags), PC (of next instruction), SP, and the prompt SP (the SP at the handler frame)..
  2. Set v0 ← vr (effect argument).
  3. Set v1 ← (CONT, ptr_to_CONT_REC) (continuation handle).
  4. Set v7 ← (INT, op) (the 4-bit op field of the effect, sign- or zero-extended).
  5. Restore v6 from the matched handler frame
  6. Jump to the handler’s PC.

On remote dispatch:

  1. Build NoC packet with dst = device_addr, src = this_core, op = dev_op, payload = vr.value (zero-extended).
  2. Hand to NoC interface; stall if FIFO full.
  3. Continue at the next instruction.

Clobber notice: v0, v1, and v7 are unconditionally clobbered by every PERF that does local dispatch. The compiler treats them as not-live across perform sites.

CONT_REC Reclamation: PERF and RESU form a place/take discipline.

  • PERF allocates a fresh CONT_REC at the next available cont slot, or reuses a previously-freed slot from a free list.
  • RESU consumes the cont (one-shot, runtime-checked) and reclaims the slot. The reclamation path is hardware-internal — no visible bus traffic to the kernel.

Allocation policy in the current VM: free-list-managed. The committed spec permits three implementation variants for future hardware, all of which produce identical observable behaviour for kernel code:

  • (a) Fixed-size on-chip ring buffer with a hard cap on simultaneously-live continuations.
  • (b) Cont-CAM with spill to a software-managed pool. Mirrors the handler-CAM / spill structure from §4.
  • (c) Pure software-managed cont area.

Kernel code is portable across all three; it only ever emits PERF/RESU and never names cont slot addresses or reclamation operations.

11. RESU semantics

RESU vk, vv:

  1. vk must be CONT-tagged. vk’s header variant byte must indicate “unused”; else trap.
  2. Mark the CONT_REC as used (atomic with respect to preemption boundaries).
  3. Restore registers v0..v5, v7, and v6 from the CONT_REC’s saved slots.
  4. Restore SP to saved SP.
  5. Set v0 ← vv (the value being delivered to the perform site).
  6. Jump to the saved PC.

The handler frame issuing the RESU is destroyed; control flow transfers entirely to the resumed continuation.

CONT_REC Reclamation: RESU consumes a cont and reclaims the slot. See full description in previous section.

12. Frame Semantics

Load Frame

LDF vd, imm8: vd ← (INT, mem[v6.value + imm8])

Form RI8. The caller is responsible for tagging the result with SETTAG if the loaded value should not be INT-tagged.

Why this exists: the narrow LD vd, vs, off5 has a 5-bit offset, limiting frame slot access to slots 0..31. Kernel functions need larger frames. Rather than rework the general encoding, LDF adds an 8-bit immediate fixed to v6 (the frame-pointer / closure register), giving 256-slot capacity for frame-local access.

Store Frame

STF vd, imm8: mem[v6.value + imm8] ← vd.value

Form RI8. Tag of vd is discarded (matches existing ST).

Allocate Frame

ALLOCF imm11: v6 ← (PTR, alloc(layout=0, n=imm11))

Form J11 (11-bit operand). Allocates from the heap; tags result PTR; result lands in v6 implicitly. The narrow ALLOC has a 4-bit n field (max 15 words including header); ALLOCF raises the cap to 2048 words. layout is hardcoded to 0 (RECORD); use narrow ALLOC when a different layout is needed.

The codegen’s emitter automatically rewrites narrow → wide when the constants exceed narrow range.

Adjust Stack Pointer

ADJSP imm11 (signed): @sp ← @sp + imm; v6 ← (PTR, @sp)

Form J11. Signed 11-bit immediate (−1024..+1023 words). The data stack pointer is adjusted; v6 is set to point at the new top.

Crucial discipline: RET is unchanged. It still pops 3 words (return PC + v6.value + v6.tag) from the data stack. A stack-framed function’s epilogue must restore @sp to its pre-prologue value via a matching positive ADJSP before RET runs, or the return record is read from the wrong location.

Implements: the hybrid frame-allocation policy promised by the book’s design but unspecified at the ISA level. Functions whose frames don’t need to outlive the call (i.e., non-performing) use ADJSP instead of ALLOC/ALLOCF.

13. Calling conventions

Function call

CALL vf, va:

  • vf must be CLOSURE-tagged; points to a closure record (code_ptr, env_word_0, env_word_1, ...)
  • Hardware pushes return PC (1 word), v6.value (1 word), v6.tag (1 word, packed in low 3 bits) to data stack — 3 words per call
  • v6 ← vf
  • v0 ← va
  • Jump to code_ptr (word 1 of closure record)

Tail Call

TCALL vf, va: v6 ← (PTR, mem[vf+1])
              v0 ← va
              pc ← mem[vf]
              -- NO push of return PC + v6 saved

Form R3, same field layout as CALL (0x0A). Semantically: CALL without the return record push. The callee’s eventual RET pops the caller’s caller’s record, splicing the caller out of the call chain.

Required for indirect tail-call elimination through closure-typed values; the kernel’s invoke wrapper compiles to this when in tail position.

Function return

RET va:

  • Pop 3 words from data stack: saved v6 tag, saved v6 value, return PC
  • Restore v6
  • v0 ← va
  • Jump to return PC

Register classes

RegisterClassNotes
v0arg / return / PERF argclobbered by every CALL and every PERF
v1scratch / PERF continuationclobbered by every PERF
v2..v5caller-saved scratchcallee may freely clobber
v6$clo (closure environment)preserved across CALL via implicit stack save
v7PERF op-tagclobbered by every PERF

No callee-saved registers. Function prologues do not preserve registers; the cost of preservation moves to callers and is paid only where needed.

Handler entry

When PERF dispatches locally:

  • v0 ← effect argument
  • v1 ← continuation handle (CONT tag)
  • v7 ← op tag (INT)
  • v2..v5 ← undefined; handler must not assume any value
  • v6 ← undefined; handler’s closure (if any) is recorded in the CONT_REC and restored only on RESU

14. NoC packet format

(Recorded here for completeness; details may evolve as NoC implementation matures.)

[63:58] dst_id     (6 bits, 64 devices)
[57:52] src_id     (6 bits)
[51:48] op         (4 bits)
[47:44] flags      (4 bits, priority/control)
[43:12] payload    (32 bits)
[11:0]  reserved

PERF with remote dispatch builds packets with op = dev_op (from the effect table entry) and payload = vr.value (zero-extended from 16 bits). For larger payloads, the compiler emits a pointer-passing version (allocate, store, perform with pointer).

15. Boot semantics

(Unchanged from v0.2.1.) The bootstrap core comes up at a reset vector with empty CAM and empty effect table. Boot ROM populates the effect table, installs initial handlers, enumerates devices via PERF probes, and releases other cores.

16. Preemption semantics

(Unchanged from v0.2.1.) Preemption is sampled at the end of each sequencer cycle (Fetch→Decode→Execute→Write→Preempt). Instructions either retire fully or do not retire. Pending remote PERFs queue at the core’s NoC input; the queue is drained at preemption sample points.

HALT halts the core until any remote PERF arrives (idle state).

17. What’s deferred to v0.5

The following items from the compiler team’s v0.2.1 notes are deferred:

  • Memory operation timing model. LD/ST/ALLOC currently 1 cycle, 0 abstract time. A coordinated timing pass after the first real workloads.
  • Fused tag-aware loads (LD.PTR, LD.CONT, LD.CLOSURE). Defer until profiling shows the two-instruction pattern in a hot path.
  • Multi-shot continuations. First-shot are one-shot only.
  • Arithmetic shift right (SAR). Add when a workload demonstrates the need.
  • Shift-by-immediate (SHLI, SHRI). Peephole optimization in assembler should fuse LI + SHR patterns; promote to ISA only if peephole proves insufficient.
  • Cache hierarchy modeling. Flat memory cost for now.
  • Atomic operations (CAS). Needed for multicore synchronization; not yet needed in single-core kernel. Add as one of the reserved slots when multicore begins.
  • Discard-without-resume. A handler clause that captures k but never resumes (e.g. ProcLifecycle.exit (), _k -> …) leaks the cont’s slot under the current VM’s reclamation discipline. This must be addressed.

18. Open questions

  • Custom encoding details for ALLOC, SETTAG, INC, DEC need ratification from the emulator team. The fields are specified; the exact bit positions within the 11-bit operand region need to be agreed.
  • Behavior on tag-check trap. Currently unspecified what happens when ADD encounters a non-INT, or RESU encounters non-CONT, or similar. Proposal: tag-check faults raise a hardware-originated PERF on family 0xF op 0 with a fault-code payload. Kernel installs a handler that decides whether to terminate the offending process or attempt recovery.
  • CONT_REC variant byte format. Currently spec’d as a used/unused flag, but 8 bits is more than needed for that. Possible future use: epoch counter for ABA-style protection on continuation reuse in long-running systems.

19. Versioning

This document is ISA spec v0.4.