A 16-bit ISA for an Effects-Native Toy Machine — v0.4
Status
This is v0.4 of the ISA. It supersedes v0.3.
Major changes from v0.3 -> v0.4
- Several new operations (
LDF,STF,ALLOCF,ADJSP,TCALL) - Handler frames carry installer’s v6
- Continuation reclamation
- CONT_REC memory discontinuity
Major changes from v0.2.1 -> v0.3:
- Bit operations added (
AND,OR,XOR,SHL,SHR) - Typed pointer arithmetic added (
INC,DEC) - Multiple-install disambiguation specified (install-ID mechanism)
- Continuation in-memory layout pinned (no longer implementation-defined)
- Handler CAM specified as LRU write-back cache (concrete cost model)
- Register-class conventions ratified (no callee-saved registers)
Decisions deferred to a future revision are flagged at the end.
1. Design parameters
| Instruction width | 16 bits, fixed |
| Data word width | 16 bits |
| Address space | 64 KB word-addressed |
| Registers | 8 × (3-bit tag, 16-bit value) = 19-bit slots |
| Handler CAM | 8 entries, LRU write-back to memory |
| Effect table | 256 entries (16 families × 16 ops), runtime-modifiable |
| Heap | Word-addressed, with per-object headers |
| Data stack | For CALL/RET; separate from heap |
2. Tag namespace
3-bit tags, fixed assignment:
| Tag | Name | Meaning |
|---|---|---|
| 0 | INT | 16-bit signed integer |
| 1 | PTR | Pointer to a heap object (with header at offset -1) |
| 2 | CONT | Continuation handle (heap pointer, distinguished for GC and for RESU typechecking) |
| 3 | CLOSURE | Closure record (heap pointer, distinguished for CALL typechecking) |
| 4 | UNIT | The unique unit value; value bits ignored |
| 5 | BOOL | 1 or 0 in value bits |
| 6 | RESERVED | Future primitive |
| 7 | RESERVED | Future primitive |
All user-defined sum types are heap-allocated (PTR-tagged); variant discriminator lives in the object’s header.
3. Effect namespace
Effects are addressed as (family : 4 bits, op : 4 bits), giving 256 distinct operations.
The effect table is a 256-entry table in memory at a known fixed address, mapping (family, op) to handler-installation-info.
Family 0 is reserved (privileged control plane). Operations:
| (family=0, op=N) | Operation |
|---|---|
| (0, 0) | Install handler for (family, op) at PC offset |
| (0, 1) | Uninstall handler for (family, op) |
| (0, 2) | Save effect table state |
| (0, 3) | Restore effect table state |
| (0, 4) | Query if (family, op) is handled |
| (0, 5..15) | Reserved |
Family 0xF is reserved (hardware-originated). NoC packet arrivals, timer ticks, device events.
Families 1–0xE are available to user programs.
4. Handler CAM
The CAM is an LRU-managed write-back cache for the handler stack.
| Operation | Cost (cycles) |
|---|---|
| CAM hit | 1 |
| CAM miss, spill-entry inspection | 12 per entry (4 words × 3 cycles per word) |
INST writes to the MRU slot. If full, the LRU entry is written back to a spill region in memory. PERF searches CAM first, then walks the spill on miss. A spill hit promotes the entry to CAM MRU.
Multiple-install disambiguation:
Each handler frame carries a monotonic install-ID, assigned at INST time. When PERF finds multiple matches for the same (family, op) — whether in CAM, in spill, or split across both — the match with the highest install-ID wins.
This delivers correct effect-shadowing semantics: nested handlers shadow outer handlers regardless of cache state. The CAM is purely a performance optimization; correctness depends only on install-ID ordering.
Implementation cost: one 16-bit field per frame. Lookup cost: in the common case (one match), no change; in the multi-match case, walk the small candidate set and pick max(install_id).
UNIN semantics: Uninstalls the most-recently-installed handler for (family, op). If the CAM contains it, remove and refill from spill if available. If only in spill, scan and remove. Re-establishes the previous shadowing.
v6 behavior:
Each HandlerFrame (both in CAM and spill) now stores
the installer’s v6 tag and value, captured at INST time. PERF
and any device-injected effect restore v6 to the captured installer’s
frame before transferring control to the handler’s PC.
Why this matters: handler clause bodies are compiled inside the
installer’s lexical scope and reference frame slots relative to the
installer’s v6. The performer’s v6 is the wrong base; without
this restoration, handlers that wrote to frame slots via v6 were
corrupting the performer’s frame.
5. Heap object layout
Per-object header at offset -1 from each PTR/CONT/CLOSURE-tagged value:
15 8 7 4 3 0
┌──────────┬────────┬────────┐
│variant(8)│size(4) │layout(4)│
└──────────┴────────┴────────┘
- layout (4 bits):
RECORD,ARRAY,CLOSURE_REC,CONT_REC,SUM,STRING, others reserved. - size (4 bits): Number of payload words after the header.
15is the escape value meaning “consult next word for actual size.” - variant (8 bits): For
SUMlayout, the variant discriminator. ForCONT_REC, the used/unused flag. Repurposed per layout otherwise.
ALLOC initializes header from immediates. SETV rewrites only the variant byte.
6. Continuation in-memory layout
A CONT_REC heap object has the following payload after its header:
Word Field Notes
---- --------------------- --------
0 saved PC 16-bit code address
1 saved SP stack pointer at capture time
2 prompt SP stack pointer of handler frame (delimits captured slice)
3 saved v6 value (16 bits)
4 saved v6 tag (3 bits + reserved)
5 saved v0 value
6 saved v0 tag
7 saved v1 value
8 saved v1 tag
9 saved v2 value
10 saved v2 tag
11 saved v3 value
12 saved v3 tag
13 saved v4 value
14 saved v4 tag
15 saved v5 value
16 saved v5 tag
17 saved v7 value
18 saved v7 tag
Word Field Notes
0 actual size = 19 size-escape word; size field in header is 15 1 saved PC 16-bit code address 2 saved SP stack pointer at capture time 3 prompt SP stack pointer of handler frame (delimits captured slice) 4 saved v6 value (16 bits) 5 saved v6 tag (3 bits + reserved) 6 saved v0 value 7 saved v0 tag 8 saved v1 value 9 saved v1 tag 10 saved v2 value 11 saved v2 tag 12 saved v3 value 13 saved v3 tag 14 saved v4 value 15 saved v4 tag 16 saved v5 value 17 saved v5 tag 18 saved v7 value 19 saved v7 tag
20 payload words including the size-escape word“ or “19 saved-state words plus one size-escape word. The header’s 4-bit size field is set to 15 (the escape value); the first payload word holds the actual size (19 or 20 depending on count convention)
Two-word-per-register saves are chosen to ease future expansion to wider data words. Permitted implementations may use packed encoding if they remain semantically equivalent to the spec’d layout — but the layout above is canonical for tooling (debuggers, migration, serialization).
The header’s variant byte holds a used/unused flag. RESU sets used = 1 atomically (with respect to preemption boundaries); a second RESU on a used CONT traps.
The CONT_REC region is disjoint from the general heap; ALLOC/ALLOCF will not allocate into it.
7. Instruction encoding
Five forms:
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Form R3: ┌─────────────┬───────┬──────┬──────┬─────┐
│ opcode (5) │vd (3) │vs (3)│vt (3)│ ─── │
Form RR5: ├─────────────┼───────┼──────┼──────┴─────┤
│ opcode (5) │vd (3) │vs (3)│ imm5 │
Form RI8: ├─────────────┼───────┼──────┴────────────┤
│ opcode (5) │vd (3) │ imm8 │
Form J11: ├─────────────┴───────┴───────────────────┤
│ opcode (5) │ imm11 │
Form E: ├─────────────┬─────────┬─────────┬───────┤
│ opcode (5) │family(4)│ op (4) │hpc(3) │ (for INST)
├─────────────┼─────────┼─────────┼───────┤
│ opcode (5) │family(4)│ op (4) │vr (3) │ (for PERF, UNIN)
└─────────────┴─────────┴─────────┴───────┘
8. Instruction set
| Op | Mnemonic | Form | Semantics |
|---|---|---|---|
| 0x00 | LI vd, imm8 | RI8 | vd ← (INT, sign-ext(imm8)) |
| 0x01 | LIH vd, imm8 | RI8 | vd.value ← (vd.value & 0xFF) | (imm8 << 8); tag preserved |
| 0x02 | ADD vd, vs, vt | R3 | vd ← (INT, vs.value + vt.value); both operands must be INT |
| 0x03 | SUB vd, vs, vt | R3 | vd ← (INT, vs.value - vt.value); both operands must be INT |
| 0x04 | BZ vs, off8 | RI8 | if vs.value == 0 then pc ← pc + sign-ext(off8) |
| 0x05 | JMP off11 | J11 | pc ← pc + sign-ext(off11) |
| 0x06 | LD vd, vs, off5 | RR5 | vd ← (INT, mem[vs.value + off5]); vs must be PTR, CONT, or CLOSURE; caller must SETTAG if non-INT result is intended |
| 0x07 | ST vd, vs, off5 | RR5 | mem[vs.value + off5] ← vd.value; vs must be PTR; tag of vd discarded |
| 0x08 | ALLOC vd, layout4, n4 | (custom) | bump-allocate (n+1) words including header; vd ← (PTR, hp+1); hp += n+1; size field 15 is the escape value (consult next word for actual size) |
| 0x09 | MOV vd, vs | R3 | vd ← vs (copies both tag and value) |
| 0x0A | CALL vf, va | R3 | vf must be CLOSURE; push 3 words (return PC, v6.value, v6.tag) onto data stack; v6 ← vf; v0 ← va; jump to mem[vf.value] (code pointer at offset 0 of closure record) |
| 0x0B | RET va | RI8 | pop 3 words from data stack (saved v6.tag, saved v6.value, return PC); restore v6; v0 ← va; jump to return PC |
| 0x0C | MATCH vd, vs, tbl5 | RR5 | dispatch on vs (see §9); jump table follows at pc + tbl5 |
| 0x0D | INST family4, op4, hpc3 | E | install handler at pc + sign-ext(hpc3); assign monotonic install-ID; capture installer’s v6 (tag and value) into the handler frame |
| 0x0E | UNIN family4, op4 | E | uninstall most-recently-installed handler for (family, op); if CAM-resident, remove and refill from spill if available; if only in spill, scan and remove |
| 0x0F | PERF family4, op4, vr | E | perform effect with argument in vr; dispatch and continuation capture per §10 |
| 0x10 | RESU vk, vv | R3 | vk must be CONT-tagged with “unused” variant byte; restore saved register state and SP from the CONT_REC; v0 ← vv; mark CONT_REC used; return its slot to the cont free pool; jump to the saved PC |
| 0x11 | SETV vd, variant8 | RI8 | set variant byte of heap header at mem[vd.value - 1]; vd must be PTR |
| 0x12 | SETTAG vd, tag3 | (custom) | assert tag on vd; debug build: trap on mismatch; release build: rewrite tag without check |
| 0x13 | HALT | J11 | stop the core until a remote PERF arrives (idle state) |
| 0x14 | AND vd, vs, vt | R3 | vd ← (INT, vs.value & vt.value); both operands must be INT |
| 0x15 | OR vd, vs, vt | R3 | vd ← (INT, vs.value | vt.value); both operands must be INT |
| 0x16 | XOR vd, vs, vt | R3 | vd ← (INT, vs.value ^ vt.value); both operands must be INT |
| 0x17 | SHL vd, vs, vt | R3 | vd ← (INT, vs.value << (vt.value & 0xF)); both operands must be INT; shift count masked to 4 bits |
| 0x18 | SHR vd, vs, vt | R3 | vd ← (INT, vs.value >> (vt.value & 0xF)) (logical); both operands must be INT; shift count masked to 4 bits |
| 0x19 | INC vd, vs, imm4 | (custom) | vd ← (PTR, vs.value + imm4); vs must be PTR; result is PTR |
| 0x1A | DEC vd, vs, imm4 | (custom) | vd ← (PTR, vs.value - imm4); vs must be PTR; result is PTR |
| 0x1B | LDF vd, imm8 | RI8 | vd ← (INT, mem[v6.value + imm8]); v6 must be PTR; caller SETTAGs if non-INT result intended. Wide-immediate frame-local load |
| 0x1C | STF vd, imm8 | RI8 | mem[v6.value + imm8] ← vd.value; v6 must be PTR; tag of vd discarded. Wide-immediate frame-local store |
| 0x1D | ALLOCF imm11 | J11 | v6 ← (PTR, bump-allocate(layout=RECORD, n=imm11)); result lands in v6 implicitly. Wide-immediate heap allocation for frames exceeding ALLOC’s 4-bit n |
| 0x1E | ADJSP imm11 | J11 | sp ← sp + sign-ext(imm11); v6 ← (PTR, sp). Signed 11-bit displacement in words; used for stack-frame prologue/epilogue. RET is unchanged, so the epilogue must restore sp to its pre-prologue value before RET |
| 0x1F | TCALL vf, va | R3 | vf must be CLOSURE; v6 ← (PTR, mem[vf.value + 1]); v0 ← va; pc ← mem[vf.value]. No push of return record — the callee’s eventual RET pops the caller’s caller’s record, splicing the current frame out of the call chain. Used for indirect tail-call elimination |
9. MATCH semantics
MATCH vd, vs, tbl5:
- If
vs.tagisPTR: load the header word frommem[vs.value - 1], extract the variant byte (bits [15:8]), and use it as the index into the jump table atpc + tbl5. - If
vs.tagis a primitive (INT, BOOL, UNIT, etc.): use the tag value itself as the index.
vd is reserved for future use (a “scratch” output register for the loaded header word, if useful).
The jump table is a sequence of 16-bit code-relative offsets immediately following the MATCH instruction, at pc + tbl5. The CPU adds the selected offset to pc (post-MATCH) and jumps.
Tables longer than the variant range are permitted; entries beyond the actual variant count are unreachable. Missing entries trap.
10. PERF semantics
PERF family, op, vr:
- CAM search. Look for an entry matching
(family, op). If multiple, pick the one with the highest install-ID. - Spill search if CAM misses. Walk the spill region for matches; among CAM and spill matches, pick highest install-ID.
- Effect table fallback. If no installed handler is found, consult
effect_table[(family, op)]:Local(pc): behave as if a CAM entry pointed at this PC.Remote(dst, dev_op): build a NoC packet and emit; do NOT capture a continuation; continue at next instruction.Unhandled: trap.
On local dispatch (CAM hit, spill hit, or effect table Local):
- Allocate a
CONT_RECon the heap; capture saved registers (v0-v5,v6,v7— values and tags),PC(of next instruction),SP, and the promptSP(theSPat the handler frame).. - Set
v0 ← vr(effect argument). - Set
v1 ← (CONT, ptr_to_CONT_REC)(continuation handle). - Set
v7 ← (INT, op)(the 4-bit op field of the effect, sign- or zero-extended). - Restore
v6from the matched handler frame - Jump to the handler’s PC.
On remote dispatch:
- Build NoC packet with
dst = device_addr,src = this_core,op = dev_op,payload = vr.value(zero-extended). - Hand to NoC interface; stall if FIFO full.
- Continue at the next instruction.
Clobber notice: v0, v1, and v7 are unconditionally clobbered by every PERF that does local dispatch. The compiler treats them as not-live across perform sites.
CONT_REC Reclamation:
PERF and RESU form a place/take discipline.
PERFallocates a freshCONT_RECat the next available cont slot, or reuses a previously-freed slot from a free list.RESUconsumes the cont (one-shot, runtime-checked) and reclaims the slot. The reclamation path is hardware-internal — no visible bus traffic to the kernel.
Allocation policy in the current VM: free-list-managed. The committed spec permits three implementation variants for future hardware, all of which produce identical observable behaviour for kernel code:
- (a) Fixed-size on-chip ring buffer with a hard cap on simultaneously-live continuations.
- (b) Cont-CAM with spill to a software-managed pool. Mirrors the handler-CAM / spill structure from §4.
- (c) Pure software-managed cont area.
Kernel code is portable across all three; it only ever emits
PERF/RESU and never names cont slot addresses or reclamation
operations.
11. RESU semantics
RESU vk, vv:
vkmust beCONT-tagged.vk’s header variant byte must indicate “unused”; else trap.- Mark the CONT_REC as used (atomic with respect to preemption boundaries).
- Restore registers v0..v5, v7, and v6 from the CONT_REC’s saved slots.
- Restore SP to saved SP.
- Set
v0 ← vv(the value being delivered to the perform site). - Jump to the saved PC.
The handler frame issuing the RESU is destroyed; control flow transfers entirely to the resumed continuation.
CONT_REC Reclamation:
RESU consumes a cont and reclaims the slot. See full description in previous section.
12. Frame Semantics
Load Frame
LDF vd, imm8: vd ← (INT, mem[v6.value + imm8])
Form RI8. The caller is responsible for tagging the result with
SETTAG if the loaded value should not be INT-tagged.
Why this exists: the narrow LD vd, vs, off5 has a 5-bit offset,
limiting frame slot access to slots 0..31. Kernel functions need
larger frames. Rather than rework the general encoding, LDF adds an
8-bit immediate fixed to v6 (the frame-pointer / closure register),
giving 256-slot capacity for frame-local access.
Store Frame
STF vd, imm8: mem[v6.value + imm8] ← vd.value
Form RI8. Tag of vd is discarded (matches existing ST).
Allocate Frame
ALLOCF imm11: v6 ← (PTR, alloc(layout=0, n=imm11))
Form J11 (11-bit operand). Allocates from the heap; tags result PTR;
result lands in v6 implicitly. The narrow ALLOC has a 4-bit n
field (max 15 words including header); ALLOCF raises the cap to
2048 words. layout is hardcoded to 0 (RECORD); use narrow ALLOC
when a different layout is needed.
The codegen’s emitter automatically rewrites narrow → wide when the constants exceed narrow range.
Adjust Stack Pointer
ADJSP imm11 (signed): @sp ← @sp + imm; v6 ← (PTR, @sp)
Form J11. Signed 11-bit immediate (−1024..+1023 words). The data
stack pointer is adjusted; v6 is set to point at the new top.
Crucial discipline: RET is unchanged. It still pops 3 words
(return PC + v6.value + v6.tag) from the data stack. A stack-framed
function’s epilogue must restore @sp to its pre-prologue value
via a matching positive ADJSP before RET runs, or the return
record is read from the wrong location.
Implements: the hybrid frame-allocation policy promised by the
book’s design but unspecified at the ISA level. Functions whose
frames don’t need to outlive the call (i.e., non-performing) use ADJSP instead of
ALLOC/ALLOCF.
13. Calling conventions
Function call
CALL vf, va:
vfmust be CLOSURE-tagged; points to a closure record(code_ptr, env_word_0, env_word_1, ...)- Hardware pushes return PC (1 word), v6.value (1 word), v6.tag (1 word, packed in low 3 bits) to data stack — 3 words per call
- v6 ← vf
- v0 ← va
- Jump to code_ptr (word 1 of closure record)
Tail Call
TCALL vf, va: v6 ← (PTR, mem[vf+1])
v0 ← va
pc ← mem[vf]
-- NO push of return PC + v6 saved
Form R3, same field layout as CALL (0x0A). Semantically: CALL
without the return record push. The callee’s eventual RET pops
the caller’s caller’s record, splicing the caller out of the
call chain.
Required for indirect tail-call elimination through closure-typed
values; the kernel’s invoke wrapper compiles to this when in tail
position.
Function return
RET va:
- Pop 3 words from data stack: saved v6 tag, saved v6 value, return PC
- Restore v6
- v0 ← va
- Jump to return PC
Register classes
| Register | Class | Notes |
|---|---|---|
| v0 | arg / return / PERF arg | clobbered by every CALL and every PERF |
| v1 | scratch / PERF continuation | clobbered by every PERF |
| v2..v5 | caller-saved scratch | callee may freely clobber |
| v6 | $clo (closure environment) | preserved across CALL via implicit stack save |
| v7 | PERF op-tag | clobbered by every PERF |
No callee-saved registers. Function prologues do not preserve registers; the cost of preservation moves to callers and is paid only where needed.
Handler entry
When PERF dispatches locally:
- v0 ← effect argument
- v1 ← continuation handle (CONT tag)
- v7 ← op tag (INT)
- v2..v5 ← undefined; handler must not assume any value
- v6 ← undefined; handler’s closure (if any) is recorded in the CONT_REC and restored only on RESU
14. NoC packet format
(Recorded here for completeness; details may evolve as NoC implementation matures.)
[63:58] dst_id (6 bits, 64 devices)
[57:52] src_id (6 bits)
[51:48] op (4 bits)
[47:44] flags (4 bits, priority/control)
[43:12] payload (32 bits)
[11:0] reserved
PERF with remote dispatch builds packets with op = dev_op (from the effect table entry) and payload = vr.value (zero-extended from 16 bits). For larger payloads, the compiler emits a pointer-passing version (allocate, store, perform with pointer).
15. Boot semantics
(Unchanged from v0.2.1.) The bootstrap core comes up at a reset vector with empty CAM and empty effect table. Boot ROM populates the effect table, installs initial handlers, enumerates devices via PERF probes, and releases other cores.
16. Preemption semantics
(Unchanged from v0.2.1.) Preemption is sampled at the end of each sequencer cycle (Fetch→Decode→Execute→Write→Preempt). Instructions either retire fully or do not retire. Pending remote PERFs queue at the core’s NoC input; the queue is drained at preemption sample points.
HALT halts the core until any remote PERF arrives (idle state).
17. What’s deferred to v0.5
The following items from the compiler team’s v0.2.1 notes are deferred:
- Memory operation timing model. LD/ST/ALLOC currently 1 cycle, 0 abstract time. A coordinated timing pass after the first real workloads.
- Fused tag-aware loads (
LD.PTR,LD.CONT,LD.CLOSURE). Defer until profiling shows the two-instruction pattern in a hot path. - Multi-shot continuations. First-shot are one-shot only.
- Arithmetic shift right (
SAR). Add when a workload demonstrates the need. - Shift-by-immediate (
SHLI,SHRI). Peephole optimization in assembler should fuseLI+SHRpatterns; promote to ISA only if peephole proves insufficient. - Cache hierarchy modeling. Flat memory cost for now.
- Atomic operations (
CAS). Needed for multicore synchronization; not yet needed in single-core kernel. Add as one of the reserved slots when multicore begins. - Discard-without-resume. A handler clause that captures
kbut never resumes (e.g.ProcLifecycle.exit (), _k -> …) leaks the cont’s slot under the current VM’s reclamation discipline. This must be addressed.
18. Open questions
- Custom encoding details for ALLOC, SETTAG, INC, DEC need ratification from the emulator team. The fields are specified; the exact bit positions within the 11-bit operand region need to be agreed.
- Behavior on tag-check trap. Currently unspecified what happens when ADD encounters a non-INT, or RESU encounters non-CONT, or similar. Proposal: tag-check faults raise a hardware-originated PERF on family 0xF op 0 with a fault-code payload. Kernel installs a handler that decides whether to terminate the offending process or attempt recovery.
- CONT_REC variant byte format. Currently spec’d as a used/unused flag, but 8 bits is more than needed for that. Possible future use: epoch counter for ABA-style protection on continuation reuse in long-running systems.
19. Versioning
This document is ISA spec v0.4.