Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

A Small ML for Effects-Native Hardware — Language Spec v0.3

Status

This is v0.3 of the language spec. Resolutions to several v0.1 and v0.2 open questions are now pinned. New decisions surfaced by the compiler team’s implementation work are incorporated.

This document remains normative for the compiler implementation and informative for the kernel and library implementations.

The language is unnamed for now. Working name: CML (Concrete ML).

What changed from v0.1

  • OPEN 3 resolved: Frame allocation is hybrid (stack for normal frames, heap for PERF-captured continuations). Was previously specified as all-heap.
  • Effect-shadowing semantics pinned: Nested handlers shadow by install order, not by cache location. Has implications for both compiler lowering and the ISA’s handler-stack semantics.
  • Pointer arithmetic via typed instructions: Specifies that the language relies on tag-preserving pointer operations (INC style) rather than untagged raw arithmetic. The ISA team is asked to provide these.
  • Bit operations required from ISA: AND, OR, XOR, SHL, SHR needed for variant extraction and other bitfield work. Was previously OPEN 4.
  • Continuation in-memory layout pinned: On-heap shape specified, no longer implementation-defined.
  • Handler-tower locality is a first-class concern: The CAM-vs-spill cost (12×) is significant enough that the language and compiler should help programmers manage it.

1. Design principles (unchanged from v0.1)

One: Effects are the only mechanism for I/O, allocation, scheduling, and communication. There are no built-in side effects in the language proper. Every operation that touches the world goes through perform.

Two: The type system tracks effects via row polymorphism. Every function’s type includes its effect row. Effects propagate by inference; annotations are optional but checked.

Three: unsafe is itself an effect. Operations that emit raw ISA instructions are gated by unsafe { ... } blocks, which add Unsafe to the surrounding function’s effect row.

Four: Memory allocation is an effect. There is no built-in allocator. The bottom handler decides the allocation strategy.

Five: The language and the ISA are co-designed. Language constructs lower directly to ISA primitives without runtime support beyond the kernel and its handlers.

2. Syntax overview

(Unchanged from v0.1. Reproduced here for self-containment of this document. Skip to §3 if familiar.)

2.1 Types

type list a =
  | Nil
  | Cons of a * list a

type option a =
  | None
  | Some of a

type pair a b = { fst : a; snd : b }

Algebraic data types with constructors (sum) and records (product). Type parameters are lowercase identifiers.

2.2 Functions

let id (x : a) : a = x

let rec sum_list (xs : list int) : int =
  match xs with
  | Nil          -> 0
  | Cons (h, t)  -> h + sum_list t

Functions are curried. let rec for recursion. Pattern matching on sums via match ... with.

2.3 Effects

effect State a {
  get : () -> a
  put : a -> ()
}

effect IO {
  print : string -> ()
  read_byte : () -> byte
}

Effect declarations introduce a family with named operations. Each operation has a signature; the return type is what the perform-site receives back when the handler resumes.

Effects can be parameterized over types.

:: and Cons

h :: t desugars to Cons (h, t) uniformly. It is a 2-payload constructor application; the typer enforces the list-shape because that’s how Cons is declared, but codegen treats it as a normal constructor call.

2.4 Lists

[]              ⇒ Nil
[a; b; c]       ⇒ Cons (a, Cons (b, Cons (c, Nil)))

These are sugars. Requires list type with Nil and Cons constructors in scope.

2.5 Records

Full record support: declarations, literals, field access, field assignment, shorthand { f } = { f = f }, patterns, mutable modifier. The sched_state type in the book uses all of these.

Book impact: Appendix B mentions records but doesn’t fully specify patterns, shorthand syntax, or assignment. Substantial expansion needed.

Implementation gap: mutable is parsed but not enforced — the compiler currently allows assignment to non-mutable fields. Spec should note this as an implementation deficiency.

2.6 Anonymous handler sugar (let | and)

and proc_handlers p = {           -- a handler, parameterised by p
  | ProcLifecycle.yield (), k -> ...
  | ...
}

A let/and binding whose RHS begins with { | clauses } is a parameterised handler declaration in disguise. Params become handler state parameters. Allows handlers to coexist in let rec … and … groups with ordinary functions.

2.7 Perform and handle

let inc () : int / {State int} =
  let n = perform State.get () in
  perform State.put (n + 1);
  n

let main () =
  with ref_state 0 in {
    let n = inc () in
    print_int n
  }

perform Effect.op (args) performs an operation. The type carries an effect row.

with handler args in { body } installs handler for the duration of body. The handler discharges the corresponding effect from the body’s effect row.

2.8 Handler declarations

Stateless handler:

handler print_to_uart = {
  | IO.print s, k    -> uart_send s; resume k ()
  | IO.read_byte (), k -> resume k (uart_recv ())
  | return v         -> v
}

Parameterized handler (state-carrying):

handler<a> ref_state(init : a) = {
  | State.get(), k    -> resume k init      with state := init
  | State.put(v), k   -> resume k ()        with state := v
  | return x          -> x
}

(Syntax for parameterized state remains an open compiler-team question; see §10.)

2.9 Unsafe blocks

let alloc_int_cell () : ptr / {Unsafe} =
  unsafe {
    let cell = __alloc Record 1 in
    __st 0 cell 0;
    cell
  }

unsafe { ... } adds Unsafe to the surrounding function’s effect row. Inside, primitives prefixed with __ are available; see §6.

3. Type system

3.1 Types

T ::= int | byte | bool | unit | string | empty
    | ptr | cont           -- low-level types, primarily from unsafe blocks
    | 'a                   -- type variable
    | T -> T / R           -- function type with effect row
    | (T, T, ...)          -- tuple
    | { l : T; ... }       -- record
    | type_name T1 T2 ...  -- type constructor application

int is 16-bit signed. byte is 8-bit unsigned. empty is the empty type, used for never-returning functions (exit : () -> empty / {Scheduler}).

3.2 Type aliasa

type core_id = int

Transparent. Type-parameterised aliases (type box 'a = 'a ref) are specified but not implemented yet.

3.3 Effect rows

R ::= {}                              -- pure
    | { E1, E2, ... }                 -- closed row
    | { E1, E2, ... | r }             -- open row with variable

Row polymorphism is supported. A function’s effect row is inferred; annotations are optional and checked.

Decision (v0.2): The language uses effect rows à la Koka, not effect handles à la Frank/OCaml 5. This was a major implementation choice flagged in the compiler team’s notes; ratifying it explicitly. Effect rows give the kernel the ability to statically prove “this function does not allocate,” which is load-bearing for the design.

3.4 Inference

Hindley-Milner with effect rows. Implementation follows the algorithm in [Leijen 2017]. The compiler team’s working notes mention an option to start with inference-only (no surface effect syntax) and add surface annotations later. Acceptable as an implementation phase; the semantics is row-based from day one.

3.4 The Unsafe effect

Unsafe is built-in. Any use of an __-prefixed primitive adds Unsafe to the surrounding function’s row. The compiler permits unsafe { ... } blocks anywhere; the type system tracks them via the row.

4. Memory model

4.1 No built-in allocation

Every allocation is perform Mem.alloc { ... }. The Mem effect is in the prelude:

effect Mem {
  alloc : { layout : layout; size : int } -> ptr
  free  : ptr -> ()
}

The bottom handler implements Mem. The kernel installs different bottom handlers for different memory strategies.

4.2 Construction syntax

let xs = Cons (1, Nil) desugars to a sequence of perform Mem.alloc calls and field writes, wrapped in an implicit unsafe. The user does not see the desugaring.

Optimization permitted: When the compiler can prove that the active Mem handler is a known bump-allocator (e.g., inside a with region r in { ... } block whose handler is statically resolvable), it may compile allocation directly to ALLOC instructions, bypassing the perform-and-handler dispatch. This is a semantics-preserving optimization, not a language change.

4.3 Region handlers

The canonical user-level memory strategy is region-based:

with region r in {
  let xs = Cons (1, Nil) in
  let n = sum_list xs in
  n
}

The region handler intercepts Mem.alloc and routes allocations into a bump arena. When the with block exits, the arena is reset.

OPEN (carried from v0.1): Region escape detection. Forbidding ptr return from with region blocks is the v0.1 answer and remains the v0.2 answer. A full region-polymorphism system is deferred.

4.4 GC

Not in v0.2. A future bottom handler can implement tracing GC; the language doesn’t change.

5. Compilation overview

Note on modules: Each .cml file is a module named after its basename, capitalised. Cross-module references use Module.name. Effects and types share a global namespace; functions are module-local.

5.1 Frame allocation: hybrid stack/heap (NEW in v0.2)

Resolved from v0.1 OPEN 3.

Function-call frames live on the data stack used by CALL/RET. Continuation frames captured by PERF are migrated to the heap as CONT_REC objects at capture time.

This is the production design from the compiler team’s implementation work and proves to be tractable. The simplification “all frames on the heap” from v0.1 is withdrawn.

Implications:

  • Normal function calls do not allocate. CALL pushes return PC + saved closure to the stack; RET pops them. No Mem.alloc perform involved.
  • PERF is the allocation point. When PERF captures a continuation, the live register state and the relevant stack slice are migrated to a fresh CONT_REC on the heap. The on-heap layout is specified in §5.6.
  • Tail calls (§5.8) reuse the current frame; no stack growth.

5.2 Lowering of expressions

SourceTarget
42 (int literal)LI vd, 42 (INT tag default)
x (variable)depends on x’s storage class
f x (application)place x in v0, f in v6, emit CALL v6, v0
let x = e1 in e2compile e1, store result in x’s slot, compile e2
match v with ...compile v to a register, emit MATCH with dispatch table
perform E.op argsplace args in v0, emit PERF (family, op), v0
resume k vplace v in v0, k in v1, emit RESU v1, v0
unsafe { ... __op ... }direct emission of corresponding ISA instruction
let sched : sched_state = init_sched_state (this_core ())

This is the binding that tripped you up earlier. Allocates a memory cell at fixed address 0x0E00+, initialised once in main’s prologue. References LD from the slot.

5.3 Register classes (NEW in v0.2, ratified from compiler team)

RegisterClassNotes
v0argument / return / PERF argumentclobbered by every CALL and every PERF
v1scratch / PERF continuationclobbered by every PERF (receives continuation)
v2..v5caller-saved scratchcallee may freely clobber
v6$clo (closure environment)preserved across CALL via implicit save/restore
v7PERF op-tagclobbered by every PERF (receives op-tag)

No callee-saved registers. Function prologues are empty; the cost of preserving values moves to callers and is paid only where needed. For an effect-heavy language, this is the right tradeoff — call frequency exceeds the relative gain from callee-saved registers.

v0, v1, and v7 are clobbered by PERF. The compiler must treat them as not-live across any perform site. This constrains register allocation in effect-heavy code; programs should be written assuming three of eight registers are always at risk near perform expressions.

5.4 Function lowering

A function let f x = body compiles to:

  1. A heap-allocated closure record with layout = CLOSURE_REC, containing:

    • Word 0: header
    • Word 1: code pointer
    • Words 2..N+1: captured free variables
  2. A code body at the address in word 1. The body’s prologue:

    • Receives argument in v0, closure pointer in v6
    • Loads needed captured free variables from v6 + offset
    • Executes the body
    • Returns via RET v0

5.5 Sum type lowering

A sum type type t = | A of int | B of (int, int) lowers per v0.1 §5.4, with the addition that variant extraction from headers requires bit operations (§9).

Pattern match on v1 : t:

LD       v3, v1, -1     ; load header (header sits before payload)
LI       v_shift, 8
SHR      v3, v3, v_shift ; shift variant byte down to low bits
LI       v_mask, 0xFF
AND      v3, v3, v_mask  ; mask off layout/size fields
MATCH    v3, v3, table   ; dispatch on variant

The compiler emits this sequence at every match on a PTR-tagged value.

Optimization hook: When the compiler can prove a match has only one or two variants, it can replace MATCH with a single BZ against the variant. Worth doing; common in practice for option types.

5.6 Continuation in-memory layout (NEW in v0.2)

A CONT_REC on the heap has the following shape:

Offset  Field                  Notes
------  ---------------------  -----
  -1    header                 layout = CONT_REC, size = 11, variant = used/unused flag
   0    saved PC               16-bit code address
   1    saved SP               stack pointer at capture time
   2    prompt SP              stack pointer at handler frame (delimits the captured slice)
   3    saved v6 value
   4    saved v6 tag           (low 3 bits in a 16-bit word; remaining bits reserved)
   5    saved v0 packed        high byte = tag, low byte = value (or two words; see note)
   6    saved v1 packed
   7    saved v2 packed
   8    saved v3 packed
   9    saved v4 packed
  10    saved v5 packed
  11    saved v7 packed

Note on packing: Current 16-bit values + 3-bit tags fit awkwardly. The implementation may use either (a) one word per register with tag in the high 3 bits and value in the low 13 bits, sacrificing 3 bits of value range in saves, or (b) two words per register (tag word + value word), doubling the CONT_REC size. Option (b) is more future-proof for a 32-bit expansion. Decision: option (b) for v0.2. The continuation record is larger but the layout extends cleanly to wider data words. Total size grows to approximately 19 words; still well within the 15-word inline-header limit if we use the “size = 15, consult next word” escape.

The continuation handle returned by PERF is a CONT-tagged pointer to the header of this record.

RESU vk, vv reads the CONT_REC pointed to by vk, restores the register state and stack slice, places vv in v0 at the resume site, and transfers control to the saved PC. The variant flag in the header is updated to mark the continuation consumed; a second RESU traps.

This layout is specified, not implementation-defined. Future implementations (silicon, alternative emulators, debuggers) must agree. The Ruby emulator’s current “side-table index” approach is a permitted implementation strategy only if it maintains semantic equivalence with the spec’d layout — i.e., a debugger or migration tool would see the same logical contents.

5.7 Effect lowering and handler installation

(Mostly unchanged from v0.1.)

The compiler emits effect declarations into a manifest. Linker resolves to (family, op) pairs.

Reserved families:

  • Family 0: Privileged control plane.
  • Family 0xF: Hardware-originated effects.
  • User available: 1 through 0xE.

Handler installation lowers to:

INST  family, op, +1
JMP   past_handler
handler_body:
  ...
  RESU v1, vresult
past_handler:
  ...

Handler-tower locality (NEW in v0.2):

The handler CAM holds 8 entries. Beyond that, handlers spill to memory, with a 12× cost penalty on PERF searches that miss the CAM. The compiler should:

  • Warn when a with block’s compilation would cause the static handler tower depth to exceed 8.
  • Avoid emitting handler installations in inner loops where the same handler could be hoisted outside.
  • Document handler depth as a programmer-visible cost.

The language does not statically prevent deep handler towers — they remain expressible — but the compiler issues guidance, and runtime profiling can flag spill-thrashing.

Handler shadowing semantics (NEW in v0.2):

Multiple installations of the same (family, op) must resolve to the most-recently-installed match. This is the standard algebraic-effects semantics; it is non-negotiable.

This places a constraint on the ISA: the handler-stack lookup mechanism must respect install order, not merely cache recency. The compiler team’s v0.3 notes propose install-ID disambiguation; that’s the ISA’s problem to solve. The language requires the semantics; the ISA must deliver it.

5.8 Tail calls

Tail calls must be optimized. The compiler identifies tail-position calls and emits JMP (after argument setup) rather than CALL+RET. Tail calls do not grow the stack.

This is a hard requirement. The kernel scheduler is written assuming TCO; non-TCO compilers cannot compile it.

6. Pointer arithmetic (NEW in v0.2)

The language relies on typed pointer arithmetic for kernel-level code: bumping heap pointers, walking array elements, etc.

The ISA is asked to provide tag-preserving operations:

INC vd, vs, imm4    ; vd ← (PTR, vs.value + imm4); requires vs.tag = PTR
DEC vd, vs, imm4    ; vd ← (PTR, vs.value - imm4); requires vs.tag = PTR

Inside the language, these surface as primitive operations in unsafe blocks:

unsafe {
  let hp' = __inc hp 3 in     (* bump heap pointer by 3 words *)
  ...
}

The language does NOT expose untagged arithmetic. No __add_raw or ADD.RAW. Type-checked arithmetic over INT operands is the only ALU operation form. Pointer arithmetic uses typed INC/DEC.

Rationale: untagged arithmetic would punch a hole in the system’s type-tagging invariant — a value of unknown tag could enter the ALU, and downstream type-checked operations could not safely consume the result. Typed pointer ops preserve the invariant while solving the actual need (the kernel almost never needs general pointer arithmetic; it needs +N and -N for small N).

If the ISA team objects, the fallback is ADD.RAW/SUB.RAW inside unsafe blocks only, with the type system tracking Unsafe aggressively. But INC/DEC are preferred.

7. Bit operations (NEW in v0.2)

The language relies on the ISA providing:

AND vd, vs, vt       ; bitwise AND; both INT
OR  vd, vs, vt       ; bitwise OR; both INT
XOR vd, vs, vt       ; bitwise XOR; both INT
SHL vd, vs, vt       ; vs << vt.value; both INT
SHR vd, vs, vt       ; vs >> vt.value, logical (zero-fill); both INT

These are required for variant extraction (§5.5), bitfield manipulation in headers, and bitmap operations in the kernel.

These are non-negotiable; the language cannot compile without them. The ISA team has been notified.

Arithmetic shift right (SAR) is not yet required and is deferred.

8. Standard prelude

effect Mem {
  alloc : { layout : layout; size : int } -> ptr
  free  : ptr -> ()
}

effect IO {
  print : string -> ()
  read_byte : () -> byte
}

effect Scheduler {
  yield : () -> ()
  spawn : (() -> ()) -> ()
  exit  : () -> empty       (* never returns; uses empty type *)
}

effect Unsafe { /* raw ISA primitives */ }

The kernel installs bottom handlers for Mem, IO, Scheduler before user code runs.

9. Compiler-emitted optimizations

The compiler is permitted (and encouraged) to:

  • Inline handler dispatch when the handler is statically known. A perform Mem.alloc inside a with region r in { ... } block can compile directly to ALLOC without going through PERF/handler-search.
  • Elide SETTAG vd, INT after LD (since LD already produces INT).
  • Replace MATCH with BZ-chains for sum types of one or two variants.
  • Inline small functions to avoid the CALL overhead.
  • Optimize tail calls (required, not optional).

The compiler is forbidden from:

  • Reordering perform sites within a function. The order of perform calls is observable.
  • Eliminating perform sites, even if the handler is known to be effect-free. Future handlers may not be.
  • Caching the result of a perform across calls.

10. Open questions, consolidated

Carried from v0.1 or surfaced anew:

  • OPEN (compiler team to resolve): Parameterized handler state syntax. (with state := v vs. resume k v new_state vs. other.)
  • OPEN (v0.1 carryover): Region escape detection. Forbid ptr return from regions for v0.2; consider region-polymorphism for v0.3.
  • OPEN (ISA team to resolve): Multiple-install disambiguation. Language requires install-order semantics; ISA must deliver. Compiler team’s v0.3 notes propose install-ID disambiguation, which is acceptable.
  • OPEN (deferred): Fused tag-aware load variants (LD.PTR, LD.CONT, LD.CLOSURE). Profile first, decide later.
  • OPEN (deferred): Multi-shot continuations. v0.2 specifies one-shot.

11. Versioning

This document is Language Spec v0.2.

Major semantic changes will produce v0.3. Clarifications without semantic change will produce v0.2.1.