Continuations as values, effects as control transfer
This is the chapter that explains the design’s most unusual commitment, the one that runs through everything else. Most CPUs treat control flow as: execute the next instruction, or take a branch, or call a function and eventually return, or take an interrupt and eventually return. This ISA adds one more primitive: perform an effect. The hardware captures the current point of execution as a value, transfers control to a handler that the kernel installed, and lets the handler decide whether to resume.
This sounds esoteric. It is, in fact, the most useful primitive we will add. Let me build up to why.
What’s a continuation, exactly
A continuation is the rest of the computation, reified as a value. If you have an expression
let x = E in F
then “the rest of the computation after E completes” is conceptually the function fun v -> let x = v in F. In a language with first-class continuations, this rest-of-the-program can be captured as a value, stored, passed around, and invoked later.
If you’ve used call/cc in Scheme, you’ve worked with continuations. If you’ve written async/await in any modern language, you’ve worked with a restricted form of continuations — await saves “the rest of this function” as a state machine and resumes it when the awaited value arrives. If you’ve written generator functions (yield in Python), the generator’s frame is a continuation that gets resumed each time you ask for the next value.
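To make the resume-a-suspended-frame idea concrete, here is a small Python sketch (not CML): the generator's frame acts as a restricted continuation, and each `send` resumes it with a value.

```python
def counter():
    # The generator's suspended frame is a restricted continuation:
    # each send(v) resumes it at the yield, with v as the yield's value.
    total = 0
    while True:
        v = yield total   # suspend here; resumed later by send()
        total += v

k = counter()
next(k)             # run the frame up to its first suspension point
print(k.send(3))    # resume with 3; the frame runs to the next yield; prints 3
print(k.send(4))    # resume again; prints 7
```

Unlike a first-class continuation, this frame can only be resumed at the one place it is suspended, which is exactly the restriction the text describes.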
In CML, continuations are first-class values. They have a type — unit continuation is a continuation that expects unit when resumed; int continuation is one that expects an int. They live in registers; they live on the heap; they get passed to functions; they get stored in queues. The compiler tracks their types.
In the ISA, continuations have a hardware tag — CONT — and the runtime representation is a heap-allocated record holding exactly enough state to resume execution from the point of capture. We pinned this layout in the ISA specification (Appendix A): saved program counter, saved stack pointer, saved registers, and a “used / unused” flag.
What’s an effect
An effect is a typed operation that user code can perform and that some surrounding handler is responsible for interpreting. In CML, effects are declared by signature:
effect IO {
print : string -> ()
read_byte : () -> byte
}
effect Scheduler {
yield : () -> ()
exit : () -> empty
}
These declarations introduce families of operations. Each operation has a type. The print operation takes a string and returns unit; the read_byte operation takes nothing and returns a byte.
User code performs an operation:
let n = perform IO.read_byte () in
perform IO.print "you typed something";
n
When this code runs, perform IO.read_byte () causes control to transfer to whatever handler is currently installed for the IO.read_byte effect. The handler decides what to do — read a byte from somewhere, return it. When the handler resumes the continuation, the value of the perform expression becomes whatever the handler supplied.
This is similar to a function call. The difference is who decides what the call does. With a function call, the called function is named at the call site; you know what runs. With a perform, the call site names only the effect; what runs is whichever handler is currently in scope. The same perform IO.print msg might write to UART in the kernel, write to a buffer in a test harness, or write to a network socket in a distributed system. The user code doesn’t know.
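That dynamic choice of handler can be sketched in Python (a toy model, not the ISA mechanism; `Perform` and `interpret` are invented names for illustration): the body performs effects by yielding requests, and a runner answers each request with whichever handler table it was given.

```python
from dataclasses import dataclass

@dataclass
class Perform:
    op: str        # which effect operation, e.g. "IO.read_byte"
    arg: object    # the operation's argument

def interpret(program, handlers):
    """Run a generator-based body, answering each performed effect
    with whatever handler is currently installed for it."""
    gen = program()
    reply = None
    while True:
        try:
            req = gen.send(reply)            # resume the body up to its next perform
        except StopIteration as done:
            return done.value                # the body finished normally
        reply = handlers[req.op](req.arg)    # the handler decides what the effect means

def program():
    n = yield Perform("IO.read_byte", None)           # CML: perform IO.read_byte ()
    yield Perform("IO.print", "you typed something")  # CML: perform IO.print ...
    return n

# The same body, two different meanings for IO.print:
interpret(program, {"IO.read_byte": lambda _: 65, "IO.print": print})
buf = []
interpret(program, {"IO.read_byte": lambda _: 65, "IO.print": buf.append})
```

The call sites in `program` never change; only the table passed to `interpret` does, which is the point of the paragraph above.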
What’s a handler
A handler is an installed responder for one or more effects. In CML:
handler print_to_uart = {
| IO.print s, k -> uart_send s; resume k ()
| IO.read_byte (), k -> resume k (uart_recv ())
| return v -> v
}
let main () =
with print_to_uart in {
perform IO.print "hello\n";
let b = perform IO.read_byte () in
perform IO.print ("got: " ^ string_of_byte b ^ "\n")
}
The print_to_uart handler declaration defines a set of clauses. Each clause names an effect operation and binds two things: the operation’s argument and the continuation. The continuation k represents “the body’s execution at the point of this perform, ready to be resumed.”
When the body performs IO.print "hello\n", control transfers to the IO.print s, k clause with s bound to "hello\n" and k bound to the suspended continuation. The clause does the actual work (writing to UART) and calls resume k () to deliver () back to the perform site, after which the body continues with the next line.
The return v clause runs when the body completes normally. It receives the body’s return value and is the handler’s last opportunity to do something with it.
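The clause shape can be sketched in Python (a toy model; `step`, `body`, and the clause names are invented). Each clause receives the operation’s argument plus an explicit resume function standing in for the captured continuation, and `on_return` plays the role of the return v clause.

```python
def step(k, clauses, on_return, value):
    """Resume continuation k with value; dispatch its next effect to a
    clause, or run the return clause if the body finishes."""
    try:
        op, arg = k.send(value)
    except StopIteration as done:
        return on_return(done.value)     # the `return v` clause
    # The clause gets the argument and the continuation (as a resume
    # function); it chooses whether and how to resume.
    return clauses[op](arg, lambda v: step(k, clauses, on_return, v))

out = []
def print_clause(s, resume):
    out.append(s)            # the actual work (stands in for uart_send)
    return resume(None)      # CML: resume k ()

clauses = {"print": print_clause,
           "read_byte": lambda _, resume: resume(65)}  # resume k (uart_recv ())

def body():
    yield ("print", "hello")          # perform IO.print "hello"
    b = yield ("read_byte", None)     # perform IO.read_byte ()
    yield ("print", f"got: {b}")
    return b

result = step(body(), clauses, lambda v: v, None)   # result is 65
```

A clause that simply drops its `resume` argument terminates the body, which previews the “don’t resume” option discussed below for errors.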
Why this is a unification
Now the payoff. Several operations that look distinct in conventional kernels are, under this design, the same operation:
A context switch. A process performs Scheduler.yield. The handler captures the continuation, files it in the ready queue, pops a different continuation, and invokes it. Context-switching is perform + resume on a different continuation.
A blocking I/O call. A process performs IO.read_byte. The handler checks if a byte is available; if so, it resumes the continuation with the byte. If not, it files the continuation in a “waiting for input” queue and runs someone else. When a byte arrives later, the handler picks up the waiting continuation and resumes it.
A timer interrupt. The timer device performs Hardware.tick. The handler captures the interrupted code’s continuation, files it in the ready queue, and runs someone else. This is the same operation as voluntary yield. The only difference is who performed it — the running code (yield) or the timer device (tick).
A device-completion notification. The disk performs Hardware.disk_done. The handler looks up which continuation was waiting for this completion, supplies the result, and resumes.
An exception or error. A process performs Error.divide_by_zero. The handler decides whether to recover (resume with a default value), terminate the process (don’t resume), or escalate (perform another effect with the error).
All of these are the same mechanism. One control-flow primitive — perform — that unifies what conventional systems treat as half a dozen distinct mechanisms with their own machinery.
Why this is hardware-shaped
In a software implementation of effects — as found in OCaml 5, Koka, Eff — the mechanism for perform is some variant of stack manipulation: the runtime walks back up the stack until it finds an installed handler, then transfers control. This costs a few hundred to a few thousand cycles, depending on stack depth.
In our ISA, perform is a single instruction. The handler search happens in hardware. The continuation capture happens in hardware. The transfer happens in hardware. The whole operation retires in a handful of cycles. Context switches between processes — which require user-level continuations to be captured and invoked — are no longer expensive operations that kernels have to engineer around. They’re cheap, and they’re uniform with every other effect.
This is the design’s central bet: that effects are common enough, and important enough, that putting them in hardware pays off.
The handler search mechanism
The hardware needs to find the handler for a given effect. The data structure is a handler stack, with handlers installed by recent with blocks at the top and the kernel’s bottom handlers at the bottom. When user code performs an effect, the hardware searches top-down for a handler that catches it.
In our ISA, this stack is partially cached in a small content-addressable memory called the handler CAM. The CAM holds the most-recently-installed handlers. If the effect is in the CAM, dispatch is one cycle. If not, the hardware walks the spilled portion of the handler stack in memory, costing more cycles per probe. The CAM is 8 entries in the current design; the spill cost is roughly 12× a hit. This is documented in the ISA spec.
None of this is visible to user code. A perform dispatches either quickly or slowly; correctness is the same either way, and the cost model is published so programs can be tuned.
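As a rough illustration of that cost model (only the 8-entry CAM size and the roughly 12× spill factor come from the text; the per-probe accounting below is an assumption for the sketch):

```python
CAM_ENTRIES = 8      # size of the handler CAM in the current design
HIT_CYCLES = 1       # CAM hit: one-cycle dispatch
SPILL_FACTOR = 12    # a spilled probe costs roughly 12x a hit

def dispatch_cycles(handler_stack, effect):
    """Toy estimate of handler-search cost: search top-down, with the
    CAM_ENTRIES most recently installed handlers cached."""
    for depth, handled in enumerate(handler_stack):   # index 0 = most recent
        if effect in handled:
            if depth < CAM_ENTRIES:
                return HIT_CYCLES
            # CAM miss: walk the spilled portion of the stack in memory,
            # paying per probe beyond the CAM
            return SPILL_FACTOR * (depth - CAM_ENTRIES + 1)
    raise LookupError(f"no handler for {effect}")

dispatch_cycles([{"IO.print"}], "IO.print")                 # hit: 1 cycle
dispatch_cycles([set()] * 10 + [{"IO.print"}], "IO.print")  # miss at depth 10: 36
```

The shape matters more than the constants: dispatch cost depends on how deep the handler sits, so hot effects want recently installed handlers.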
Continuations are linear
A subtle but load-bearing decision: continuations in CML are linear. They can be resumed exactly once. If a handler clause resumes a continuation, it cannot resume it a second time. If a handler clause stores the continuation in a queue, it cannot also resume it directly.
This is what makes effect-based scheduling sound. If continuations could be duplicated, a buggy or malicious handler could invoke a process’s continuation twice, running the process simultaneously in two contexts with all the resulting confusion. By making continuations linear, the type system rules this out.
Linearity also enables migration across cores. When you ship a continuation to another core (as we’ll see in the multicore chapter), the source core loses it and the target core gains it. There’s no danger of both cores trying to run the process; the type system has enforced that there’s only one continuation handle, and now it lives on the other core.
We’ll fully unpack linear types in chapter 13. For now, the important fact: continuations are values that can be passed around, but they’re linear, so the type system tracks ownership.
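The dynamic side of that discipline can be sketched in Python, using the record layout pinned in the ISA specification (Appendix A). The field names and the resume return value here are illustrative; in CML the type system rejects a second resume at compile time, and the used flag is the hardware backstop.

```python
from dataclasses import dataclass

@dataclass
class Cont:
    """Toy model of the CONT record: saved machine state plus the
    'used' flag that backs the one-shot (linear) discipline."""
    pc: int
    sp: int
    regs: dict
    used: bool = False

    def resume(self, value):
        # CML's type system rules out a double resume statically; the
        # used flag is the dynamic check against a buggy handler.
        if self.used:
            raise RuntimeError("continuation resumed twice")
        self.used = True
        return ("resume-at", self.pc, value)   # stands in for the actual transfer

k = Cont(pc=0x1000, sp=0x8000, regs={})
k.resume(())      # fine: first (and only) resumption
# k.resume(())    # would raise: the handle has been consumed
```

Migration fits the same model: shipping `k` to another core transfers the one handle, so only one core can ever call `resume` on it.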
What this chapter committed to
Continuations as values the ISA understands. Effects as typed operations that transfer control to installed handlers. The unification of context switches, I/O blocking, interrupts, exceptions, and a few other concepts under one mechanism. The hint that linearity makes the whole thing safe.
The next chapter traces a single physical event — a keystroke — through every layer of this design, from the switch under the key to the perform that delivers it to a process.