Preemption without a separate context

The previous chapter’s kernel is cooperative: processes yield voluntarily. If proc_a never performed Yield, the kernel would run it forever, ignoring everyone else. This is the cooperative-only failure mode that early Mac OS, Windows 3.1, and certain game engines have lived with — any process that doesn’t yield politely can hang the whole system.

Preemption is the fix. The kernel takes the CPU back from a running process whether the process wants it or not. In a conventional kernel, this involves a wad of new machinery: an interrupt context, interrupt enable/disable around critical sections, a separate kernel stack, special discipline about what can run in interrupt context. Here, none of that exists. Watch.

Adding preemption

We add three things:

  1. A simulated timer device that performs Tick at intervals.
  2. CPU-bound processes that never voluntarily yield.
  3. …nothing else.

Specifically: the scheduler from the previous chapter requires no changes. The Tick clause was already there, anticipating the moment a timer would fire. Now that a timer exists and fires, the clause does its job.
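For orientation, the Tick clause and its siblings have roughly this shape. This is a sketch in the book's effect syntax, with the queue name sched.ready and the overall structure assumed from the previous chapter, not quoted from it:

```ocaml
(* Sketch of the scheduler's handler clauses. The Tick clause is
   identical in shape to the Yield clause: both file the continuation
   back in the ready queue and pick the next process. *)
let rec scheduler () =
  match Queue.pop_opt sched.ready with
  | None   -> ()                         (* nothing left to run *)
  | Some k ->
      match continue k () with
      | () -> scheduler ()               (* process ran to completion *)
      | effect Yield, k' ->
          Queue.push k' sched.ready; scheduler ()   (* voluntary *)
      | effect (Print s), k' ->
          print_string s; continue k' ()            (* resume in place *)
      | effect (Tick _), k' ->
          Queue.push k' sched.ready; scheduler ()   (* involuntary *)
```

The point of the sketch is the symmetry: voluntary yield and involuntary preemption land in the same kind of clause, doing the same thing.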

(* === Effects (unchanged) === *)

effect Yield : unit
effect Print : string -> unit
effect Tick  : core_id -> unit

(* === Timer device abstraction (NEW) === *)

type timer = {
  mutable countdown : int;
  reload            : int;
}

let timer : timer = { countdown = 3; reload = 3 }

(* === Process and scheduler state (unchanged) === *)

(* ... process, sched_state, spawn, scheduler all unchanged ... *)

(* === CPU-bound processes that never yield (NEW) === *)

let proc_loop name iters () =
  let rec go n =
    if n >= iters then ()
    else begin
      perform (Print (name ^ string_of_int n ^ "\n"));
      go (n + 1)
    end
  in
  go 0

(* === Boot (slightly modified) === *)

let main () =
  spawn (proc_loop "A" 5);
  spawn (proc_loop "B" 5);
  scheduler ()

With preemption wired in, the output looks something like:

A0
A1
A2
B0
B1
B2
A3
A4
B3
B4

The exact interleaving depends on where in each process’s execution the timer fires — but the shape is what matters: each process runs for a quantum (here, three operations), then gets preempted, then resumes later. Neither process ever wrote perform Yield. They were preempted involuntarily.

What didn’t change

The scheduler. Not a line. The Tick clause was already there. The kernel was already preemption-capable; we just hadn’t connected a preemption source.

This is the unification paying off in real currency. In a traditional kernel, adding preemption means writing interrupt handlers, modifying the scheduler to be reentrant from interrupt context, managing interrupt enable/disable around critical sections, and dealing with the fact that interrupt context and process context have different stacks and different rules. Here, none of that exists, because there is no separate interrupt context. Tick is an effect; effects go to handlers; the scheduler is a handler. Same machinery.

The processes. proc_loop is straight-line code with no awareness that it might be preempted. It doesn’t call yield. It doesn’t check a preemption flag. It doesn’t cooperate. It just runs, and the kernel takes the CPU back when it wants to. This is the property real operating systems need: user code should not have to be aware of scheduling, and here it isn’t.

What did change

The timer. A new device with a countdown register and a reload value. The semantics: the ISA decrements countdown on every instruction (or every cycle, or every N cycles), and when it hits zero, the next instruction boundary is treated as a perform (Tick core_id) rather than executing the program’s next instruction. The countdown reloads from reload, and the cycle continues.

This is the simplest possible timer. A real one would let you program the reload, mask the interrupt, query the current value. The point here is the interface: from the language’s perspective, the timer is a thing that occasionally performs an effect.
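In the book's effect syntax, the timer's behavior at each instruction boundary can be modeled as a step function. The name timer_step is illustrative, not from the spec:

```ocaml
(* Illustrative model of what the hardware does at every instruction
   boundary. When the countdown expires, the boundary performs Tick
   instead of executing the program's next instruction. *)
let timer_step (t : timer) (core : core_id) : unit =
  t.countdown <- t.countdown - 1;
  if t.countdown <= 0 then begin
    t.countdown <- t.reload;   (* reload, so the cycle continues *)
    perform (Tick core)        (* performed by hardware, not the program *)
  end
```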

The processes are CPU-bound. No perform Yield anywhere in proc_loop. The processes just compute. Without preemption, the first process to run would run to completion before the second got a chance.

What’s happening at the ISA level

This is where it gets interesting, because the mechanism for preemption involves the hardware doing something the previous chapter’s kernel didn’t require.

When proc_loop "A" 5 is running and the timer expires, here’s the sequence:

  1. The processor is executing some instruction inside proc_loop — say, the recursive call to go. The countdown hits zero.
  2. At the next instruction boundary, instead of executing the next instruction of proc_loop, the ISA inserts a perform. It’s as if the running code had written perform (Tick 0) at this point, except the code didn’t write that; the hardware did.
  3. The hardware captures the continuation at this instruction boundary. This is the critical operation. The continuation represents “the rest of proc_loop from this exact point” — including the recursive call that was about to happen, the local variable n, everything.
  4. The hardware searches up the handler chain for a frame handling Tick. It finds the scheduler’s handle ... with frame. It transfers control there, binding the continuation to k in the Tick _ k -> ... clause.
  5. The handler files k back in the ready queue and calls scheduler (), which pops the next process.

Steps 2 and 3 are the new ISA capability required for preemption. The hardware has to be able to capture a continuation at an arbitrary instruction boundary, not just at a perform written in the source code.

Design space for preemption points

Three options:

Precise, every boundary. The hardware can capture a continuation at any instruction boundary, full stop. Every instruction is potentially preemptable. Most flexible and most expensive — every instruction boundary must be in a state from which the program can resume coherently.

Safe points only. The hardware can capture only at safe points — instruction boundaries the compiler has marked as preemptable. Between safe points, the timer is checked but the preempt is deferred. The GC-style approach.

Atomic regions. Preemption is precise (any instruction boundary), but the type system can mark code regions as atomic where preemption is forbidden. Lifted into the type system.

Our design (committed in the ISA spec) is option 1: precise preemption at every instruction boundary, with the preempt sampled at the end of each sequencer cycle (Fetch→Decode→Execute→Write→Preempt). Every instruction either retires fully or doesn’t retire at all. The cost is that every instruction has to commit to a clean state before retiring, which modern out-of-order processors already do for precise exceptions — so the additional cost is less than it sounds.
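The cycle structure can be written out as a sketch. Here fetch, decode, execute, write_back, and timer_expired are illustrative names, not the spec's:

```ocaml
(* One sequencer cycle: Fetch -> Decode -> Execute -> Write -> Preempt.
   The tick is sampled only after write-back, so the instruction has
   fully retired before any continuation is captured. *)
let cycle (core : core_id) : unit =
  write_back (execute (decode (fetch ())));
  if timer_expired () then
    perform (Tick core)   (* precise: always at a clean boundary *)
```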

Where the captured continuation points

When proc_loop "A" is preempted in the middle of a recursive call, the captured continuation represents the state inside go, with n bound to whatever value it had, with the call to go partway through. Invoking this continuation later resumes the recursion from that point.

This is different from a thread context switch in two important ways:

There’s no separate stack to save. The continuation is the tree of frames; the frames already live wherever frames live. Capturing the continuation is capturing a pointer to the root. No memcpy.

There’s no “kernel stack” vs. “user stack” distinction. The handler clause for Tick runs in the same frame-tree environment as the process that was preempted, just with control now in the handler’s frame. The handler’s frames are siblings to the process’s frames in the tree. When the handler eventually invokes some continuation, control transfers to that subtree; the handler’s frames may persist if the handler hasn’t returned yet.

This second point does a lot of work. On a conventional machine, kernel code runs on a different stack from user code — partly for protection, partly because the kernel needs to handle interrupts that arrive while user code is mid-instruction. On our machine, the handler chain itself is the protection boundary. The user code is “inside” the scheduler handler — under it, in the frame tree. When the timer perform fires, control transfers from the user code’s frame to the handler’s frame upward in the same tree. There’s no separate stack because there’s no separate execution context; it’s all one tree of frames with a clear hierarchical relationship.

What enforces protection? The type system. The handler frame holds references to sched.ready and other kernel state; the user code’s frame does not, because it was never given those references. The user code physically cannot access kernel state — not because it’s on a different stack, but because no name in scope refers to kernel state.

This is exactly the capability model. Protection comes from what’s reachable through the lexical environment, not from page-table permissions. In our ISA, hardware page protection may be unnecessary or merely redundant — the type system has already prevented the access.
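A sketch makes the scoping argument concrete. In this variation sched is threaded explicitly so the capability is visible; the book's kernel keeps it at module level, but the point is the same: proc_loop is defined in a scope that never saw it.

```ocaml
(* proc_loop is defined outside the kernel's scope. No name inside
   it refers to kernel state, so the process cannot reach it. *)
let proc_loop name iters () = ... (* as above: performs Print only *)

let boot () =
  let sched = { ready = Queue.create () } in  (* kernel capability *)
  spawn sched (proc_loop "A" 5);   (* the closure receives no *)
  spawn sched (proc_loop "B" 5);   (* reference to sched.ready *)
  scheduler sched
```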

Preemption during a handler

What about reentrancy? If the scheduler handler is itself running — say, in the middle of Queue.push — and the timer fires, what happens?

This is solvable in two ways:

Hardware mask. The PERF instruction in handler context defers any pending Tick until the handler returns. Like interrupt masking on conventional hardware.

Tower discipline. The kernel installs a handler tower such that, during scheduler critical sections, there is no in-scope Tick handler. The handler search finds the bootloader’s no-op handler instead. The timer fires but is silently dropped (or queued for later via the no-op handler’s side effect).

We choose the tower discipline. The boot frame installs a Tick handler that does nothing — just resumes the continuation. The scheduler frame, deep inside its critical sections, doesn’t install its own Tick handler until it’s about to invoke a process; once it does, the user process’s preemption is enabled. When the user process is preempted and the handler clause runs, it uninstalls the Tick handler (implicitly, by returning into the scheduler’s outer frame) and re-installs it just before invoking the next process.
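In sketch form, the tower looks like this. Structure and names are illustrative, and deep-handler details are elided:

```ocaml
(* Boot frame: a no-op Tick handler. Any tick that fires while the
   scheduler is in a critical section escapes up to here and is
   swallowed. The mask is simply that no scheduler Tick handler is
   in scope at that moment. *)
let with_tick_mask body =
  match body () with
  | x -> x
  | effect (Tick _), k -> continue k ()   (* drop the tick, resume *)

let rec scheduler () =
  (* Critical section: no Tick handler installed here. *)
  match Queue.pop_opt sched.ready with
  | None -> ()
  | Some proc_k ->
      (* Preemption enabled only while the process runs. *)
      (match continue proc_k () with
       | () -> ()
       | effect (Tick _), k -> Queue.push k sched.ready);
      scheduler ()
```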

This means timer ticks arriving during scheduler critical sections are effectively dropped. For a timer that fires every few thousand instructions, dropping the rare tick that lands during a scheduler critical section is fine — the next tick will arrive shortly, and the scheduler's critical sections are short.

This is preemption masking implemented entirely in the language’s normal handler discipline, with no special interrupt-mask register, no cli/sti, no separate hardware mechanism. The mask is which handlers are currently in scope. The kernel structure expresses it directly.

Edges to acknowledge

The model has two places where the abstraction strains, worth flagging:

Atomicity of preemption. The hardware has to either commit fully before retiring an instruction (no partial state visible at any boundary) or expose the partial state in a way the continuation can capture. The former is the discipline we want; it constrains what instructions can exist (no implicit multi-step operations) but keeps the model clean. Multi-cycle operations (NoC sends, complex effects) are allowed, but they commit at the end of the cycle, not partway through.

Pending performs. If a timer perform arrives while a previous perform is mid-execution, it queues up rather than firing immediately. The queue has bounded depth; if it fills, the hardware applies backpressure to the timer (which can simply drop ticks, since the next will arrive a fixed interval later). For a normal teaching kernel, this never bites.

What we’ve built so far

The kernel is now about 60 lines, and it has:

  • Multiple processes, identified by id, represented as continuations
  • Cooperative yielding via perform Yield
  • Preemptive scheduling via perform Tick from a timer device
  • Output via perform Print (a stand-in for a proper UART driver)
  • A scheduler that’s a single match expression of three handler clauses

The same kernel, with no scheduler modification, would handle a process that mixes voluntary yields and involuntary preemptions. The same kernel would handle ten processes or a thousand. The same kernel, with one new effect clause, would handle process exit, blocking on I/O, sleeping for a duration, sending IPC to another process — any of these is another effect signature and another handler clause, slotting into the existing structure.
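For instance, process exit would be one new signature and one new clause. The names Exit and record_exit are illustrative, not from the book:

```ocaml
(* A new effect for process exit. *)
effect Exit : int -> unit

(* One new clause in the scheduler's match, alongside Yield, Print,
   and Tick. The continuation is deliberately dropped: the process
   is gone, so we never resume it. *)
| effect (Exit code), _k ->
    record_exit code;
    scheduler ()
```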

What this chapter committed to

Preemption added without modifying the scheduler. The unification of voluntary yield and involuntary preemption under a single handler clause. The precise-preemption-at-instruction-boundary commitment. The tower-discipline approach to interrupt masking. The hint that protection is enforced by lexical scoping plus type system, not by separate execution contexts.

The next chapter adds blocking on I/O.