
A teaching kernel in 40 lines

Part III is where the design earns its keep. We’ve laid out the algebra, committed to an ISA, specified an effect-aware language. Now we use all of it to build a kernel.

The kernel grows feature by feature across this chapter and the next four. By the end of Part III, it’s a multicore preemptive kernel that can run multiple processes, time-share between them, block on I/O, and migrate work between cores. It will be approximately 200 lines of CML. The features add cleanly without restructuring the existing code, which is itself the demonstration we’re trying to make: the model composes.

This chapter is the starting point. A cooperative single-core kernel. About 40 lines. It does the minimum needed to be a kernel: multiplex a CPU across multiple flows of control using only the primitives we’ve built.

The full kernel

(* === Effect signatures === *)

effect Yield  : unit
effect Print  : string -> unit
effect Tick   : core_id -> unit

(* === Process and scheduler state === *)

type process = {
  id : int;
  k  : unit continuation;
}

type sched_state = {
  mutable ready   : process queue;
  mutable next_id : int;
}

let sched : sched_state = {
  ready   = Queue.empty;
  next_id = 0;
}

(* === Spawning a process === *)

let spawn (body : unit -> unit) : unit =
  let pid = sched.next_id in
  sched.next_id <- pid + 1;
  let k = reify (fun () -> body ()) in
  Queue.push sched.ready { id = pid; k }

(* === The scheduler loop === *)

let rec scheduler () : unit =
  match Queue.pop sched.ready with
  | None -> ()
  | Some p ->
      handle
        invoke p.k ()
      with
      | Yield k ->
          Queue.push sched.ready { id = p.id; k };
          scheduler ()
      | Print msg k ->
          uart_write msg;
          invoke k ()
      | Tick _ k ->
          Queue.push sched.ready { id = p.id; k };
          scheduler ()
      | return () ->
          (* the process ran to completion; schedule the next one *)
          scheduler ()

(* === Two toy processes === *)

let proc_a () =
  perform (Print "A1\n");
  perform Yield;
  perform (Print "A2\n");
  perform Yield;
  perform (Print "A3\n")

let proc_b () =
  perform (Print "B1\n");
  perform Yield;
  perform (Print "B2\n");
  perform Yield;
  perform (Print "B3\n")

(* === Boot === *)

let main () =
  spawn proc_a;
  spawn proc_b;
  scheduler ()

When this runs, the output is:

A1
B1
A2
B2
A3
B3

Two processes, alternating, cooperatively yielding. About 40 lines. That is a kernel. A small one, missing many things, but a real kernel in the sense that matters: it multiplexes a CPU across multiple flows of control using only effects and continuations. Let’s go through it.

The effect declarations

effect Yield : unit
effect Print : string -> unit
effect Tick  : core_id -> unit

An effect declaration introduces a typed operation that user code can perform and that some surrounding handler must interpret. The type after the colon is the type the perform-site receives back when the handler resumes it.

Declaring an effect does not say what it does. Yield has no implementation here. It’s an abstract operation. The implementation is whatever handler catches it. The same perform Yield could be interpreted as “put me on the back of the queue” by the cooperative scheduler, “yield to the highest-priority blocked process” by a priority scheduler, or “no-op” by a degenerate scheduler. The effect is the signature; the handler is the implementation.
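The signature/implementation split can be felt in any language with first-class suspension. As a rough analogy — Python generators, not CML, with yield standing in for perform and the driving loop standing in for the handler — the same Yield request can be given two different interpretations without touching the process code:

```python
from collections import deque

def worker(name):
    # "perform Yield" ~ hand a request to whichever handler is driving us
    print(f"{name}: step 1")
    yield "Yield"
    print(f"{name}: step 2")

def round_robin(names):
    # Handler 1: interpret Yield as "go to the back of the queue".
    ready = deque(worker(n) for n in names)
    while ready:
        g = ready.popleft()
        try:
            g.send(None)        # run until the next request
            ready.append(g)     # Yield -> requeue
        except StopIteration:
            pass                # this worker finished

def run_to_completion(names):
    # Handler 2: interpret Yield as a no-op -- resume immediately.
    for n in names:
        for _ in worker(n):
            pass

round_robin(["A", "B"])        # A: step 1, B: step 1, A: step 2, B: step 2
run_to_completion(["A", "B"])  # A: step 1, A: step 2, B: step 1, B: step 2
```

worker never changes; only the handler that interprets its requests does.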

Note Tick : core_id -> unit — the timer effect carries which core it’s targeting. In a multicore setting, the timer device performs (Tick 0), (Tick 1), etc., and the NoC routes the perform to the corresponding core. This is the inter-device-messaging-as-effects design showing up naturally. For now, on a single core, the core_id is always 0.

The process type

type process = {
  id : int;
  k  : unit continuation;
}

A process is two things: an id and a unit continuation. Read unit continuation as “a continuation that, when invoked with a value of type unit, will run to completion (or until it performs another effect).” This is the type you wanted to see.

A continuation is the rest of the computation, reified as a value. A unit continuation means “this continuation doesn’t need any value to resume — it was suspended at a point that didn’t bind a result.” A perform Yield is suspended this way: the effect returns unit, so the continuation captured at the perform site is a unit continuation. When the scheduler invokes it with (), the suspended code resumes after the perform Yield.

A process, then, is literally just a continuation with metadata. There’s no task_struct. The continuation is the saved state.

The size of unit continuation as a value is one word — it’s just a pointer. The size as a resource is the size of the frame tree, which can be small (a process that has barely started) or large (a process deep in some recursion).
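Python generators make both points concrete (an analogy, not CML): the generator object is one small reference, while the suspended frame it owns lives behind it; and the value passed to send appears as the result of the yield expression, just as the handler’s resume value becomes the result of the perform:

```python
def proc():
    # Suspend here; whatever the handler resumes us with
    # becomes the value of this yield expression.
    got = yield "Yield"
    return f"resumed with {got!r}"

k = proc()            # one small object; the suspended frame lives behind it
next(k)               # run to the first suspension point ("perform Yield")
try:
    k.send(None)      # "invoke k ()": resume with the unit-like value None
except StopIteration as fin:
    print(fin.value)  # prints: resumed with None
```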

The scheduler state

type sched_state = {
  mutable ready   : process queue;
  mutable next_id : int;
}

A single mutable record holding the ready queue and a counter. mutable marks fields you’re allowed to reassign; ML languages default to immutable, and you have to opt in to mutation.

This is where ML’s purity-by-default policy bumps into systems reality. The scheduler is intrinsically stateful — the ready queue changes over time, and that change is observable by every scheduling decision. We can’t pretend otherwise. So we use mutable, and we accept that this code is not pure.

But notice what we’ve contained. The mutability is in one place, in a record with a well-defined invariant. Everything else in the kernel is pure: proc_a doesn’t mutate; the handler clauses receive continuations as values and dispatch them without mutating anything except sched.ready. The mutation is localized to the resource that genuinely needs to mutate.

This is the discipline. ML doesn’t ban mutation; it makes mutation visible and contained. When you read a function signature, the absence of mutation is a guarantee; the presence of mutation is a deliberate, marked choice.

Spawning

let spawn (body : unit -> unit) : unit =
  let pid = sched.next_id in
  sched.next_id <- pid + 1;
  let k = reify (fun () -> body ()) in
  Queue.push sched.ready { id = pid; k }

Three things happen: assign an id, reify body into a continuation, and push the new process onto the ready queue.

reify is the operation that converts a function-not-yet-run into a continuation-ready-to-be-invoked. Think of it as “boxing up the work but not starting it.” After reify, the continuation k is a value sitting in memory representing “the entire computation body (), paused at the very beginning.” When something later does invoke k (), the computation starts running.
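Generators again give a convenient analogy for reify (a Python sketch, not CML): calling a generator function builds the suspended computation without executing any of its body; only the first resume starts it.

```python
started = False

def body():
    global started
    started = True    # runs only once the continuation is invoked
    yield             # keep this a suspendable computation

k = body()            # "reify": box up the work -- nothing has run yet
assert not started
next(k)               # "invoke k ()": now the body actually starts
assert started
```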

The scheduler loop

let rec scheduler () : unit =
  match Queue.pop sched.ready with
  | None -> ()
  | Some p ->
      handle
        invoke p.k ()
      with
      | Yield k -> ...
      | Print msg k -> ...
      | Tick _ k -> ...
      | return () -> ...

This is the heart. Step by step:

The scheduler pops a process. If the queue is empty, there is nothing left to run and we exit. Otherwise, we invoke the process’s continuation inside a handle block.

The handle ... with block installs handlers for the listed effects, then runs its body — invoke p.k (), which runs the process. The process runs inside this handler. When it performs an effect, control transfers to the matching clause of the handler. The clause receives the effect’s payload and the continuation k representing the rest of the process at the point of the perform.

Read that last sentence twice. The handler clause binds k. This k is the suspended process. When the process did perform Yield, the hardware captured the state of the process at that point and bound it to the name k in this handler clause.

The Yield clause

| Yield k ->
    Queue.push sched.ready { id = p.id; k };
    scheduler ()

When the running process performs Yield, the hardware binds the suspended continuation to k. We file k back into the ready queue, and recurse to schedule the next process.

We do not invoke k here. We hand it off to the queue. The current process is now “ready, not running” — and its representation is the continuation sitting in the queue.

This is the four-line context switch. This is the whole thing. Save = Queue.push sched.ready { id = p.id; k }. Restore = Queue.pop in the next iteration plus invoke p.k (). There is no save_context, no load_context, no register-saving prologue. The continuation is the save; the invoke is the restore.

The Print clause

| Print msg k ->
    uart_write msg;
    invoke k ()

When the process performs Print msg, we write to the UART (a primitive for now; in a real kernel this would itself be perform UartTx), then immediately invoke k () to resume the same process. The process did not yield; it just made a request that the kernel serviced synchronously.

Control does not return from invoke k () to the lines after it. invoke is a tail call into the resumed process. The process continues running inside this same handle block, until it performs another effect or finishes.

The Tick clause

| Tick _ k ->
    Queue.push sched.ready { id = p.id; k };
    scheduler ()

Tick is preemption. We’ve wired up the clause to show the shape; in this single-core teaching kernel, there’s no actual timer firing yet, so Tick never arrives. The next chapter connects a simulated timer that performs Tick at intervals, and we’ll see preemption work without any other code changing.

Note the body is identical to Yield. This is the unification: preemption and cooperative yield share an implementation. The only difference is who initiated the perform.

The processes

let proc_a () =
  perform (Print "A1\n");
  perform Yield;
  perform (Print "A2\n");
  ...

A process is just a function. Nothing distinguishes proc_a from any other unit-returning function except that it performs effects. The “processhood” of proc_a comes from being spawned — being reified into a continuation and filed in the scheduler’s queue.

This is the abstraction win. User-level code is written in straight-line, sequential, blocking style. There are no callbacks, no async/await coloring, no explicit state machines. The perform Yield looks like a function call. The fact that control might transfer to a completely different process and come back later is invisible to the source code.

In Linux, code that wants to yield calls schedule() and somehow magic happens. In an async runtime, code that wants to yield uses await and the function it’s in becomes part of a state machine you didn’t write. Here, code that wants to yield writes perform Yield, which is a function call with the obvious type, and the language semantics handle the rest.

Boot

let main () =
  spawn proc_a;
  spawn proc_b;
  scheduler ()

Spawn two processes, start the scheduler. The scheduler will pop proc_a, run it until it yields, file its continuation back, pop proc_b, run it until it yields, and so on, alternating, until both processes complete.
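For readers who want to run something, the whole kernel can be shadowed in Python with generators (an analogy, not CML): yield ("Print", msg) plays perform (Print msg), yield "Yield" plays perform Yield, send plays invoke, and StopIteration plays the role of a process running to completion.

```python
from collections import deque

ready = deque()    # the ready queue: (pid, generator) pairs
next_id = 0

def spawn(body):
    # "reify": build the generator without starting it, then enqueue it
    global next_id
    ready.append((next_id, body()))
    next_id += 1

def scheduler():
    while ready:
        pid, k = ready.popleft()
        try:
            req = k.send(None)              # "invoke p.k ()"
            while True:
                if req == "Yield":
                    ready.append((pid, k))  # back of the queue
                    break
                _tag, msg = req             # ("Print", msg)
                print(msg, end="")          # uart_write
                req = k.send(None)          # resume immediately
        except StopIteration:
            pass                            # process ran to completion

def proc(name):
    for i in (1, 2, 3):
        yield ("Print", f"{name}{i}\n")
        if i < 3:
            yield "Yield"

spawn(lambda: proc("A"))
spawn(lambda: proc("B"))
scheduler()    # prints A1 B1 A2 B2 A3 B3, one per line
```

The structure maps clause for clause: the "Yield" branch is a push plus recursion, the "Print" branch is a synchronous service plus an immediate resume.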

What this kernel is missing

I want to be honest about what’s not here, because the gaps are the next chapters:

No isolation. proc_a and proc_b share the same address space, the same scheduler state, everything. A buggy process could mutate sched.ready directly. Real isolation requires capability discipline — each process has access only to the resources it’s been granted — which is part of what linear types will enable.

No real preemption. Tick is wired up but nothing fires it. Chapter 11 adds a simulated timer device that performs Tick at intervals, and we’ll see preemption work without changing the scheduler at all.

No blocking. A process that wants to wait for I/O has nowhere to go — there’s only one queue. Chapter 12 adds blocked queues keyed by what’s being waited on.

One core. Chapters 13 and 14 replicate the scheduler per-core and add migration effects.

No memory management. Processes can allocate freely from a single shared heap. The real kernel introduces per-process arenas and the bump allocator.

Print is a stub. It should itself be an effect routed to a UART driver, not a direct call. Later chapters make this proper.

But the shape is right. Every one of those additions slots into the existing structure without reworking it. New effects, new handler clauses, new queues, more scheduler instances. The unification holds throughout.

What this chapter committed to

A working cooperative kernel in 40 lines, exercising the effect-and-continuation machinery to do something real. The shape that all later chapters extend.

Take a few minutes with the code before moving on. The thing I want you to feel before we add preemption is the flatness of the design — the scheduler is one match expression and a queue, a process is a continuation and an id, a context switch is Queue.push followed by Queue.pop and an invoke. Run the execution in your head: main calls spawn proc_a, which reifies proc_a into k_a and queues it; spawn proc_b does the same; scheduler pops k_a and invokes it; proc_a runs perform (Print "A1\n") which transfers to the Print handler clause, which writes “A1” and invokes the continuation immediately, returning into proc_a at the next line; proc_a runs perform Yield which transfers to the Yield handler clause, which queues the continuation and recurses into scheduler; scheduler pops k_b and invokes it; and so on.

When you’ve felt that, the rest of Part III is about adding features without losing the flatness.