The keyboard, end to end
This is a tracing chapter. We’re going to follow a single physical event — pressing the ‘A’ key — through every layer of the system, from the physical switch to the perform that delivers it to a process. The point is to make concrete how the unification described in the previous chapter actually works at the hardware level.
We have, for purposes of this chapter, a real-ish setup: a physical keyboard wired to your SoC via some serial protocol; inside the SoC, a keyboard interface device (KID) that bridges the serial protocol to the chip’s network-on-chip; the NoC itself, with four compute cores and various devices attached; and the kernel running on core 0, with a handler installed for keyboard events.
When you press ‘A’, this is what happens.
Stage 1: Outside the SoC
The physical key switch closes. The external keyboard controller’s scan loop notices, debounces (a few milliseconds), and decides to report a keypress. It encodes “A pressed” as some bytes per whatever serial protocol — let’s say a simple UART-style protocol where each keycode is one byte. It transmits those bytes to the SoC’s pin.
This is happening in real time, on the order of milliseconds, governed by the physical world and the external chip’s clock. The SoC has no direct view into it.
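The scan-and-debounce step can be sketched in a few lines. This is an illustrative model, not the external controller's firmware; the sample count DEBOUNCE_SAMPLES is an assumed parameter:

```python
DEBOUNCE_SAMPLES = 4  # assumed: consecutive identical samples required

def debounce(samples):
    """Report a press only once the switch has read 'closed' for
    DEBOUNCE_SAMPLES consecutive scan-loop samples."""
    run = 0
    for i, closed in enumerate(samples):
        run = run + 1 if closed else 0
        if run == DEBOUNCE_SAMPLES:
            return i  # sample index at which "A pressed" is reported
    return None  # contacts never settled

# Chatter (1, 0, 1, 0) before the contacts settle closed.
assert debounce([1, 0, 1, 0, 1, 1, 1, 1]) == 7
```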
Stage 2: The KID receives serial bits
Inside the SoC, the KID has a serial receiver. As bits arrive on the input pin, the receiver shifts them into a register. When a complete byte has been assembled, the receiver flags it. This is a hardware-only process, happening at the bit clock rate of the external link.
The KID’s logic now has a question: what do I do with this byte?
For our simple protocol, the byte is a complete keycode. The keypress for ‘A’ arrives as the single byte 0x41. The KID is ready to forward it.
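A minimal model of that shift-in process, assuming LSB-first, UART-style bit order (which the chapter doesn't pin down):

```python
def shift_in(bits):
    """Model of the KID's serial receiver: shift incoming bits into a
    register, LSB-first as in a UART; flag completion at 8 bits."""
    reg = 0
    for i, bit in enumerate(bits):
        reg |= (bit & 1) << i
        if i == 7:
            return reg  # "byte ready" is flagged here
    return None  # byte still in flight

# 0x41 ('A') on the wire, LSB-first: 1,0,0,0,0,0,1,0
assert shift_in([1, 0, 0, 0, 0, 0, 1, 0]) == 0x41
```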
Stage 3: The KID emits a NoC packet
The KID has one byte and needs to know where to send it. Some core wants to know about keypresses. The KID can’t pick arbitrarily; it needs configuration.
This is the first interesting design decision. There are several patterns for “how does a device know who to talk to”:
1. Hardcoded destination at synthesis. Always sends to core 0. Simple, inflexible.
2. Configurable register. The kernel writes to a register at boot, naming the destination. Configurable, but requires a back-channel from kernel to device for configuration.
3. Service discovery. Devices announce themselves at boot; consumers subscribe. Complex, like a real OS.
For a toy, (2) is right. The kernel, during boot, sends a configuration packet to the KID: “send all your keypresses to core 0.” The KID receives this packet, latches the destination into a register, and uses it for all future packets.
So let’s say that’s been done. The KID now has byte 0x41 and knows to send it to core 0. It constructs a NoC packet:
| Field | Value | Notes |
|---|---|---|
| dst_id | 0x00 | core 0 |
| src_id | 0x05 | KID’s address |
| op | 0x01 | “keypress event,” agreed protocol |
| flags | 0x0 | no priority |
| payload | 0x00000041 | keycode in 32 bits with leading zeros |
| reserved | 0x000 | unused |
It hands this packet to its NoC interface, which is the local mesh router port for the KID’s location.
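The packet construction amounts to straightforward bit-packing. The field widths below are assumptions (the table gives values, not widths), chosen so each field fits its example value:

```python
# Assumed widths (the chapter's table gives values, not widths):
FIELDS = [("dst_id", 8), ("src_id", 8), ("op", 8),
          ("flags", 4), ("payload", 32), ("reserved", 12)]

def pack(**vals):
    """Concatenate the fields MSB-first into a single integer, the way
    the KID's packet builder would before handing it to its NoC port."""
    word = 0
    for name, width in FIELDS:
        v = vals.get(name, 0)
        assert v < (1 << width), f"{name} overflows {width} bits"
        word = (word << width) | v
    return word

pkt = pack(dst_id=0x00, src_id=0x05, op=0x01, flags=0x0, payload=0x41)
assert (pkt >> 12) & 0xFFFFFFFF == 0x41  # payload sits above 'reserved'
assert (pkt >> 48) & 0xFF == 0x01        # op field
```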
Stage 4: The packet traverses the NoC
The 4-local NSEW mesh routes packets by tile coordinate. The KID is at some position; core 0 is at another. The NoC’s router logic computes a path — say, “go east twice, then north once” — and the packet hops from router to router, one mesh hop per cycle, until it arrives at the input port of core 0’s local router.
This is purely hardware. No instructions are executed during routing. The NoC is fabric, not compute. The latency is a few cycles per hop, which on a 4-local mesh is at most a handful of cycles total — small compared to anything else we’re discussing.
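Dimension-ordered routing of this kind can be sketched as follows; the (x, y) tile coordinates are made up for illustration, since the chapter doesn't fix a layout:

```python
def xy_route(src, dst):
    """Dimension-ordered (X first, then Y) routing on the mesh,
    producing hop lists like 'go east twice, then north once'.
    (x, y) tile coordinates are an assumption of this sketch."""
    hops = []
    x, y = src
    dx, dy = dst
    while x != dx:
        hops.append("E" if dx > x else "W")
        x += 1 if dx > x else -1
    while y != dy:
        hops.append("N" if dy > y else "S")
        y += 1 if dy > y else -1
    return hops

# KID at tile (0, 0), core 0 at tile (2, 1): east twice, then north.
assert xy_route((0, 0), (2, 1)) == ["E", "E", "N"]
```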
One important property: the NoC guarantees point-to-point ordering between any two endpoints. Packets sent from the KID to core 0 arrive in the order they were sent. This is important for the kernel’s correctness: a sequence of keypresses can’t be reordered in flight.
Cross-endpoint ordering is not guaranteed: a packet from KID to core 0 and a packet from disk to core 0 can arrive in either order relative to each other. This is fine; the kernel doesn’t assume such ordering.
Stage 5: Core 0’s NoC interface delivers the packet
The packet arrives at core 0’s local router and is handed to the core’s NoC interface — the hardware block that bridges the NoC fabric to the core’s instruction stream. This is where the most interesting design decisions happen, because the NoC interface is the thing that has to turn a packet into a perform.
Three plausible patterns:
1. Memory-mapped queue. The NoC interface deposits packets into a memory-mapped FIFO. The core periodically reads from the FIFO. Traditional approach.
2. Interrupt on arrival. The NoC interface raises an interrupt line; the core’s interrupt handler reads the packet.
3. PERF on arrival. The NoC interface directly performs an effect when a packet arrives.
For this design, (3) is the answer, and this is where the elegance shows.
The NoC interface, on receiving a packet, examines the op field. It treats the op as a 4-bit value within the family 0xF (which the ISA reserves for hardware-originated effects). The interface acts as if the currently-executing code had issued PERF (family=0xF, op=0x01), <payload>, except it didn’t — the NoC interface is hardware.
The core’s pipeline, at its next preempt-sample point (end of sequencer cycle, per the precise-exception discipline), notices the signal and handles it as if the currently-executing code had issued the PERF itself:
- The current instruction completes.
- The continuation at the current instruction boundary is captured into a fresh CONT_REC on the heap.
- The captured continuation goes into v1, the payload goes into v0, the op code goes into v7.
- The handler CAM is searched for a handler matching (family=0xF, op=0x01).
- Control transfers to that handler.
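A software model of that dispatch sequence, with a plain dict standing in for the handler CAM (the register names v0/v1/v7 follow the chapter's convention; everything else is an assumption of this sketch):

```python
def hw_dispatch(cam, op, payload, captured_k):
    """Model of the dispatch steps: load the calling-convention
    registers (v0 = payload, v1 = continuation, v7 = op) and search
    the handler CAM for (family, op). The dict-as-CAM and the Python
    types are assumptions of this sketch."""
    regs = {"v0": payload, "v1": captured_k, "v7": op}
    handler = cam.get((0xF, op))  # family 0xF: hardware-originated
    if handler is None:
        raise RuntimeError("no handler installed for (0xF, %#x)" % op)
    return handler, regs  # control would now transfer to the handler

cam = {(0xF, 0x01): "key_arrival_handler"}
handler, regs = hw_dispatch(cam, 0x01, 0x41, "k_at_boundary")
assert handler == "key_arrival_handler"
assert regs["v0"] == 0x41 and regs["v7"] == 0x01
```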
From the language’s perspective, this is indistinguishable from a perform issued by user code. The handler clause doesn’t know — and doesn’t care — whether the perform came from inside or from the NoC. Same calling convention, same continuation, same dispatch.
Stage 6: The handler runs
The kernel, during boot, installed a handler for (0xF, 0x01) — “keypress arrived from KID” — with the body something like:
```
| Hardware.key_arrival b, k ->
    (* b: keycode byte; k: continuation of whatever was running when
       the packet arrived; p: the currently running process *)
    match Queue.pop sched.waiters with
    | Some waiter ->
        (* requeue the interrupted process, wake the waiter with the byte *)
        Queue.push sched.ready { id = p.id; k };
        invoke waiter.k b
    | None ->
        (* no one waiting: file the byte, resume the interrupted code *)
        Queue.push sched.inbox b;
        resume k ()
```
The handler does its work — files the byte in the inbox if no one is waiting, or wakes a waiting process. Then either resumes k (the continuation of whatever was running when the packet arrived) or invokes the waiter’s continuation. Either way, control flows back into user code, and the system continues.
What we’ve just built
Trace it end-to-end and notice what’s happened. A physical event — a keystroke — propagated through six layers, and arrived in the language as a perform. At no point did the kernel write code to “handle an interrupt.” It wrote a handler for an effect. The effect happens to be performed by hardware rather than by software, but the kernel’s code is identical in both cases.
This is the unification. Three observations to make it concrete:
The keyboard is structurally identical to a software process. Both produce effects that the kernel handles. The keyboard’s effects happen at hardware-determined rates; a process’s effects happen at software-determined rates. The kernel doesn’t distinguish.
The “interrupt handler” is just a handler. No separate interrupt context, no IRQ stack, no top-half/bottom-half discipline. The handler runs in the frame of whatever was preempted, has access to whatever the lexical scope provides, and resumes (or doesn’t) via the normal continuation mechanism.
Configuration is also just an effect. The “kernel sends a packet to the KID to set its destination” we glossed over — that’s also a perform. Specifically, it’s the kernel performing an effect with payload “configuration to KID at address 0x05.” The KID’s NoC-side hardware decodes this and writes to its internal destination register. There’s no special “MMIO write” instruction; it’s the same PERF mechanism going the other direction.
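The KID side of that configuration exchange can be sketched as a tiny state machine; OP_CONFIGURE and the payload layout are assumptions of this sketch:

```python
OP_KEYPRESS = 0x01   # from the packet table earlier in the chapter
OP_CONFIGURE = 0x02  # assumed op number for configuration packets

class KID:
    """Sketch of the KID's NoC-side decode: a configuration packet
    latches the destination register; keypresses are forwarded to
    whatever destination has been latched."""
    def __init__(self):
        self.dst_id = None  # unconfigured at reset

    def on_noc_packet(self, op, payload):
        if op == OP_CONFIGURE:
            self.dst_id = payload & 0xFF  # latch the destination core

    def on_keypress(self, keycode):
        if self.dst_id is None:
            return None  # nowhere to send it yet
        return {"dst_id": self.dst_id, "op": OP_KEYPRESS,
                "payload": keycode}

kid = KID()
assert kid.on_keypress(0x41) is None          # before configuration
kid.on_noc_packet(OP_CONFIGURE, 0x00)         # "send keys to core 0"
assert kid.on_keypress(0x41)["dst_id"] == 0x00
```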
What’s hiding in the boot story
I waved at “the kernel, during boot, sends a configuration packet.” Let’s unwave it.
When the SoC powers up, the NoC interface on each core is in some default state. The handler CAMs are empty. The effect tables are empty. The cores are in a known state at some reset vector.
A real boot sequence:
- Each core comes up at its reset vector. One core (by convention, 0x00) is the bootstrap core; the others wait.
- Bootstrap core initializes its handler tower: installs the outermost handler for (0xF, *) — the catch-all for hardware-originated performs — pointing at a “panic” or “log and drop” routine, so that errant packets don’t deadlock.
- Bootstrap core enumerates devices on the NoC. This is itself a sequence of performs: “send ‘identify’ to address 0x01, wait for response,” “send ‘identify’ to address 0x02, wait for response,” etc.
- For each device of interest, the bootstrap core sends a configuration packet binding the device to a core. KID gets bound to core 0. UART gets bound to core 0. Etc.
- The bootstrap core installs specific handlers for (0xF, *) corresponding to each device whose packets it should receive.
- The bootstrap core spawns initial processes.
- The bootstrap core releases the other cores, which run their own initialization sequences.
This is a real boot ROM. Maybe 200 instructions, mostly PERF to talk to devices. It’s the only code on the system that uses bare device-level PERFs rather than higher-level effects; everything afterward goes through the kernel’s abstractions.
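In outline, the boot ROM's device pass might look like this, with perform and install_handler supplied by the environment; both callables and the op numbers are assumptions of this sketch:

```python
def boot(perform, install_handler, device_addrs, bindings):
    """Outline of the boot ROM's device pass. perform sends a packet
    and waits for the reply; install_handler populates the CAM. The
    op numbers and both callables are assumptions of this sketch."""
    OP_IDENTIFY, OP_CONFIGURE = 0x00, 0x02
    install_handler((0xF, "*"), "log_and_drop")  # catch-all first
    found = [(a, perform(a, OP_IDENTIFY, 0)) for a in device_addrs]
    for addr, core in bindings.items():
        perform(addr, OP_CONFIGURE, core)        # bind device -> core
    return found

log = []
def perform(addr, op, payload):
    log.append((addr, op, payload))
    return "device@%#04x" % addr

found = boot(perform, lambda key, body: None, [0x01, 0x05], {0x05: 0x00})
assert found[1] == (0x05, "device@0x05")
assert (0x05, 0x02, 0x00) in log  # KID bound to core 0
```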
Where multicore enters
Notice that the same mechanism — perform packets traversing the NoC — works for core-to-core communication. If core 0 wants to send a message to core 1, it issues PERF (0xF, op_for_inter_core) payload, where the payload includes core 1 as the destination. The NoC interface routes the packet; core 1’s NoC interface receives it; core 1’s handler runs.
The keyboard and the other core are the same kind of thing from the kernel’s perspective. Both are addresses on the NoC. Both communicate via packets. Both surface as performs.
We’ve built the entire mechanism for multicore already. Multicore is just “more cores, addressed the same way as devices, with the kernel running on multiple of them.” We’ll formalize this in chapter 14; for now, the point is that the design pays off across all the things you’d want to do with it.
What this chapter committed to
The full trace of an event through hardware, NoC, and into the language. Devices as effect producers. The boot story for binding devices to cores. The preview that multicore is the same mechanism extended.
The next chapter looks at the same mechanism running in the opposite direction — what does it look like when CML user code sends a message to a device?