What does it mean for the ALU to type-check?

Part I gave you the algebra. Part II takes the algebra to the metal. We’re going to build a 16-bit ISA whose primitives are shaped by the language they’ll run. The first chapter of this part starts with the question of where types live: in the source code, in the compiler’s internal records, or in the running machine itself.

The ISA we’ll design carries types at runtime. Every value in a register has a 3-bit tag identifying its type. The ALU checks tags before operating. This is a real design decision with consequences. Before committing to it, let’s lay out the alternatives.

Four alternatives

Alternative A: Tags are dynamic; the ALU dispatches on them

Every value carries a tag. The ADD instruction inspects both operands’ tags. If they’re both int32, it does int32 addition. If they’re both float, it does float addition. If they’re mismatched, it raises a hardware fault.

This is the Lisp Machine design. The Symbolics 3600 and the MIT CADR did exactly this. It makes dynamic typing nearly free: the tag check that an interpreter would do in software, the hardware does in a single cycle alongside the arithmetic.
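A minimal sketch of this dispatch, in Python standing in for hardware. The tag values, the `TagFault` name, and the 32-bit wraparound are illustrative assumptions, not the book's ISA:

```python
INT32, FLOAT = 0, 1  # illustrative tag values, not the book's encoding

class TagFault(Exception):
    """Stands in for the hardware fault raised on mismatched operand tags."""

def alu_add(tag_a, val_a, tag_b, val_b):
    """One ADD instruction: the ALU dispatches on the operands' tags."""
    if tag_a != tag_b:
        raise TagFault(f"mismatched tags: {tag_a} vs {tag_b}")
    if tag_a == INT32:
        return INT32, (val_a + val_b) & 0xFFFFFFFF  # integer add, 32-bit wrap
    if tag_a == FLOAT:
        return FLOAT, val_a + val_b                 # float add, same opcode
    raise TagFault(f"no ADD for tag {tag_a}")
```

The point is that a single opcode does integer or float addition depending on what the values say they are; the software never names the type.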

Buys: dynamic dispatch is cheap. Polymorphism without monomorphization is viable. Garbage collection is helped enormously — the GC can scan a register and know from its tag whether to follow it as a pointer. A “value” type that covers everything is possible.

Costs: The type system, if you have one, is redundant with the hardware tags. You’re paying for tag bits at runtime that the compiler could have proven unnecessary. Monomorphization stops making sense, because you can’t erase what the hardware demands be present.

Alternative B: Tags are static; the ALU dispatches at decode

The tag isn’t carried in the value at runtime. Instead, the instruction encoding names the type: ADD.i32, ADD.i64, ADD.f32 are distinct opcodes. The ALU’s “type checking” is really the decoder’s job — it picks the right execution unit based on the opcode.

This is what x86, ARM, RISC-V, and basically every modern ISA actually do.
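Sketched the same way (the execution-unit split is illustrative; the `ADD.i32`/`ADD.f32` mnemonics are from this chapter):

```python
def execute(opcode, a, b):
    """The decoder picks the execution unit from the opcode; operands are raw bits."""
    if opcode == "ADD.i32":
        return (a + b) & 0xFFFFFFFF  # integer unit, 32-bit wraparound
    if opcode == "ADD.f32":
        return a + b                 # float unit (Python float stands in)
    raise ValueError(f"unknown opcode: {opcode}")
```

Here the type lives entirely in the instruction stream; the values themselves carry nothing.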

Buys: monomorphization is the natural fit. Values in registers are raw bits with no overhead. This is the fastest design for statically-typed code.

Costs: Dynamic dispatch becomes expensive again — you’re back to a software tag check and a branch. The hardware isn’t really doing anything ML-specific; you’ve described a normal ISA.

Alternative C: Tags are static AND carried; the ALU verifies

Each value carries its tag at runtime, but the instruction encoding also specifies the expected type. ADD.i32 reads two registers, checks both tags are i32, performs the add, writes a result with tag i32. A tag mismatch is a hardware fault — but it’s a fault that, in a well-typed program, cannot occur. The hardware check is a verification of the compiler’s claim, not a dispatch mechanism.

This is closer to what you’d get if you pushed CHERI’s capability discipline further.
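A sketch of Alternative C's `ADD.i32`, assuming a `(tag, bits)` pair stands in for a tagged register; the tag encoding and fault type are illustrative:

```python
I32 = 0  # illustrative tag encoding

class TagFault(Exception):
    """Hardware fault: the compiler's type claim was violated."""

def add_i32(reg_a, reg_b):
    """ADD.i32: verify both operand tags, add, write a tagged result."""
    (ta, a), (tb, b) = reg_a, reg_b
    if ta != I32 or tb != I32:          # verification, not dispatch:
        raise TagFault("tag mismatch")  # unreachable in a well-typed program
    return (I32, (a + b) & 0xFFFFFFFF)
```

Contrast with Alternative A: the opcode already names the expected type, so a mismatch never selects a different operation; it can only fault.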

Buys: The type system and the hardware tags are in correspondence. A well-typed program never faults on a type check, so the type system is a soundness proof for the program’s execution. Dynamic typing is still possible (using a sum type whose tag is inspected explicitly), but it’s opt-in. The hardware tag also gives the GC a precise marking primitive, and lets cross-device messages carry their type information natively over the NoC.

Costs: You’re paying for tag bits in every value, which costs memory bandwidth and register width. You’re also paying for the tag check in every ALU operation; the check runs in parallel with the operation, so it adds no latency, but it still costs energy and silicon.

Alternative D: Tags are carried, but only at boundaries

Inside a function body, values are raw — the compiler proved the types, no tags needed. At function call boundaries, message-passing boundaries, and effect-perform boundaries, values get tagged. This is a hybrid: values are bare bits inside compiler-verified code, and crossing a boundary either attaches or strips a tag.

Buys: You get C’s register efficiency in hot loops, plus the cross-boundary verifiability of Alternative C. Cross-device messages over the NoC arrive tagged.

Costs: Compiler complexity: tracking “tagged” vs. “bare” representations and inserting tag/untag operations at boundaries. The inside of a function is no longer verified by the hardware, so the type system has to be trusted there. For a research ISA where you’re co-designing for safety, this is a regression.

The intuition pump

Picture a single ADD of two int32s. Walk through what happens in each alternative:

  • A (Lisp Machine): Hardware reads both registers, sees tags say “int32,” performs integer add, writes result with int32 tag. Took one cycle, with the tag check parallel to the add. Same instruction would have done float add if the tags said float.
  • B (modern ISA): Decoder sees ADD.i32, routes to the integer ALU. ALU adds. Done. No tags anywhere.
  • C (verified-tagged): Decoder sees ADD.i32. Hardware reads both registers, checks both tags are i32 (else fault), performs add, writes result with i32 tag. Same mechanism as A, different semantics: the tag is verification, not dispatch.
  • D (boundary-tagged): Inside a function, looks like B. At a call site that crosses an ABI boundary, the compiler emits TAG.i32 to attach a tag before the call, and UNTAG.i32 to strip it on receipt.
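The boundary operations in the last bullet can be sketched as a pair of functions; the `(tag, bits)` representation and the tag value are illustrative assumptions:

```python
I32 = 0  # illustrative tag value

def tag_i32(raw):
    """TAG.i32: attach a tag as a bare value crosses an ABI boundary."""
    return (I32, raw & 0xFFFFFFFF)

def untag_i32(tagged):
    """UNTAG.i32: strip the tag on receipt, faulting on a mismatch."""
    t, raw = tagged
    if t != I32:
        raise ValueError("boundary tag mismatch")
    return raw
```

Between `untag` and the next `tag`, the value is bare bits, exactly as in Alternative B.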

Now picture perform (SendToDevice 0x42 payload). In A and C, the payload carries its tag with it and the receiving device can interpret it. In B, the payload is bare bits and the protocol has to encode the type somehow. In D, you’d tag at the perform boundary and untag at the handler.

If the NoC is going to carry typed messages between heterogeneous devices, you almost certainly want the type to be physically present in the message, not implicit in the protocol. That’s a tag in the value. This argues strongly for C.

The decision

For this book’s ISA, we choose Alternative C. The decision is principally driven by:

  1. Cross-device messages over the NoC need to carry type information. The hardware tag does this naturally.
  2. The hardware tags give the GC a precise root set without compiler-annotated stack maps.
  3. The hardware tag check turns the type system from “compile-time discipline” into “compile-time discipline backed by runtime verification.” Type unsoundness manifests as a hardware fault, not as silent corruption.
  4. Effect dispatch, which we’ll meet in chapter 7, benefits from the same tag mechanism — continuations are tag-distinguished from other pointers.

The costs (register width, ALU tag-check silicon) are acceptable for a teaching ISA, and arguably acceptable for a small production ISA if the design holds together at scale.

What this implies for the language

Several things follow from the choice:

The language has a notion of “primitive type” that maps to a hardware tag. There’s a small, fixed set: integers of various widths, floats (if any), pointers, continuation handles. The current 3-bit tag space holds 8 such types.

Compound types are built out of primitives. Sums, products, functions. These need representation strategies; we’ll get there in later chapters. The short answer: they’re heap-allocated and pointed to via a PTR-tagged value.

Monomorphization is possible but less crucial. Because hardware tag dispatch is cheap, polymorphism is cheap. You can still monomorphize for code-quality reasons, but you’re not forced to.

The type system has a soundness theorem of the form “well-typed programs do not fault on hardware tag checks.” This is a real, provable property that the language must preserve. We get to choose the language and the type system; we can make this hold.

Sum types and hardware tags overlap conceptually but not in implementation. We thought, briefly, about making sum-type variants share the hardware tag namespace directly. The decision (carried forward into the ISA spec) is to keep them separate: hardware tags identify primitive types, and sum-type variants are recorded in heap-object headers. We’ll explore this in the language spec; for now, keep the distinction in mind.

Consequences for tag width

How wide should the tag be? Three bits gives 8 tags, which is the choice in this book’s ISA. The allocation:

Tag   Name
0     INT
1     PTR
2     CONT
3     CLOSURE
4     UNIT
5     BOOL
6     RESERVED
7     RESERVED

INT is the only numeric type at present — 16-bit signed integer. PTR is a heap pointer. CONT is a continuation handle (a heap pointer, distinguished for GC and for RESU typechecking). CLOSURE is a closure record (also a heap pointer, distinguished for CALL). UNIT and BOOL are special-cased so they fit in registers without heap allocation. Two slots are reserved for future expansion.
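The chapter doesn’t fix a physical register layout. As one illustrative assumption, a 19-bit register word could hold the 3-bit tag beside a 16-bit payload (treated here as a bit pattern, not a signed value):

```python
INT, PTR, CONT, CLOSURE, UNIT, BOOL = range(6)  # tags 6 and 7 reserved

def pack(tag, bits):
    """Attach a 3-bit tag to a 16-bit payload, forming one register word."""
    assert 0 <= tag < 8 and 0 <= bits < (1 << 16)
    return (tag << 16) | bits

def unpack(word):
    """Split a register word back into (tag, payload)."""
    return (word >> 16) & 0b111, word & 0xFFFF
```

Whether the tag rides in-band like this or in a separate sideband (as CHERI does for its capability tag) is a microarchitectural choice the chapter leaves open.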

Three bits is tight. We can’t add FLOAT, INT64, BYTE as distinct primitive types without either expanding the tag or remapping. The decision for the 16-bit toy is to live with the constraint — there’s plenty of design space in just INT, PTR, CONT, CLOSURE, UNIT, BOOL to demonstrate the model. A 32-bit successor would widen the tag to 4 or 5 bits and allocate the additional primitives.

What this chapter committed to

Verified-tagged architecture as the design choice. The 3-bit tag namespace. The principle that the type system and the hardware tags are in correspondence, with the type system as a soundness proof for hardware tag integrity.

The next chapter introduces the operation that makes this ISA strange: perform, the instruction that captures a continuation and transfers to a handler.