Polymorphism and the type variable

You’ve internalized that a type is a set of values with some cardinality, and that the algebra composes those sets. Now we ask: what does it mean to write a function that doesn’t care which set?

The type variable

Consider the function “give me back what you gave me.”

let id x = x

What’s its type? Well, what’s the cardinality of its input? You can’t say — it works for any input. Its type is 'a -> 'a, read “for all alpha, alpha to alpha.” The 'a is a type variable: a hole into which any concrete type can be plugged.

When you call id 42, the compiler instantiates 'a with int, so that occurrence of id has type int -> int and the call has type int. When you call id "hello", it instantiates 'a with string. The function itself, in the source, doesn’t know which.
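
Concretely, in OCaml syntax (the bindings n and s are illustrative names, not from the book):

let id x = x           (* val id : 'a -> 'a *)
let n = id 42          (* 'a instantiated at int; n : int *)
let s = id "hello"     (* 'a instantiated at string; s : string *)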

This is parametric polymorphism. In OO, “this works for any type” usually means “this works for any subtype of some base type” — bounded by an interface, dispatched through a vtable, paying a runtime cost. In ML, 'a -> 'a means literally any type, with no bound, and no runtime cost at all. The function is compiled once (or specialized; we’ll get to that) and works for everything.

How is that possible? Because id cannot do anything with its argument except pass it along. It doesn’t know if 'a is an int or a string or a 4KB struct. It has no operations available to it. The type variable is a promise of ignorance, and that ignorance is what makes the function universal.

Parametricity: ignorance as a guarantee

Here’s where it gets beautiful. Look at the signature 'a -> 'a and ask: what functions could possibly have this type?

Exactly one (setting aside nontermination and exceptions): the identity. The function receives an 'a and must return an 'a; it has no way to construct an 'a from nothing (it doesn’t know what 'a is), and no way to transform the one it has (no operations are defined on an unknown type). The only 'a it can produce is the one it was given.

The signature, by itself, has told you the entire behavior. This is called a free theorem: a fact about a function deducible from its type alone, for free, without reading the implementation.
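
You can watch the compiler enforce this. In OCaml, an explicit polymorphic annotation ('a. ...) keeps the type variable rigid, so a would-be non-identity inhabitant is rejected at the definition site (a sketch; the error text is paraphrased):

let bad : 'a. 'a -> 'a = fun x -> x + 1
(* Error: this definition has type int -> int,
   which is less general than 'a. 'a -> 'a *)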

More examples, each sketched in code after the list:

  • 'a -> 'b -> 'a has exactly one inhabitant (up to behavior): return the first argument. There’s nothing else to do.
  • 'a list -> 'a list — many inhabitants, but all of them are some rearrangement/selection of the input. The function cannot invent new 'a values. It can return [], the input unchanged, the input reversed, every-other-element. It cannot fabricate elements.
  • 'a list -> int — cardinality information only. Length, sum-of-positions, hash-of-structure. The function can examine the list’s shape but not its contents.
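
A sketch of one inhabitant for each signature above, in OCaml; the names konst, every_other, and len are illustrative:

let konst x _ = x                (* 'a -> 'b -> 'a: the only behavior *)

let rec every_other = function   (* one of many 'a list -> 'a list *)
  | [] -> []
  | [x] -> [x]
  | x :: _ :: rest -> x :: every_other rest

let len xs =                     (* 'a list -> int: shape only *)
  List.fold_left (fun n _ -> n + 1) 0 xs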

This is what people mean when they say “types are propositions.” The signature 'a -> 'a is the proposition “for any A, given an A I produce an A,” and the only proof is id. Stronger signatures admit fewer proofs. The art of ML design is choosing signatures so constraining that wrong implementations cannot typecheck.

Your Lisp background will fight this for a moment — Lisp’s macros let you do anything, and that power feels like the point. ML’s bet is the opposite: deliberately restrict what you’re allowed to express, so the compiler can tell you more about what you’ve written. You give up dynamism; you get certainty. For systems code, where the cost of being wrong is high and the cost of being verbose is low, the trade is usually correct.

Why this is free at runtime

How is id compiled? Two main strategies, with different tradeoffs. The ML family splits on this; production languages have made both choices.

Uniform representation. Used by stock OCaml, SML/NJ, and most BEAM-adjacent implementations. Every value is represented as a single machine word. Either it’s a small integer stored directly (with the low bit set as a tag), or it’s a pointer to a heap-allocated object whose header describes its shape. Under this scheme, id is one function, compiled once, that moves a word from one register to another. It works for int, string, (int, string, float) list option — anything — because everything is a word.

The cost is indirection. A float is heap-allocated and boxed (with exceptions for arrays of floats and other escape hatches). A pair of ints is a pointer to a heap block holding two tagged words, not two bare ints sitting side-by-side in registers. You pay in cache misses and allocation pressure for the convenience of “everything is a word.”
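
On stock OCaml you can observe the word-level tagging through the unsafe Obj module (strictly an illustration, not an API to rely on):

let () =
  assert (Obj.is_int (Obj.repr 42));          (* immediate: tag bit set *)
  assert (not (Obj.is_int (Obj.repr "hi")));  (* pointer to a heap block *)
  assert (not (Obj.is_int (Obj.repr 3.14)))   (* floats are boxed too *)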

Monomorphization. Used by MLton and (analogously) by Rust. The compiler generates a separate specialized copy of id for every type it’s called with: id_int, id_string, id_my_4kb_struct. Each one knows the exact layout of its argument and can pass it through registers or memory as appropriate. No boxing.

The cost is code size. A polymorphic function used at N types produces N copies in the binary. For deeply polymorphic code (think: a generic data structure used at twenty types), this can balloon the binary substantially. Rust mitigates this with its dyn Trait escape hatch; MLton just accepts the size.
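
What a monomorphizing compiler conceptually emits for id used at two types, written out by hand here; the names id_int and id_string are hypothetical:

let id_int (x : int) : int = x           (* moves a raw machine int *)
let id_string (x : string) : string = x  (* moves a string pointer *)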

Hybrid. GHC Haskell and some research MLs: mostly uniform representation, with selective specialization driven by annotations (SPECIALIZE pragmas, in GHC’s case) or by inlining heuristics.

The high-level takeaway: parametric polymorphism is a compile-time tool with one of two runtime stories, and both are predictable. Unlike OO subtype polymorphism — where dispatch goes through a vtable at every call and the optimizer often can’t see through it — parametric calls are either monomorphized (and inlined like normal C) or uniformly represented (and the indirection is uniform and predictable, not per-call). You can reason about it.

For high-performance systems work, the practical heuristic: polymorphism in hot loops is cheap on monomorphizing implementations and possibly expensive on uniform-representation ones, depending on what’s being passed. If you’re processing a tight inner loop over 'a array where 'a = float, on OCaml you’ll want float array specifically (which OCaml special-cases to unboxed storage), not a generic polymorphic container.
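
For instance, a tight reduction over float array on stock OCaml walks contiguous, unboxed doubles (a sketch; the function name is illustrative):

let sum (a : float array) =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i)    (* unboxed load from the flat float array *)
  done;
  !s

Because of the special case, this loop touches flat doubles rather than pointers to boxed floats.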

The book’s CML defers this decision; the compiler team can pick monomorphization or uniform representation when it has more information about typical workloads. The book’s hardware design (the 3-bit tag, the verified-tagged ALU) is somewhat hostile to monomorphization, since it carries type information everywhere by construction. So the natural choice is uniform representation with type tags, with monomorphization as an optimization for hot loops where the compiler can prove it pays off.

Polymorphism vs. genericity vs. interfaces

A clarifying triangulation, since you have intuitions from several languages:

Go generics (since 1.18) are constrained parametric polymorphism, implemented with a mix of monomorphization and dictionary-passing (the compiler stencils per “GC shape” and passes dictionaries within each shape). Closer to ML than to Java.

Java/C# generics are constrained parametric polymorphism with uniform representation (Java erases to Object; C# specializes for value types but not reference types). Closer to OCaml than Java programmers usually realize, but bounded by interfaces, which ML’s 'a is not.

C++ templates are monomorphization, plus duck typing. The compiler doesn’t check whether T has a + operator until you instantiate; ML’s type variables are checked once at the definition site, which is why ML compile errors are local and C++ template errors are not.

Elixir/Erlang don’t have static parametric polymorphism — everything is dynamically typed and the “polymorphism” is just “this function happens to work on whatever you pass it.” The discipline you cultivated with @spec and Dialyzer is approximately ML’s discipline, applied retroactively.

Lisp generics (CLOS, Clojure protocols) are ad-hoc polymorphism with runtime dispatch — closer to OO than to ML. ML’s parametric polymorphism is a different axis entirely; you’ll see ML programmers reach for it where a Lisper would reach for a multimethod, and the ML version is statically checked but less extensible.

Where bounded polymorphism lives in ML

You will eventually want to say “this function works for any 'a, as long as 'a can be compared / hashed / serialized.” That’s where ML’s module system enters, and it’s so different from interface-based bounded polymorphism that it would deserve its own book.

Briefly: the answer is not “type classes” (that’s Haskell) and not “interfaces” (that’s everyone else). The answer is: you pass in a module that provides the required operations, either explicitly or via a functor. The compiler resolves it at compile time, monomorphizes or doesn’t depending on implementation, and you pay nothing at the call site.
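
A minimal sketch of the functor style in OCaml; the names ORD, MakeMax, and IntMax are hypothetical:

module type ORD = sig
  type t
  val compare : t -> t -> int
end

(* The required operations arrive as a module argument *)
module MakeMax (O : ORD) = struct
  let max a b = if O.compare a b >= 0 then a else b
end

module IntMax = MakeMax (struct type t = int let compare = compare end)
(* IntMax.max 3 5 evaluates to 5; resolution happens at compile time *)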

CML in this book doesn’t fully exploit the module system; the kernel and example programs are simple enough that they don’t need it. A real CML would grow modules and functors over time. For now, the parametric polymorphism described here is the whole story.

A small exercise

How many distinct functions of type bool -> bool -> bool are there? Compute it from cardinalities, then enumerate them in your head and check.

Answer: |bool|^(|bool|×|bool|) = 2^4 = 16. They include and, or, xor, nand, the constant functions, the projections, the negations of projections — all of two-variable boolean logic. The signature literally enumerates a finite logic.
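
A mechanical check in OCaml, indexing each function by its four-bit truth table; the names table and fns are illustrative:

let table f = (f false false, f false true, f true false, f true true)

(* Function number n returns bit n of its truth table *)
let fns = List.init 16 (fun n a b ->
  let bit = (if a then 2 else 0) + (if b then 1 else 0) in
  n land (1 lsl bit) <> 0)

(* All 16 truth tables are distinct *)
let () = assert (List.length (List.sort_uniq compare (List.map table fns)) = 16)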

What this chapter committed to

Parametric polymorphism as a compile-time discipline of ignorance. Free theorems as the payoff: the type alone constrains the function dramatically. Two implementation strategies (uniform representation and monomorphization), both predictable in cost. Brief comparisons to neighboring languages. The hint that ML has a richer module-based bounded-polymorphism story that this book mostly defers.

Part I is now complete. You have the algebraic vocabulary, the control-flow primitive, the propositional view of types, the orientation choice, and parametric polymorphism. The next part — Part II — applies all of this to an ISA: what does it mean to design hardware that thinks in these terms?