Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Cardinality and the operators

Forget OCaml exists for a few pages. The ML family — SML, OCaml, F#, and the more distant cousins Haskell and Rust — share an algebraic core. Before we name the operators, we need one piece of vocabulary.

Cardinality

Cardinality of a type is the count of distinct values it has. That’s it. It’s the same word mathematicians use for the size of a set, applied to types-as-sets-of-values.

Some examples to anchor:

  • bool has cardinality 2. Its values are true and false.
  • unit has cardinality 1. Its only value is ().
  • A type with no values — the empty/void type — has cardinality 0.
  • int32 has cardinality 2³² ≈ 4.3 billion.
  • string has cardinality ℵ₀ (countably infinite, ignoring memory limits) — there are infinitely many possible strings.
  • A custom type type traffic_light = Red | Yellow | Green has cardinality 3.

When I say a type “has N values,” I don’t mean “holds N values at runtime” the way a list holds elements. I mean: across all possible programs and all possible runtime states, the set of legal inhabitants of this type has N members. bool has cardinality 2 because there exist exactly two booleans in the universe. A variable of type bool is, at any moment, one of them.

Why this matters, and why this book will keep returning to it: cardinality is the link between the algebra and the discipline. The phrase “make illegal states unrepresentable” cashes out as: choose a type whose cardinality equals the number of legal states in your domain. If your domain has 7 legal states and your type has cardinality 12, you have 5 representable illegal states — bugs waiting to be written. If your type has cardinality 5, you can’t represent two of your legal states — bugs waiting to be discovered. Cardinality is how you count whether your type is the right shape.

It’s also why the operator names — product, sum, exponential — aren’t metaphors. They’re literal. The cardinality of a pair really is the product of the cardinalities. The cardinality of a function type really is an exponential. The algebra of types is the algebra of finite sets, lifted.

The operators

Product types. A pair (A, B) is the type whose values are one A and one B. Cardinality: |A| × |B|.

Verify with bool: (bool, bool) has 4 values — (true, true), (true, false), (false, true), (false, false). 2 × 2 = 4. ✓

You know this from tuples, structs, records. Nothing new yet.

Sum types. A choice A + B is the type whose values are either an A or a B, tagged. Cardinality: |A| + |B|.

Verify: bool + bool has 4 values — Left true, Left false, Right true, Right false. 2 + 2 = 4. The tag matters; Left true and Right true are distinct values of the sum type even though they share the underlying true.

This is the operator your C and Go intuition doesn’t have a clean slot for. Go fakes it with interfaces; C fakes it with tagged unions you maintain by hand; TypeScript fakes it with discriminated unions and prays. In ML, it’s primitive and the compiler enforces that you handled every tag.

Function types. A -> B is the type of something that, given an A, yields a B. Cardinality: |B|^|A|.

Verify: bool -> bool has 4 values — the identity function, negation, constant-true, constant-false. 2² = 4. ✓ Each function is determined by its choice of output for each of the |A| possible inputs, hence |B| choices made |A| times.

Yes, exponential. This will matter — it’s why higher-order functions are so expressive: a single function value can encode an astronomical amount of behavior.

The unit type. unit has exactly one value, (). Cardinality 1. It’s the multiplicative identity: A × unit ≅ A, because |A| × 1 = |A|. You can think of it as “no information” — a value of type unit tells you nothing you didn’t already know, because there was only one possibility. That’s exactly what void should have meant in C.

The empty type. A type with zero values. Cardinality 0. It’s the additive identity: A + empty ≅ A, because |A| + 0 = |A|. You’ll rarely write a value of this type — you can’t, there aren’t any — but it shows up as the return type of functions that don’t return (exit, infinite loops, raising exceptions in pure settings). A function of type A -> empty is one that takes an A and cannot possibly return normally, and the type system knows it.

Algebraic identities

Now the move. Identities you learned in middle school just work on types, because they’re literally the same identities over the same kind of arithmetic:

  • A × (B + C) ≅ (A × B) + (A × C) — distribution. A pair of an A with either-a-B-or-a-C is the same as either-(a-pair-of-A-and-B)-or-(a-pair-of-A-and-C). Cardinalities: |A|·(|B|+|C|) = |A|·|B| + |A|·|C|. ✓
  • A^(B+C) ≅ A^B × A^C — a function from a sum is the same as a pair of functions, one per case. This is what pattern matching is. Cardinalities: |A|^(|B|+|C|) = |A|^|B| · |A|^|C|. ✓
  • (A^B)^C ≅ A^(B×C) — currying. A function returning a function is the same as a function of two arguments. Cardinalities: (|A|^|B|)^|C| = |A|^(|B|·|C|). ✓

The ≅ symbol is “isomorphic to” — there’s a lossless, invertible translation between values of the two types. Not “equal,” because the types are syntactically distinct, but interchangeable in any way that matters. When you see two ML programmers debating whether to use a pair of options or an option of a pair, this is what they’re arguing about: which side of an isomorphism is more readable, since they’re operationally the same.

When an ML programmer says “make illegal states unrepresentable,” they mean: use the algebra above to construct a type whose cardinality equals exactly the number of legal states in your domain. No more, no fewer. The whole discipline is counting.

Systems lens, first pass

You’re going to want to know: does this algebra cost anything at runtime? Mostly no. A product is a struct. A sum is a tagged union — typically one word of tag, plus enough space for the largest variant’s payload (with caveats per implementation we’ll cover when we get to the ISA). A function is a code pointer plus a closure environment. The algebra is mostly a compile-time bookkeeping discipline with predictable runtime shapes. The portals are free; what’s behind them costs what you’d expect.

There’s one cost worth flagging now: sums force a tag check at every match site. On a modern branch predictor with a closed enumeration of cases, this is essentially free in steady-state hot loops. But if you’re writing the kind of code where a mispredicted branch matters — packet processing inner loops, DSP — you’ll want to know which sums in your hot path compile to what. The ISA chapters will give you the tools to find out.

What this chapter committed to

A vocabulary. Type, value, cardinality. Four operators: product, sum, function, unit-and-empty. The algebraic identities that connect them. The principle that ML’s design discipline reduces to counting. The hint that this algebra will have a direct, surprisingly cheap, hardware story.

The next chapter introduces the operation that consumes sums for breakfast: pattern matching.