
The Little Language Thesis

Jon Bentley argued that tiny purpose-built languages — make, awk, pic — outperform general-purpose ones for well-scoped problems. Eric Evans argued that developers and domain experts should share a single vocabulary — what he called "ubiquitous language." Both were pointing at the same thing: the best tool for a domain is a language that speaks the domain's own words. But both missed a subtlety: in the best systems, the structural language and the domain language are different layers. The labels make the model readable to humans. The primitives make it analyzable by machines.

Complexity You Can Hear

The beats sequencer is the clearest demonstration of what this two-layer architecture looks like in practice.

The cells are kick_0, snare_3, hihat_5 — positions in a rhythm ring. The funcs are trigger, mute, unmute. The labels read like a drum machine manual. A musician doesn't learn Petri net theory — they read the labels and hear the model. And because the sequencer turns token flow into sound, a wrong model doesn't just fail a test — you can hear it.

The musician works entirely in domain vocabulary: instruments, patterns, triggers. Underneath, four structural primitives — cell, func, arrow, guard — give those labels formal meaning. The tools analyze for deadlocks, conservation, reachability. The musician never sees the primitives. They hear the domain.
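As a sketch (hypothetical Python, not the sequencer's actual code), the two layers look like this: armed steps are tokens in cells, and trigger is a func whose guard checks that the step is armed. The particular pattern armed below is an invented example.

```python
# Hypothetical sketch of the beats model in the four-primitive style.
# Cells hold tokens (1 = step armed); funcs fire only when guards pass.
cells = {f"kick_{i}": 0 for i in range(8)}
cells.update({f"snare_{i}": 0 for i in range(8)})

# Arm an illustrative pattern: kick on steps 0 and 4, snare on 2 and 6.
for name in ("kick_0", "kick_4", "snare_2", "snare_6"):
    cells[name] = 1

def trigger(instrument, step):
    """func: consume a token from an armed step and emit a sound event."""
    cell = f"{instrument}_{step}"
    if cells.get(cell, 0) > 0:   # guard: the step must be armed
        cells[cell] -= 1         # arrow: cell -> func consumes the token
        return f"{instrument}!"  # the audible output
    return None

events = [trigger(inst, step) for step in range(8)
          for inst in ("kick", "snare")]
print([e for e in events if e])  # -> ['kick!', 'snare!', 'kick!', 'snare!']
```

The musician only ever touches the labels (kick_4, trigger); the token bookkeeping underneath is what the tools analyze.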

Three Meanings of Composition

The word "composition" is doing triple duty. The incidence matrix operates over integers — token accumulation is additive, transition firing is multiplicative. That's algebraic composition. The nets themselves compose as morphisms in a symmetric monoidal category, wiring outputs to inputs. That's categorical composition. And the control nets that wire kick ⊗ snare ⊗ hihat into an arrangement — that's musical composition. Same four words at every level. The language doesn't strain.

The Forth Precedent

Forth, Charles Moore's stack-based language from the late 1960s, discovered this two-layer architecture first.

Forth programmers don't write applications — they build up domain vocabularies. A Forth program for controlling a telescope doesn't look like Forth. It looks like telescope commands. The stack primitives (DUP, SWAP, DROP, OVER) are the structural layer. Nobody talks about them. They're the invisible substrate. What people read and speak are the domain words built on top: SLEW, TRACK, CALIBRATE.

Moore was practicing ubiquitous language decades before Evans gave it a name. The primitives are for the machine. The words are for the domain. You don't see DUP SWAP in a finished Forth application any more than a musician sees cell func arrow while sequencing a drum pattern. The primitives are scaffolding. The labels are the building.

But Forth is Turing-complete. You can build anything — which means you can't prove much. What if you kept the pattern but gave up the power?

The Structural Substrate

The token language that drives our Petri net tools has exactly four structural primitives:

| Primitive | Role                          | Structure           |
| --------- | ----------------------------- | ------------------- |
| cell      | a place that holds things     | state container     |
| func      | an action that changes things | state transition    |
| arrow     | a connection with direction   | flow and dependency |
| guard     | a condition that must be true | constraint          |

That's the whole structural language. There is no fifth primitive. (arrow is our name for what Petri net theory calls an arc — the same concept, friendlier.)

These four terms are not the ubiquitous language. They're the encoding — the substrate on which domain languages get built. Nobody walks into a meeting and says "we need a new func." They say "we need a transfer operation" or "add a mute control."

The ubiquitous language lives in the labels. In an ERC-20 token model: the cells are balances, totalSupply, allowances. The funcs are transfer, mint, burn. A financial engineer and a Solidity developer already share these words. They don't need to know what a "cell" is — they need to know that balances feeds into transfer, guarded by sufficient funds. That sentence is the ubiquitous language. The engineer never mentions the word "cell."
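A minimal sketch of that sentence as code, assuming nothing about the project's actual tooling: the `Cell` and `Func` classes and the `balances[...]` labels below are hypothetical, but the shape follows the text — domain labels on top, the four structural primitives underneath.

```python
# Hypothetical encoding of the ERC-20 transfer story: labeled cells,
# a func wired by weighted arrows, and a guard of "sufficient funds".
from dataclasses import dataclass

@dataclass
class Cell:        # a place that holds things
    tokens: int = 0

@dataclass
class Func:        # an action, wired by arrows, gated by a guard
    inputs: dict   # arrow weights: cell label -> tokens consumed
    outputs: dict  # arrow weights: cell label -> tokens produced

net = {
    "balances[alice]": Cell(tokens=10),
    "balances[bob]": Cell(tokens=0),
}
transfer = Func(inputs={"balances[alice]": 3},
                outputs={"balances[bob]": 3})

def fire(func, cells):
    # guard: every input cell must hold enough tokens
    if not all(cells[c].tokens >= w for c, w in func.inputs.items()):
        return False
    for c, w in func.inputs.items():
        cells[c].tokens -= w
    for c, w in func.outputs.items():
        cells[c].tokens += w
    return True

fire(transfer, net)
print(net["balances[alice]"].tokens, net["balances[bob]"].tokens)  # 7 3
```

The financial engineer reads the labels; `fire` is the substrate they never see.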

Two Traditions, Two Layers

|                       | Forth                            | Petri Net DSL                                         |
| --------------------- | -------------------------------- | ----------------------------------------------------- |
| Structural primitives | DUP, SWAP, DROP, OVER            | cell, func, arrow, guard                              |
| Domain vocabulary     | User-defined words (SLEW, TRACK) | Labels on places and transitions (balances, transfer) |
| Visible state         | The stack                        | The marking (tokens in places)                        |
| Composition           | Word concatenation               | Net wiring (shared places, control nets)              |
| What you ship         | A domain vocabulary              | A labeled net                                         |

Each model constructs a domain vocabulary on top of a universal grammar. The structural grammar stays fixed. The domain language is whatever the domain already speaks.

The Incidence Matrix

Bentley's little languages work because they're closed — you can't escape into general-purpose complexity. Regular expressions don't have loops. Makefiles don't have recursion. The constraint is the feature.

Every Petri net has an incidence matrix: rows are places (cell), columns are transitions (func), entries are arc weights (arrow), guards add row constraints. The entire behavior of any model is captured in a matrix of integers.

Integer matrices are things mathematicians know how to analyze. P-invariants fall out of the left null space — conservation laws proved by structure alone, no model checker needed. ODE simulation works because the matrix defines a system of differential equations (a continuous relaxation). Reachability and deadlock detection work because guards and finite token counts keep the state space bounded.
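A toy illustration of this with NumPy (not the project's actual tooling; the two-place transfer net and its labels are invented for the example): the invariant and the marking update are both plain linear algebra.

```python
# Incidence matrix of a tiny net: two places (rows), one transition
# (column) that moves a token from balance_a to balance_b.
import numpy as np

C = np.array([[-1],   # balance_a loses one token when transfer fires
              [ 1]])  # balance_b gains one token

# A P-invariant is a row vector y with y @ C == 0: a weighted token
# count no firing can change. Here y = [1, 1] works — total supply.
y = np.array([1, 1])
assert (y @ C == 0).all()

# The marking update is m' = m + C @ firing_counts.
m0 = np.array([10, 0])
m1 = m0 + C @ np.array([3])  # fire transfer three times
print(m1, y @ m1)            # [7 3] 10 -- the invariant holds
```

No model checker ran here: the conservation law is a property of the matrix, checked before any token moves.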

What we give up is the point: no Turing completeness, no unbounded recursion, no implicit state. A fifth primitive that added any of these would collapse the matrix into a Turing-complete system — and with it, every guarantee. Add one primitive and analysis breaks. Remove one and you can't express real workflows.

Why a Substrate Matters

Without a structural substrate, labels have no formal meaning. Write a state machine in Go and transfer is just a function name — you can't ask "is this deadlock-free?" without writing a model checker. Write it on the four-term substrate and transfer becomes a transition with input arcs, output arcs, and guards that tools can reason about. Same vocabulary, now analyzable.
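A sketch of what "tools can reason about" means in practice, under invented names: the toy mutex net below (two workers sharing one lock place) is my own example, not from the source, but it shows that on the four-term substrate a deadlock check is just a finite graph search over markings.

```python
# Hedged sketch: deadlock detection as breadth-first search over the
# reachable markings of a small net. All labels here are illustrative.
from collections import deque

# funcs as (inputs, outputs) over named cells; all arc weights are 1
funcs = {
    "acquire_a": ({"idle_a": 1, "lock": 1}, {"busy_a": 1}),
    "release_a": ({"busy_a": 1}, {"idle_a": 1, "lock": 1}),
    "acquire_b": ({"idle_b": 1, "lock": 1}, {"busy_b": 1}),
    "release_b": ({"busy_b": 1}, {"idle_b": 1, "lock": 1}),
}
cells = ["idle_a", "busy_a", "idle_b", "busy_b", "lock"]
m0 = (1, 0, 1, 0, 1)  # both workers idle, lock free

def enabled(m, ins):
    return all(m[cells.index(c)] >= w for c, w in ins.items())

def fire(m, ins, outs):
    m = list(m)
    for c, w in ins.items():
        m[cells.index(c)] -= w
    for c, w in outs.items():
        m[cells.index(c)] += w
    return tuple(m)

seen, queue, deadlocks = {m0}, deque([m0]), []
while queue:
    m = queue.popleft()
    succs = [fire(m, i, o) for i, o in funcs.values() if enabled(m, i)]
    if not succs:                 # no func enabled: a dead marking
        deadlocks.append(m)
    for s in succs:
        if s not in seen:
            seen.add(s)
            queue.append(s)

print(len(seen), deadlocks)  # 3 reachable markings, no deadlocks
```

Guards and finite token counts are what keep `seen` finite — the closure property the previous section paid for.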

This is also why Evans' ubiquitous language often fails in practice. Teams agree on shared vocabulary, write it into Go or Java, and the vocabulary diverges because the programming language imposes its own structure. The domain words get buried in implementation noise. The four-term substrate has no vocabulary of its own to compete with them.

TLA+ and the Two-Layer Split

TLA+ is the most serious alternative here. Amazon used it to find subtle bugs in S3 and DynamoDB that testing couldn't reach. But TLA+ conflates the two layers — its structural primitives are the vocabulary. Domain words like balances and sender are embedded in set theory and temporal logic (\E x \in S : P(x), primed variables, EXCEPT notation). A financial engineer won't read a TLA+ spec and see their workflow.

The tradeoff is real: we give up TLA+'s generality — no arbitrary temporal properties, no fairness specs. But for properties that fall out of net structure (conservation, reachability, deadlock freedom, boundedness), the four-term DSL delivers the same guarantees with a far lower barrier to entry. A domain expert names places and transitions in their own words. The analysis happens underneath, on the incidence matrix, without them knowing it exists.

The Test

Can a domain expert label the places and transitions, and can a developer read those labels as a spec?

Token standards. "The balances place feeds into transfer, guarded by sufficient funds." The ERC-20 schema is seven lines — simultaneously a model, a spec, and an executable test.

Games. A designer names board positions and legal moves. In tic-tac-toe: p0 through p8, play_X_0, play_O_4. The developer wires the arrows.

Music. A musician names kick, snare, hihat and connects them with trigger, mute, unmute. The beats sequencer produces audible output — if the model is wrong, you hear it.

The Thesis

The ubiquitous language for any domain emerges from labeling the places and transitions of a Petri net in that domain's own words. The four structural primitives are the minimal substrate that makes those labels analyzable, composable, and executable.

For the class of problems that involve "things in states, changing according to rules," the pattern holds: a minimal structural grammar, a domain-specific vocabulary, and a clean separation between the two.

Hypothesis: four structural primitives are sufficient for any domain. Experiments: music, finance, games, token standards. Falsifiability: (1) can a domain expert label the parts, and can a developer read the labels as a spec? (2) do composed nets commute? (3) does the incidence matrix recover all behavioral properties? If any test fails, the thesis is dead. So far, none have.
