
Tropical Petri Nets

There is a surprising connection hiding between three fields that rarely talk to each other: Petri nets, neural networks, and an exotic branch of algebra where "addition" means "pick the bigger one." The connection isn't metaphorical — it's a theorem, and it reframes what neural networks actually compute.

A Different Arithmetic

What if we redefined addition? In tropical arithmetic, the sum of two numbers is their maximum, and the product of two numbers is their ordinary sum:

Classical vs Tropical Arithmetic

a ⊕ b = max(a, b)    (tropical sum: the classical maximum)
a ⊗ b = a + b        (tropical product: the classical sum)

This isn't a toy. The tropical semiring (ℝ ∪ {−∞}, max, +) satisfies the same algebraic laws as ordinary arithmetic — associativity, commutativity, distributivity — just with different operations filling the same roles. It's a full-fledged algebra, and it shows up everywhere that optimization meets discrete structure: shortest paths, scheduling, phylogenetics, and — as we'll see — both Petri nets and neural networks.
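The operations and the laws they satisfy fit in a few lines of Python (the function names are just for illustration; ⊕ and ⊗ are the conventional tropical symbols):

```python
# Tropical semiring (R ∪ {-inf}, max, +): -inf is the additive identity
# ("tropical zero") and 0 is the multiplicative identity ("tropical one").
NEG_INF = float("-inf")

def t_add(a, b):
    """Tropical addition: a ⊕ b = max(a, b)."""
    return max(a, b)

def t_mul(a, b):
    """Tropical multiplication: a ⊗ b = a + b."""
    return a + b

a, b, c = 3.0, 5.0, 2.0
assert t_add(a, b) == t_add(b, a)                                # commutativity
assert t_mul(a, t_add(b, c)) == t_add(t_mul(a, b), t_mul(a, c))  # distributivity
assert t_add(a, NEG_INF) == a   # -inf plays the role of 0
assert t_mul(a, 0.0) == a       # 0 plays the role of 1
```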

Petri Nets Speak Tropical

Consider a timed Petri net: each place holds tokens, each transition fires when all its input places are ready. The key word is all. A transition with three inputs doesn't fire when the first token arrives, or the second — it waits for the last one. The firing time is the maximum of the arrival times.

That max is tropical addition.

After firing, a transition deposits tokens in output places. The arrival time at an output place is the firing time plus the transition's duration. That + is tropical multiplication.

The entire behavior of a timed event graph (a Petri net where every place has exactly one input and one output transition) can be written as a tropical matrix equation:

x(k+1) = A ⊗ x(k)

where A is the incidence matrix interpreted over the tropical semiring, ⊗ is tropical matrix multiplication, and x(k) is the vector of firing times at step k. The net's dynamics become linear — not over ordinary arithmetic, but over (max, +).

This is the core result of Baccelli et al., Synchronization and Linearity (1992): timed event graphs are max-plus linear systems. Every tool from linear algebra — eigenvalues, eigenvectors, projections — has a tropical counterpart, and it applies directly to Petri net analysis.
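A minimal sketch of that linearity, with a hypothetical two-transition event graph whose holding times are chosen only to make the recurrence visible:

```python
NEG_INF = float("-inf")

def maxplus_matvec(A, x):
    """Tropical (max, +) matrix-vector product: y_i = max_j (A[i][j] + x[j])."""
    return [max(A[i][j] + x[j] for j in range(len(x))) for i in range(len(A))]

# Hypothetical two-transition timed event graph: entry A[i][j] is the
# holding time on the place from transition j to transition i
# (NEG_INF where no place connects them).
A = [[NEG_INF, 3.0],
     [2.0,     NEG_INF]]

x = [0.0, 0.0]                 # firing times at step k = 0
for k in range(4):
    x = maxplus_matvec(A, x)   # x(k+1) = A ⊗ x(k)
print(x)                       # → [10.0, 10.0]
```

Each iteration is one synchronization round: every transition waits for the latest input token (max), then adds its holding time (+).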

ReLU Is Tropical

Now consider the most common activation function in deep learning:

ReLU(x) = max(0, x)

That max is, again, tropical addition: ReLU(x) = 0 ⊕ x, a tropical polynomial in one variable.

This observation extends far beyond a single neuron. Zhang, Naitzat, and Lim (ICML 2018) proved that every feedforward ReLU network computes a tropical rational function — a ratio of tropical polynomials. The decision boundaries of a ReLU classifier are the vertices of a tropical hypersurface. This isn't a loose analogy. It's a precise algebraic characterization of what these networks compute.
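The characterization can be seen in miniature with a hand-built one-hidden-layer network (the weights below are arbitrary illustrative values, not from the paper): the network equals a difference of two tropical polynomials, each a max of affine terms.

```python
def relu(x):
    return max(0.0, x)

# Hypothetical tiny network: f(x) = 2*relu(x - 1) - 3*relu(-x + 2)
def net(x):
    return 2.0 * relu(x - 1.0) - 3.0 * relu(-x + 2.0)

# The same function as a tropical rational P(x) - Q(x):
#   2*relu(x - 1)  = max(2x - 2, 0)   (tropical polynomial P)
#   3*relu(-x + 2) = max(-3x + 6, 0)  (tropical polynomial Q)
def tropical_rational(x):
    P = max(2.0 * x - 2.0, 0.0)
    Q = max(-3.0 * x + 6.0, 0.0)
    return P - Q

for x in [-2.0, 0.0, 1.5, 3.0]:
    assert abs(net(x) - tropical_rational(x)) < 1e-9
```

The positive-weight paths through the network collect into P, the negative-weight paths into Q; deeper networks just produce bigger maxima of affine terms.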

Neural networks have been doing tropical arithmetic all along. They just didn't know it.

The Triangle

Three fields. Three pairwise connections. One unifying algebra.

The Tropical Triangle

The Hierarchy: Structure vs. Parameters

This is the central point. We can arrange computational models along a spectrum, and tropical algebra reveals what the spectrum actually measures:

Petri net: We know the causal structure. We write it down as a net. Arc weights are small integers. Computation is tropical-linear — a matrix-vector multiply over (max, +). Analysis is polynomial time. The model is interpretable because the structure is the explanation.

Recurrent neural network: We don't know the structure. We learn it via backpropagation. The weights are floats approximating what integers would capture exactly. A dense weight matrix compensates for missing topology. The computation is tropical-rational — still tropical, but without the structural guarantees that make analysis tractable.

Large language model: We don't even know the task. We massively overparameterize — billions of weights trained on internet-scale data. Those billions of parameters approximate what a small structured model could compute exactly, if we knew what model to build.

Each step down the hierarchy trades domain knowledge for parameters. Tropical algebra makes this precise: Petri nets are tropical-linear (polynomial-time analysis), RNNs are tropical-rational (still tropical, but the structure must be inferred), and LLMs are... overkill for any problem with known causal structure.

What DDM Already Does

If this framing sounds familiar, it's because Declarative Differential Models (DDM) and integer reduction already exploit the same insight from a different angle.

DDM's continuous relaxation converts a Petri net's discrete dynamics into ODEs. The system flows to an equilibrium that reveals structural quantities — the token distribution x* = 1/n, the conservation laws, the reachable states — without enumerating the state space. Integer reduction then recovers exact discrete structure from these continuous trajectories.
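As an illustration of the idea — a toy relaxation, not DDM's actual machinery — a ring of places whose transitions drain their input at unit rate gives dx_i/dt = x_{i-1} − x_i, conserves total token mass, and flows to exactly that uniform equilibrium:

```python
# Toy continuous relaxation of a 4-place cycle net (hypothetical example).
# Forward-Euler integration of dx_i/dt = x[i-1] - x[i]; mass is conserved
# and the flow settles at the uniform equilibrium x* = 1/n.
n = 4
x = [1.0, 0.0, 0.0, 0.0]   # one token's worth of mass, all in place 0
dt = 0.01
for _ in range(20000):
    dx = [x[i - 1] - x[i] for i in range(n)]
    x = [x[i] + dt * dx[i] for i in range(n)]
print([round(v, 3) for v in x])   # → [0.25, 0.25, 0.25, 0.25]
```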

Tropical analysis does something strikingly similar. The tropical eigenvalue of a max-plus matrix (the maximum circuit mean) tells us the throughput of a timed Petri net — a global structural quantity extracted from local topology, no state-space search required. DDM's ODE equilibrium and the tropical eigenvalue are siblings: both extract invariants from structure rather than from exhaustive search.
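The tropical eigenvalue is computable directly from the matrix with Karp's maximum-cycle-mean algorithm. The sketch below uses an all-sources variant of Karp's recurrence (initializing every vertex at weight 0) and a hypothetical two-transition net:

```python
NEG_INF = float("-inf")

def max_cycle_mean(A):
    """Karp's algorithm (all-sources variant): the tropical eigenvalue of
    max-plus matrix A is the maximum mean weight over circuits, where
    A[v][u] is the weight of the edge u -> v."""
    n = len(A)
    # D[k][v] = max weight of a walk of exactly k edges ending at v
    D = [[NEG_INF] * n for _ in range(n + 1)]
    for v in range(n):
        D[0][v] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            D[k][v] = max((D[k - 1][u] + A[v][u] for u in range(n)
                           if A[v][u] > NEG_INF), default=NEG_INF)
    best = NEG_INF
    for v in range(n):
        if D[n][v] > NEG_INF:
            best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] > NEG_INF))
    return best

# Hypothetical two-transition event graph: its only circuit carries total
# holding time 3 + 2 over 2 transitions, so throughput is limited to one
# firing every 2.5 time units.
A = [[NEG_INF, 3.0],
     [2.0,     NEG_INF]]
print(max_cycle_mean(A))   # → 2.5
```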

The categorical perspective makes the kinship precise. Petri nets are morphisms in a symmetric monoidal category. Tropical algebra gives us a valuation on that category — a way to measure distances and optimize over the space of nets. DDM gives us a continuous relaxation. All three are tools for extracting meaning from structure.

Open Questions

We'll close with three threads that seem worth pulling:

Tropical decomposition of trained networks. Can we extract Petri net structure from a trained ReLU network by factoring its tropical polynomial? TropNNC compresses networks but doesn't (yet) produce Petri nets. The gap between "simplified tropical polynomial" and "Petri net with named places and transitions" is a gap of interpretation, not algebra.

Tropical DDM. DDM uses ODE integration to find continuous equilibria. Could we replace this with tropical polynomial solving — finding the tropical eigenvalue directly from the incidence matrix? This would unify the ODE approach with the max-plus approach, and might be faster for large nets.

Maslov dequantization. The tropical semiring is the classical limit of quantum algebra — as Planck's constant goes to zero, the path integral concentrates on the classical path, and quantum superposition becomes classical optimization (max). This is Maslov's observation. Petri nets already have a discrete/continuous duality (DDM exploits it). Does the tropical connection extend this to a discrete/continuous/quantum triad?
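Maslov's limit can be watched numerically: h · log(e^(a/h) + e^(b/h)) is a smooth approximation of max(a, b) that sharpens as h → 0.

```python
import math

# Maslov dequantization, numerically: as h -> 0,
# h * log(exp(a/h) + exp(b/h)) -> max(a, b).
def smooth_max(a, b, h):
    m = max(a, b)   # factor out the max to avoid overflow in exp
    return m + h * math.log(math.exp((a - m) / h) + math.exp((b - m) / h))

a, b = 1.0, 3.0
for h in [1.0, 0.1, 0.01]:
    print(h, smooth_max(a, b, h))   # values approach max(a, b) = 3
```

At h = 1 the "sum" still blurs the two terms together; by h = 0.01 only the larger term survives — superposition collapsing into optimization.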


The core insight is simple enough to fit on a napkin: Petri nets and ReLU networks both compute over (max, +). Everything else follows from taking that seriously.

For how tropical analysis combines with ODEs and zero-knowledge proofs — three independent formalisms discovering the same structural boundary — see Earned Compression.
