
Tic-Tac-Toe Petri-Net Model

Overview

This post walks through two complementary Petri-net formulations of tic-tac-toe, progressing from a minimal turn-based grid to a memory-augmented model capable of reasoning about win conditions. These examples demonstrate how Petri nets naturally encode available actions, enforce alternation, and accumulate structured history for higher-level reasoning.

The final sections outline how these models can be converted into ODE systems via go-pflow and how to apply parameter sweeps to explore strategy, reachability, and path likelihoods.

Stage 1 — Grid & Turns

In the first stage, we represent the tic-tac-toe grid as nine places, P00 through P22, each initialized with a token indicating an empty cell.

For every square we define a pair of move transitions, one for X and one for O, each consuming that square's empty-cell token to claim it.

A dedicated Next place for each player enforces turn-taking: every move consumes the current player's turn token and produces the other player's, so X and O must alternate.

This structure cleanly separates:

  1. Geometry
  2. Legal actions
  3. Alternating turn flow

All of this arises purely through token movement and local composition; no global logic is required.
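To make the wiring concrete, here is a minimal sketch of the Stage 1 net as plain Go data. The Arc and Transition types and the two turn places (Next_X and Next_O) are illustrative assumptions for this post, not go-pflow's actual API:

```go
package main

import "fmt"

// Illustrative types for this post's sketches; not go-pflow's actual API.
type Arc struct {
	Place  string
	Weight int
}

type Transition struct {
	Name    string
	In, Out []Arc
}

// stage1 builds the Stage 1 net: nine cell places plus two turn places.
func stage1() (marking map[string]int, transitions []Transition) {
	marking = map[string]int{"Next_X": 1, "Next_O": 0} // X moves first
	for r := 0; r < 3; r++ {
		for c := 0; c < 3; c++ {
			cell := fmt.Sprintf("P%d%d", r, c)
			marking[cell] = 1 // one token means the cell is still empty
			// A move consumes the empty-cell token plus the mover's turn
			// token, then hands the turn to the other player.
			transitions = append(transitions,
				Transition{
					Name: fmt.Sprintf("X%d%d", r, c),
					In:   []Arc{{cell, 1}, {"Next_X", 1}},
					Out:  []Arc{{"Next_O", 1}},
				},
				Transition{
					Name: fmt.Sprintf("O%d%d", r, c),
					In:   []Arc{{cell, 1}, {"Next_O", 1}},
					Out:  []Arc{{"Next_X", 1}},
				})
		}
	}
	return marking, transitions
}

func main() {
	marking, transitions := stage1()
	fmt.Printf("%d places, %d transitions\n", len(marking), len(transitions))
}
```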

Model Diagram (Stage 1)

[interactive pflow diagram]

Stage 2 — Adding History

The second version keeps the same grid & turn logic but introduces a memory layer.

Whenever a move fires (e.g. X00), it also deposits a token into a matching history place, _X00, permanently recording that the move occurred.

Similarly, all O moves populate _Oij history places.

This upgrades the model from tracking the state of the board to tracking the sequence of moves. Now we can reason about why the board reached its state, not just what its current configuration is.
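As a sketch, reusing the illustrative Arc and Transition types from the Stage 1 code above, the memory layer amounts to one extra output arc per move transition (the history place names follow the _Xij / _Oij convention):

```go
// addHistory gives every move transition an extra output arc into a
// matching history place: firing X00 now also marks _X00, and so on.
// Nothing consumes history tokens at this stage, so the record of past
// moves only ever grows.
func addHistory(transitions []Transition) {
	for i := range transitions {
		t := &transitions[i]
		t.Out = append(t.Out, Arc{Place: "_" + t.Name, Weight: 1})
	}
}
```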

Model Diagram (Stage 2)

[interactive pflow diagram]

Pattern Collectors & Win Detection

Once we have history places, we can build pattern collectors: transitions that listen for winning combinations, such as X completing the top row (history tokens in _X00, _X01, and _X02).

A pattern collector consumes history tokens from the three required squares, producing a structural token that records the completed line in a dedicated pattern place.

Finally, we add win detection transitions: one per pattern place, each consuming a completed-pattern token and depositing a token into win_x or win_o.

This creates a fully compositional win-detection system without any global conditional logic.
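Here is a sketch of one collector and its win detector, again with the illustrative types from above; the pattern place name X_row0 is an assumption, while win_x matches the diagram below:

```go
// Pattern collector: consumes the three history tokens for X's top row
// and produces one structural token recording the completed line.
var collectXRow0 = Transition{
	Name: "collect_X_row0",
	In:   []Arc{{"_X00", 1}, {"_X01", 1}, {"_X02", 1}},
	Out:  []Arc{{"X_row0", 1}},
}

// Win detection: a completed-pattern token becomes a win_x token.
// One such transition exists for each of the eight winning lines.
var detectXRow0 = Transition{
	Name: "win_from_X_row0",
	In:   []Arc{{"X_row0", 1}},
	Out:  []Arc{{"win_x", 1}},
}
```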

[interactive pflow diagram]

Notice above how a winning set of moves activates a win detection transition that populates the win_x place.

ODE-Guided Move Selection

Using go-pflow, we can convert this Petri net into an ODE system and simulate forward to predict outcomes. The key insight is that token accumulation in the win places predicts game outcomes.
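Concretely, under mass-action kinetics every place P_i becomes a continuous quantity x_i and every transition T_j becomes a rate term. A sketch of the resulting system, assuming uniform rate constants k_j:

```latex
\frac{dx}{dt} = S\,v(x),
\qquad
v_j(x) = k_j \prod_{i \in \mathrm{in}(T_j)} x_i
```

Here S is the stoichiometry matrix (output arcs minus input arcs) and the product runs over transition T_j's input places. A transition's rate collapses to zero as soon as any input place empties, which is what lets the simulation see blocked lines.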

For each possible move, we:

  1. Create a hypothetical state after making that move
  2. Run ODE simulation (mass-action kinetics, t=0 to t=3.0)
  3. Measure final values of win_x and win_o places
  4. Score = my_win − opponent_win
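A sketch of this evaluation loop in Go. Both helpers are assumptions standing in for the real solver calls: applyMove fires a move transition on a copy of the marking, and simulateODE integrates the mass-action system from t = 0 to t = 3.0 and returns the final value of every place.

```go
// bestMove scores every legal move by one-step lookahead: apply the
// move, simulate the ODE system forward, and compare the win places.
func bestMove(
	marking map[string]float64,
	legalMoves []string,
	me, opponent string, // "x" and "o", matching win_x / win_o
	applyMove func(map[string]float64, string) map[string]float64,
	simulateODE func(map[string]float64) map[string]float64,
) (best string, bestScore float64) {
	bestScore = -1e18 // lower than any reachable score
	for _, move := range legalMoves {
		final := simulateODE(applyMove(marking, move))
		// Score = my win potential minus the opponent's.
		score := final["win_"+me] - final["win_"+opponent]
		if score > bestScore {
			best, bestScore = move, score
		}
	}
	return best, bestScore
}
```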

Example 1: Empty Board

X evaluates all 9 positions. The ODE reveals strategic value through model topology alone:

  Board:              Score Heatmap (X's view):
╔═══╦═══╦═══╗         ╔═══════╦═══════╦═══════╗
║ 0 ║ 1 ║ 2 ║         ║ 0.316 ║ 0.218 ║ 0.316 ║
╠═══╬═══╬═══╣         ╠═══════╬═══════╬═══════╣
║ 3 ║ 4 ║ 5 ║   =>    ║ 0.218 ║ 0.430 ║ 0.218 ║
╠═══╬═══╬═══╣         ╠═══════╬═══════╬═══════╣
║ 6 ║ 7 ║ 8 ║         ║ 0.316 ║ 0.218 ║ 0.316 ║
╚═══╩═══╩═══╝         ╚═══════╩═══════╩═══════╝
                            Best: center (4)

The center position scores highest (0.430) because it participates in 4 winning patterns (2 diagonals + row + column). Corners score next (0.316, 3 patterns each). Edges score lowest (0.218, 2 patterns each).

No game heuristics are coded. Strategic value emerges purely from the Petri net's structure.

Example 2: Responding to Center

After X takes center, O evaluates defensive options:

  Board:              Score Heatmap (O's view):
╔═══╦═══╦═══╗         ╔════════╦════════╦════════╗
║ 0 ║ 1 ║ 2 ║         ║ -0.375 ║ -0.469 ║ -0.375 ║
╠═══╬═══╬═══╣         ╠════════╬════════╬════════╣
║ 3 ║ X ║ 5 ║   =>    ║ -0.469 ║   X    ║ -0.469 ║
╠═══╬═══╬═══╣         ╠════════╬════════╬════════╣
║ 6 ║ 7 ║ 8 ║         ║ -0.375 ║ -0.469 ║ -0.375 ║
╚═══╩═══╩═══╝         ╚════════╩════════╩════════╝
                            Best: any corner

All scores are negative (X has the advantage), but corners (-0.375) minimize X's lead better than edges (-0.469). The ODE naturally discovers the defensive principle: corners against center.

Example 3: Finding the Winning Line

X has center and corner. The ODE identifies the diagonal threat:

  Board:              Score Heatmap (X's view):
╔═══╦═══╦═══╗         ╔═══════╦═══════╦═══════╗
║ O ║ 1 ║ 2 ║         ║   O   ║ 0.345 ║ 0.522 ║  <- 02 threatens diagonal
╠═══╬═══╬═══╣         ╠═══════╬═══════╬═══════╣
║ 3 ║ X ║ 5 ║   =>    ║ 0.345 ║   X   ║ 0.339 ║
╠═══╬═══╬═══╣         ╠═══════╬═══════╬═══════╣
║ 6 ║ 7 ║ 8 ║         ║ 0.522 ║ 0.339 ║ 0.361 ║  <- 20 also high
╚═══╩═══╩═══╝         ╚═══════╩═══════╩═══════╝
                            Best: 02 or 20 (0.522)

Positions 02 and 20 score highest (0.522) because they create two-way threats: each completes one diagonal while opening another winning path. The model's pattern collectors naturally amplify these fork positions.

Example 4: Must-Block Situation

O faces X's imminent win threat (X has two in a row):

  Board:              Score Heatmap (O's view):
╔═══╦═══╦═══╗         ╔════════╦════════╦════════╗
║ O ║ 1 ║ X ║         ║   O    ║ -0.617 ║   X    ║
╠═══╬═══╬═══╣         ╠════════╬════════╬════════╣
║ X ║ X ║ 5 ║   =>    ║   X    ║   X    ║ -0.191 ║ <- must block!
╠═══╬═══╬═══╣         ╠════════╬════════╬════════╣
║ O ║ 7 ║ 8 ║         ║   O    ║ -0.494 ║ -0.501 ║
╚═══╩═══╩═══╝         ╚════════╩════════╩════════╝
                            Best: 12 (block row)

Position 12 scores best (-0.191) because it blocks X's winning move. Other positions score much worse (-0.5 to -0.6). The ODE simulation naturally discovers blocking as the dominant strategy—no explicit "if opponent has two in a row, block" logic required.

How It Works

The magic is in the pattern collector transitions. When X occupies two squares of a winning pattern, the ODE flow toward win_x increases. Blocking that pattern cuts off the flow.
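In the mass-action sketch from earlier, the collector for X's top row fires at a rate proportional to the product of its three history concentrations:

```latex
v_{\mathrm{row0}}(x) = k \cdot x_{\_X00} \cdot x_{\_X01} \cdot x_{\_X02}
```

With two of the three squares already held by X, two factors are positive and the rate climbs as soon as the third history place can fill. If O takes that square instead, its empty-cell token is consumed, the _X02 history place can never fill, and the whole product stays pinned at zero.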

Score = win_x_final − win_o_final

This formula encodes minimax-like behavior: maximize your win potential while minimizing your opponent's. The continuous dynamics approximate discrete game tree search.

Data generated with go-pflow examples/tictactoe

Conclusion

This tic-tac-toe model demonstrates how a simple game becomes deeply expressive when framed through Petri-net structure. By building the system in layers—grid and turn alternation, history tracking, pattern collectors, and win detection—we create a model capable of representing not just what happened, but why.

Every rule arises locally from token flow. There's no central controller, no global conditional logic—just emergent behavior from places, transitions, and structured history. The ODE simulation reveals strategic value without any game-specific heuristics.

For more on this approach, see Declarative Differential Models.