
Tic-Tac-Toe Petri-Net Model

Overview

This post walks through two complementary Petri-net formulations of tic-tac-toe, progressing from a minimal turn-based grid to a memory-augmented model capable of reasoning about win conditions. These examples demonstrate how Petri nets naturally encode available actions, enforce alternation, and accumulate structured history for higher-level reasoning.

The final sections outline how these models can be converted into ODE systems via go-pflow and how to apply parameter sweeps to explore strategy, reachability, and path likelihoods.

Stage 1 — Grid & Turns

In the first stage, we represent the tic-tac-toe grid as nine places, P00–P22, each initialized with a token indicating an empty cell.

For every square we define a pair of move transitions, one for X and one for O, each consuming that cell's empty-cell token.

A dedicated Next place enforces turn-taking: every move consumes the current player's turn token and deposits the opponent's, so X and O must alternate.

This structure cleanly separates:

  1. Geometry
  2. Legal actions
  3. Alternating turn flow

All of this is achieved purely through token movement and local composability—no global logic required.
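To make this concrete, here is a minimal sketch of the Stage 1 net in Go. The Arc and Transition types are illustrative stand-ins rather than go-pflow's actual API, and the assumption that X moves first is mine; the place and transition names (P00, next_x, X00, and so on) follow the conventions above.

```go
package main

import "fmt"

// Arc maps a place name to an arc weight (tokens consumed or produced).
type Arc map[string]int

// Transition consumes tokens along Inputs and produces along Outputs.
type Transition struct {
	Inputs  Arc
	Outputs Arc
}

func main() {
	// Turn places: X moves first (an assumption; the post does not say).
	places := map[string]int{"next_x": 1, "next_o": 0}
	transitions := map[string]Transition{}

	for r := 0; r < 3; r++ {
		for c := 0; c < 3; c++ {
			cell := fmt.Sprintf("P%d%d", r, c)
			places[cell] = 1 // token present = cell is empty
			// X's move: consume the empty cell and X's turn, hand the turn to O.
			transitions[fmt.Sprintf("X%d%d", r, c)] = Transition{
				Inputs:  Arc{cell: 1, "next_x": 1},
				Outputs: Arc{"next_o": 1},
			}
			// O's move is symmetric.
			transitions[fmt.Sprintf("O%d%d", r, c)] = Transition{
				Inputs:  Arc{cell: 1, "next_o": 1},
				Outputs: Arc{"next_x": 1},
			}
		}
	}
	fmt.Println(len(places), "places,", len(transitions), "transitions")
}
```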

Model Diagram (Stage 1)

(interactive pflow diagram: grid and turns)

Stage 2 — Adding History

The second version keeps the same grid & turn logic but introduces a memory layer.

Whenever a move fires (e.g. X00), it deposits a token into a matching history place (_X00) in addition to updating the board and passing the turn.

Similarly, all O moves populate _Oij history places.

This upgrades the model from state of the board to sequence of moves. Now we can reason about why the board reached its state—not just its current configuration.
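A minimal sketch of one Stage 2 move transition, using the same illustrative types as before (not go-pflow's API): the only change from Stage 1 is the extra output arc into a history place.

```go
package main

import "fmt"

type Arc map[string]int
type Transition struct{ Inputs, Outputs Arc }

func main() {
	// X plays (0,0): consume the empty-cell token and X's turn token,
	// pass the turn to O, and record the move in history place _X00.
	x00 := Transition{
		Inputs:  Arc{"P00": 1, "next_x": 1},
		Outputs: Arc{"next_o": 1, "_X00": 1},
	}
	fmt.Printf("%+v\n", x00)
}
```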

Model Diagram (Stage 2)

(interactive pflow diagram: history layer)

Pattern Collectors & Win Detection

Once we have history places, we can build pattern collectors—transitions that listen for combinations like _X00, _X01, and _X02 (X completing the top row).

A pattern collector consumes the history tokens from the three required squares and produces a single structural pattern token for that line.

Finally, we add win detection transitions that fire on any pattern token and deposit a token in the corresponding win place (win_x or win_o).

This creates a fully compositional win-detection system without any global conditional logic.
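A sketch of one collector and its win transition under the same illustrative types; the place name pattern_row0_x is a hypothetical choice of mine, while win_x matches the win place used later in the post.

```go
package main

import "fmt"

type Arc map[string]int
type Transition struct{ Inputs, Outputs Arc }

func main() {
	// Pattern collector for X's top row: listens for the three history
	// tokens and emits one structural pattern token.
	collect := Transition{
		Inputs:  Arc{"_X00": 1, "_X01": 1, "_X02": 1},
		Outputs: Arc{"pattern_row0_x": 1},
	}
	// Win detection: any X pattern token becomes a win_x token.
	// (A later refinement also consumes the turn token to halt play;
	// see the game-halting section below.)
	win := Transition{
		Inputs:  Arc{"pattern_row0_x": 1},
		Outputs: Arc{"win_x": 1},
	}
	fmt.Printf("%+v\n%+v\n", collect, win)
}
```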

Board Positions

The board uses position IDs 00–22 (row, column). There are exactly 8 winning patterns: 3 rows, 3 columns, and 2 diagonals.

Pattern Counts

Each position participates in a different number of winning patterns. The center (11) participates in 4 patterns—both diagonals plus its row and column. Corners participate in 3 patterns (one diagonal plus row and column). Edges only participate in 2 (row and column).
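These counts are easy to verify by enumerating the 8 winning lines; a quick check in Go:

```go
package main

import "fmt"

func main() {
	// The 8 winning lines, as (row, col) cell triples.
	lines := [8][3][2]int{
		{{0, 0}, {0, 1}, {0, 2}}, {{1, 0}, {1, 1}, {1, 2}}, {{2, 0}, {2, 1}, {2, 2}}, // rows
		{{0, 0}, {1, 0}, {2, 0}}, {{0, 1}, {1, 1}, {2, 1}}, {{0, 2}, {1, 2}, {2, 2}}, // columns
		{{0, 0}, {1, 1}, {2, 2}}, {{0, 2}, {1, 1}, {2, 0}}, // diagonals
	}
	var count [3][3]int
	for _, line := range lines {
		for _, cell := range line {
			count[cell[0]][cell[1]]++
		}
	}
	for r := 0; r < 3; r++ {
		fmt.Println(count[r]) // prints [3 2 3], [2 4 2], [3 2 3]
	}
}
```

The printed grid matches the claim: 4 for the center, 3 for each corner, 2 for each edge.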

Win Detection Flow

Pattern collectors consume 3 history tokens from a complete line and produce a single win token. Higher pattern participation means more paths to victory, which directly translates to higher ODE scores.

(interactive pflow diagram)

The Petri net above shows the full model with pattern collectors and game-halting win detection. When a win is detected, the transition consumes the current turn token without returning it—this prevents further play after the game ends.

Why Game-Halting Matters

Without game-halting, the ODE simulation continues exploring the full state space even after one player has won. This distorts strategic values because we're averaging over impossible continuations—games that would never happen in practice.

The key insight: win transitions must consume turn tokens. When X wins, the win transition takes the turn token out of circulation instead of passing it back, so no further move transition is enabled.

This makes win states into absorbing states in the ODE dynamics—once reached, no further flow is possible. The simulation now correctly weights paths based on reachable game outcomes rather than mathematical completeness.
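The difference is a single output arc. A sketch of both variants (illustrative types again; I assume the X win fires while the turn token sits in next_o, since X's winning move has already handed the turn over):

```go
package main

import "fmt"

type Arc map[string]int
type Transition struct{ Inputs, Outputs Arc }

func main() {
	// Leaky: the turn token is returned, so moves keep firing after a
	// win and the ODE averages over impossible continuations.
	leaky := Transition{
		Inputs:  Arc{"pattern_row0_x": 1, "next_o": 1},
		Outputs: Arc{"win_x": 1, "next_o": 1},
	}
	// Game-halting: the turn token is consumed and never returned,
	// making win_x an absorbing state.
	halting := Transition{
		Inputs:  Arc{"pattern_row0_x": 1, "next_o": 1},
		Outputs: Arc{"win_x": 1},
	}
	fmt.Printf("%+v\n%+v\n", leaky, halting)
}
```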

ODE-Guided Move Selection

Using go-pflow, we can convert this Petri net into an ODE system and simulate forward to predict outcomes. The key insight: win place token accumulation predicts game outcomes.

For each possible move, we:

  1. Create a hypothetical state after making that move
  2. Run ODE simulation (mass-action kinetics, t=0 to t=2.0)
  3. Measure final values of win_x and win_o places
  4. Score = my_win − opponent_win
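A compact sketch of the whole scoring loop, generic over any net expressed with the illustrative types used earlier: forward-Euler integration of mass-action kinetics (rate constant fixed at 1), then the score from the final win place values. This is not go-pflow's API; it only mirrors the procedure described above.

```go
package main

import "fmt"

type Arc map[string]int
type Transition struct{ Inputs, Outputs Arc }

// simulate integrates mass-action dynamics with forward Euler: each
// transition fires at a rate equal to the product of its input place
// concentrations, draining its inputs and filling its outputs.
func simulate(init map[string]float64, ts []Transition, tEnd, dt float64) map[string]float64 {
	s := map[string]float64{}
	for p, v := range init {
		s[p] = v
	}
	for t := 0.0; t < tEnd; t += dt {
		d := map[string]float64{}
		for _, tr := range ts {
			rate := 1.0
			for p, n := range tr.Inputs {
				for i := 0; i < n; i++ {
					rate *= s[p]
				}
			}
			for p, n := range tr.Inputs {
				d[p] -= rate * float64(n)
			}
			for p, n := range tr.Outputs {
				d[p] += rate * float64(n)
			}
		}
		for p, dv := range d {
			s[p] += dv * dt
		}
	}
	return s
}

// scoreMove simulates a hypothetical post-move state to t=2.0 and
// returns my_win - opponent_win from the final place values.
func scoreMove(state map[string]float64, ts []Transition, me, opp string) float64 {
	final := simulate(state, ts, 2.0, 0.01)
	return final["win_"+me] - final["win_"+opp]
}

func main() {
	// Toy check: one transition draining place "a" into win_x.
	ts := []Transition{{Inputs: Arc{"a": 1}, Outputs: Arc{"win_x": 1}}}
	fmt.Printf("%.3f\n", scoreMove(map[string]float64{"a": 1}, ts, "x", "o"))
}
```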

Example 1: Empty Board

X evaluates all 9 positions. The ODE reveals strategic value through model topology alone:

Empty Board Heatmap

The center position scores highest (1.27) because it participates in 4 winning patterns (2 diagonals + row + column). Corners score next (0.95, 3 patterns each). Edges score lowest (0.63, 2 patterns each). The dotted box indicates the recommended move.

No game heuristics are coded. Strategic value emerges purely from the Petri net's structure.

Example 2: Responding to Center

After X takes center, O evaluates defensive options:

X at Center Heatmap

All scores are negative (X has the advantage), but corners (-1.04) minimize X's lead better than edges (-1.38). The ODE naturally discovers the defensive principle: corners against center.

Example 3: Finding the Winning Line

X has center, O has corner. The ODE identifies the diagonal threat:

Diagonal Threat Heatmap

Positions (0,2) and (2,0) score highest (1.39) because they create two-way threats: each completes one diagonal while opening another winning path. The model's pattern collectors naturally amplify these fork positions.

Example 4: Must-Block Situation

O faces X's imminent win threat (X has two in a row):

Must Block Heatmap

Position (1,2) scores best (0.80) because it blocks X's winning move. Other positions score lower (0.74–0.77). The ODE simulation naturally discovers blocking as the dominant strategy—no explicit "if opponent has two in a row, block" logic required.

Draw Detection

A critical enhancement: the model tracks move tokens to detect draws. Each play transition deposits a token into a move_tokens counter, and a draw transition fires once all nine move tokens have accumulated, i.e. the board has filled.

The draw transition awards a point to win_o, making "not losing" valuable. This has a profound effect on the ODE dynamics:

Without draw detection: Games that end in draws "leak" out of the system without affecting win counts. The ODE only sees win paths, causing it to favor positions with more winning lines (corners) even when blocking is essential.

With draw detection: All outcomes flow to either win_x or win_o. Blocking a threat now has measurable value because it preserves the possibility of a draw—and draws count for O.

This is why scores are now positive for O (0.74–0.80 instead of negative). O's expected outcome includes draws, making blocking the clearly dominant move.
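Under the same illustrative conventions, the wiring is two small changes: every move transition gains an output arc into the counter, and one draw transition waits on an input arc of weight nine.

```go
package main

import "fmt"

type Arc map[string]int
type Transition struct{ Inputs, Outputs Arc }

func main() {
	// Every move transition gains one extra output arc into the counter.
	move := Transition{
		Inputs:  Arc{"P00": 1, "next_x": 1},
		Outputs: Arc{"next_o": 1, "_X00": 1, "move_tokens": 1},
	}
	// The draw transition waits for all nine counter tokens and credits
	// win_o, so "not losing" has measurable value for O.
	draw := Transition{
		Inputs:  Arc{"move_tokens": 9},
		Outputs: Arc{"win_o": 1},
	}
	fmt.Printf("%+v\n%+v\n", move, draw)
}
```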

How It Works

The magic is in the pattern collector transitions. When X occupies two squares of a winning pattern, the ODE flow toward win_x increases. Blocking that pattern cuts off the flow.

Score = my_win_final − opponent_win_final

From each player's perspective, higher is better. The formula encodes minimax-like behavior: maximize your win potential while minimizing your opponent's. With draw detection included, the continuous dynamics correctly approximate discrete game tree search.

Note: Examples 1–3 show scores from X's perspective (before draw detection was added), while Example 4 shows O's perspective with the full model. The relative rankings within each example demonstrate how the ODE naturally discovers strategic principles.

Data generated with go-pflow/examples/tictactoe

Conclusion

This tic-tac-toe model demonstrates how a simple game becomes deeply expressive when framed through Petri-net structure. By building the system in layers—grid and turn alternation, history tracking, pattern collectors, and win detection—we create a model capable of representing not just what happened, but why.

Every rule arises locally from token flow. There's no central controller, no global conditional logic—just emergent behavior from places, transitions, and structured history. The ODE simulation reveals strategic value without any game-specific heuristics.

For more on this approach, see Declarative Differential Models.
