Connect 4 Board Environment and State Representation

Both the MCTS and Q-Learning agents share the same Connect 4 environment model: a NumPy 2D array where pieces fall under gravity, valid moves are checked by inspecting the top row, and terminal states are detected by scanning four-in-a-row patterns around the last-played piece.

Board Layout

The board is a NumPy 2D array of shape (rows, columns) initialized with np.zeros and immediately cast to int. The cell encoding is straightforward:

Value	Meaning
`0`	Empty cell
`1`	Player 1’s piece
`2`	Player 2’s piece

Board dimensions vary by game mode:

Mode	Rows	Columns
MCTS vs MCTS	6	5
Q-Learning training	4	5
MCTS vs Q-Learning (testing)	3	5

import numpy as np

# MCTS vs MCTS — default 6×5
game = np.zeros((6, 5))
game = game.astype(int)

# Print the board
print('\n'.join(' '.join(str(x) for x in row) for row in game))
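
The per-mode dimensions can also be kept in one place. The following is a minimal sketch, not code from the repository: the mode keys, BOARD_SHAPES, and make_board are hypothetical names introduced for illustration. It relies only on np.zeros accepting a dtype argument, which yields the same integer board as the zeros-then-astype(int) sequence.

```python
import numpy as np

# Dimensions from the table above; the mode keys are hypothetical labels.
BOARD_SHAPES = {
    "mcts_vs_mcts": (6, 5),
    "q_learning_training": (4, 5),
    "mcts_vs_q_learning": (3, 5),
}


def make_board(mode):
    """Return an empty integer board for the given mode (illustrative helper)."""
    rows, cols = BOARD_SHAPES[mode]
    # dtype=int creates the integer array in one step instead of casting afterwards.
    return np.zeros((rows, cols), dtype=int)


board = make_board("q_learning_training")
print(board.shape)
```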

Gravity Mechanic

When a player drops a piece into column i, the piece falls to the lowest unoccupied row. The code iterates from the bottom row upward until it finds a zero cell:

# From RandomPlayer.py — take_action()
action = self.random_action()
for x in range(self.r):
    if self.state[self.r-1-x][action] == 0:
        self.state[self.r-1-x][action] = self.player
        break

The same pattern appears in MCTS.py during expansion(), where each candidate child state is built by dropping the current player’s piece into an open column:

# From MCTS.py — expansion()
for x in range(self.r):
    if(new[self.r-1-x][i] == 0):
        if( depth%2 == 0):
            new[self.r-1-x][i] = self.player
        else:
            new[self.r-1-x][i] = self.player%2 + 1
        break
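
The bottom-up scan shared by both snippets can be isolated in a small helper. This is a sketch under stated assumptions: drop_piece is a hypothetical name, not a function from the repository, and the board is a plain nested list so the example runs without dependencies. The same board[row][column] indexing works on the NumPy board.

```python
def drop_piece(board, column, player):
    """Drop `player`'s piece into `column`; return the landing row, or None if full."""
    rows = len(board)
    # Scan from the bottom row upward, as the source loops do.
    for offset in range(rows):
        row = rows - 1 - offset
        if board[row][column] == 0:
            board[row][column] = player
            return row
    return None  # column is full; the board is left unchanged


# A 4x5 board as nested lists; a NumPy int array is indexed the same way.
board = [[0] * 5 for _ in range(4)]
first = drop_piece(board, 2, 1)   # lands on the bottom row
second = drop_piece(board, 2, 2)  # stacks directly above the first piece
```

Returning the landing row is a design choice of this sketch; it lets a caller skip the top-down row search that the win check would otherwise repeat.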

Valid Actions

A column index i is a valid action if and only if the top cell of that column is empty:

# Valid if state[0][i] == 0
for i in range(self.c):
    if curr_state[0][i] == 0:
        pass  # column i is a legal move

The same check appears in all three agent classes (MCTS, Q_Learning, Random_Player) and in the main game loop.
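
Because the rule is the same everywhere, it could be collected into one function. The sketch below assumes a hypothetical helper named valid_actions that is not part of the repository. It uses a nested list for the board, and the indexing is identical for a NumPy array.

```python
def valid_actions(board):
    """Return the column indices whose top cell (row 0) is empty."""
    return [col for col in range(len(board[0])) if board[0][col] == 0]


# Columns 1 and 4 are full because their top cells are occupied.
board = [
    [0, 2, 0, 0, 1],
    [1, 1, 0, 2, 2],
    [2, 1, 0, 1, 2],
]
legal = valid_actions(board)
```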

Terminal State Detection

The is_terminal_state method returns a (bool, str) tuple:

Return value	Meaning
`(True, "win")`	The last move was a winning move
`(True, "draw")`	All top-row cells are occupied; no winner
`(False, "..")`	Game is still in progress

# From MCTS.py — is_terminal_state()
def is_terminal_state(self, next_state, action):
    if self.is_winning_state(next_state, action):
        return True, "win"

    for y in range(self.c):
        if(next_state[0][y] == 0):
            return False, ".."

    return True, "draw"

Draw detection scans every column’s top cell (next_state[0][y]). If all are non-zero, the board is full and the game is a draw.
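
The return contract can be exercised in isolation. In this sketch the win check is replaced by a boolean parameter named won, which is an assumption made to keep the example self-contained; the real method calls is_winning_state instead. Only the ordering (win first, then open columns, then draw) is taken from the source.

```python
def is_terminal_state(board, won):
    """Mirror the (bool, str) contract; `won` stands in for the win check."""
    if won:
        return True, "win"
    # Any empty top cell means at least one column can still be played.
    if any(cell == 0 for cell in board[0]):
        return False, ".."
    return True, "draw"


open_board = [[0, 1], [2, 1]]
full_board = [[2, 1], [1, 2]]

in_progress = is_terminal_state(open_board, won=False)
drawn = is_terminal_state(full_board, won=False)
decided = is_terminal_state(open_board, won=True)
```

Checking for a win before checking for a full top row matters: a move that fills the last cell and also completes four in a row is reported as a win, not a draw.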

Win Detection Algorithm

is_winning_state finds the row of the last-placed piece by scanning downward from row 0 in the played column, then checks all four directional axes using direction vectors.

# From MCTS.py — is_winning_state()
def is_winning_state(self, next_state, action):
    y = action
    x = 0

    for i in range(self.r):
        if next_state[i][y] != 0:
            break
        x += 1

    directions = [ [1,1], [1,-1], [0,1], [1,0] ]
    for d in directions:
        for i in range(4):

            count = 0
            for j in range(i):
                x_dash = x + (j+1)*d[0]
                y_dash = y + (j+1)*d[1]

                if( self.out_of_bounds(x_dash,y_dash) or next_state[x_dash][y_dash] != self.player):
                    break
                count+=1

            for j in range(3-i):
                x_dash = x - (j+1)*d[0]
                y_dash = y - (j+1)*d[1]

                if( self.out_of_bounds(x_dash,y_dash) or next_state[x_dash][y_dash] != self.player):
                    break
                count+=1

            if(count == 3):
                return True

    return False

The four direction vectors cover:

Vector	Axis
`[1, 1]`	Diagonal (↘)
`[1, -1]`	Anti-diagonal (↙)
`[0, 1]`	Horizontal (→)
`[1, 0]`	Vertical (↓)

For each direction and each split i (0–3), the algorithm counts up to i consecutive friendly pieces on the positive side of the played cell (x, y) and up to 3 − i on the negative side, stopping at the first out-of-bounds or non-matching cell. Together the four splits cover every four-cell window along that axis that contains (x, y). The piece at (x, y) itself is not added to count, so count == 3 means three additional same-color pieces were found, which makes four in a row once the placed piece is included. The out_of_bounds guard prevents index errors, and silent negative-index wraparound, at board edges:

def out_of_bounds(self, x, y):
    return not(x >= 0 and x < self.r and y >= 0 and y < self.c)
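
One way to read the split index i is as the position of the played cell inside a four-cell window. The sketch below lists those windows for a single direction. The function windows_through is a hypothetical illustration and does not appear in the repository. It only enumerates coordinates and does not inspect the board.

```python
def windows_through(x, y, d):
    """List the four 4-cell windows along direction d that contain (x, y)."""
    windows = []
    for i in range(4):
        # i cells on the positive side of (x, y), 3 - i cells on the negative side.
        cells = [(x + k * d[0], y + k * d[1]) for k in range(-(3 - i), i + 1)]
        windows.append(cells)
    return windows


# Horizontal windows through row 2, column 2.
# Some coordinates fall outside a 5-column board; the source rejects
# those windows through its out_of_bounds check.
for window in windows_through(2, 2, (0, 1)):
    print(window)
```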

The win check receives only the action (column index) of the last move and does not scan the entire board. The row is derived at call time from the played column, so is_winning_state must be called immediately after each move with the correct action argument. As written, neighbors are compared against self.player, so the result is only meaningful when the last piece belongs to self.player.
