Random Player: Baseline Agent for MCTS Simulations

Random_Player is a baseline agent that picks uniformly at random from all valid (non-full) columns on every turn. The MCTS agent uses two Random_Player instances internally during its simulation (playout) phase — one for each player — to play games out to completion from an expanded node. Because the rollout policy is intentionally simple, all strategic strength in MCTS comes from the tree search itself rather than from the simulations.

Class: `Random_Player`

class Random_Player:
    def __init__(self, player, r, c):
        ...

Constructor Parameters

player

int

required

Player token this instance represents — 1 or 2. Used when placing pieces and when checking whether a winning state belongs to this player.

r

int

required

Number of rows in the board. Pieces fall to the lowest empty row in each column (gravity), so r is needed to find the correct cell.

c

int

required

Number of columns in the board. Determines the action space size and bounds used in win-detection.

Methods

`set_state(state)`

Loads the current board into the agent. Must be called before take_action().

Parameters

state

list[list[int]]

A 2D list of shape [r][c]. Cells hold 0 (empty), 1 (player 1’s piece), or 2 (player 2’s piece). The agent modifies this list in-place when a piece is placed.

Returns: None
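Because the board is modified in place, a caller that wants to keep the pre-move position has to copy it first. The following is a minimal sketch of that aliasing behaviour. `place_in_place` is a hypothetical stand-in written for this illustration, not a method of Random_Player.

```python
import copy

def place_in_place(state, row, col, token):
    """Hypothetical stand-in for an agent that writes to the caller's board."""
    state[row][col] = token
    return state

board = [[0] * 5 for _ in range(6)]   # 6 rows x 5 columns, all empty
snapshot = copy.deepcopy(board)       # independent copy taken before the move

returned = place_in_place(board, 5, 2, 1)

# `returned` and `board` are the same object, so the caller's list changed;
# `snapshot` still holds the position from before the move.
```

If the previous position is never needed, the copy can be skipped.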

`take_action() -> tuple`

Selects a column uniformly at random from all non-full columns, drops the agent’s piece via gravity to the lowest empty row, then checks whether the game has ended.

Returns a 4-tuple:

state

list[list[int]]

The updated board after the piece has been placed. Modified in-place from the list provided to set_state().

end

bool

True if the game is now over (win or draw); False if play continues.

result

str

"win" if this agent just completed four in a row, "draw" if all columns are full with no winner, or ".." if the game is still in progress.

action

np.int64

Column index (0-based) that was chosen.
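The gravity step can be sketched as follows. This is an illustrative reconstruction, not the class's actual code: `drop_piece` is a hypothetical helper, and it assumes row 0 is the top of the board, consistent with the top-cell check state[0][i] described for random_action().

```python
def drop_piece(state, col, player):
    """Place `player`'s token in the lowest empty cell of column `col`.

    Returns the row index used, or None if the column is already full.
    Row 0 is assumed to be the top of the board.
    """
    for row in range(len(state) - 1, -1, -1):  # scan from the bottom row upward
        if state[row][col] == 0:
            state[row][col] = player
            return row
    return None

board = [[0] * 5 for _ in range(6)]
first = drop_piece(board, 2, 1)    # lands on the bottom row (index 5)
second = drop_piece(board, 2, 2)   # stacks directly above it (index 4)
```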

`random_action() -> np.int64`

Samples a single column index uniformly from all columns whose top cell (state[0][i]) is empty. Full columns receive a probability weight of 0 and are excluded.

Returns: np.int64, the chosen column index.

# Internally uses random.choices with a uniform distribution
# over valid columns. Full columns are given weight 0.
action = random.choices(actions, probabilities, k=1)
return np.int64(action[0])

random_action() is also used directly by MCTS.take_action() when play_outs=0. The logic is identical in both classes.
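A self-contained sketch of the sampling idea, under stated assumptions: `sample_valid_column` is a hypothetical function name, it returns a plain int rather than np.int64 to avoid the NumPy dependency, and the explicit full-board guard is an addition that the source does not describe.

```python
import random

def sample_valid_column(state):
    """Pick a column uniformly among those whose top cell is empty."""
    actions = list(range(len(state[0])))
    # Weight 1 for open columns, 0 for full ones, so every open column
    # is equally likely and full columns can never be drawn.
    weights = [1 if state[0][col] == 0 else 0 for col in actions]
    if sum(weights) == 0:
        raise ValueError("no valid columns: the board is full")
    return random.choices(actions, weights=weights, k=1)[0]

board = [[0] * 5 for _ in range(6)]
board[0][1] = 1   # column 1 is full
board[0][3] = 2   # column 3 is full
picks = {sample_valid_column(board) for _ in range(200)}
```

Only columns 0, 2, and 4 can appear in `picks`, because the two full columns carry weight 0.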

`is_terminal_state(next_state, action) -> tuple[bool, str]`

Checks whether the board has reached a terminal condition after the most recent move.

Parameters

next_state

list[list[int]]

Board state to evaluate.

action

int

Column index of the piece just placed. Used to anchor the win-check scan.

Returns:

(True, "win") — the piece just placed completed four in a row.
(True, "draw") — all columns are full with no winner.
(False, "..") — the game is still in progress.

The draw-detection loop in the source hardcodes for y in range(5) instead of for y in range(self.c). This means the draw check always scans exactly 5 columns regardless of the value passed as c to the constructor. On boards with fewer or more than 5 columns the draw detection will be incorrect. MCTS and Q_Learning use their own is_terminal_state implementations which correctly use self.c, so this limitation only affects Random_Player when used standalone on non-5-column boards.
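The effect of the hardcoded range can be illustrated with a small sketch. Both functions are hypothetical and assume the draw loop tests whether the top cell of each scanned column is occupied; the loop body itself is not shown in this page, so treat that as an assumption.

```python
def is_draw_hardcoded(state):
    """Draw check that always scans columns 0..4, as described above."""
    return all(state[0][y] != 0 for y in range(5))

def is_draw_all_columns(state, n_cols):
    """Draw check that scans every column, as range(self.c) would."""
    return all(state[0][y] != 0 for y in range(n_cols))

# 7-column board: the first five columns are full, the last two are open.
board = [[0] * 7 for _ in range(6)]
for y in range(5):
    board[0][y] = 1

hardcoded = is_draw_hardcoded(board)         # reports a draw too early
corrected = is_draw_all_columns(board, 7)    # columns 5 and 6 are still open
```

On a board with fewer than 5 columns, the hardcoded version would instead index past the end of the row.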

`is_winning_state(next_state, action) -> bool`

Scans outward from the cell where the last piece landed in four directions (diagonal [1,1], anti-diagonal [1,-1], horizontal [0,1], and vertical [1,0]), counting consecutive tokens belonging to self.player.

Parameters

next_state

list[list[int]]

Board state to scan.

action

int

Column index of the last piece. The row is located automatically by scanning downward from the top until a non-empty cell is found.

Returns: bool — True if four consecutive tokens of self.player exist through the placed cell. Internally the check triggers when count == 3, meaning 3 additional aligned pieces beyond the placed piece (4 total in a line).
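The scan described above might look roughly like the sketch below. It is a reconstruction under assumptions, not the source implementation: `landing_row` and `is_winning_move` are hypothetical names, each direction is walked both forward and backward from the placed cell, and the threshold is written as `count >= 3` rather than `count == 3` so that lines longer than four also qualify.

```python
DIRECTIONS = [(1, 1), (1, -1), (0, 1), (1, 0)]  # diagonal, anti-diagonal, horizontal, vertical

def landing_row(state, col):
    """Row of the topmost piece in `col`, scanning down from row 0."""
    for row in range(len(state)):
        if state[row][col] != 0:
            return row
    return None

def is_winning_move(state, col, player):
    """True if the top piece in `col` is part of a line of four for `player`."""
    rows, cols = len(state), len(state[0])
    row = landing_row(state, col)
    if row is None or state[row][col] != player:
        return False
    for dr, dc in DIRECTIONS:
        count = 0  # aligned pieces in addition to the placed one
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < rows and 0 <= c < cols and state[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 3:
            return True
    return False

board = [[0] * 5 for _ in range(6)]
for c in range(4):
    board[5][c] = 1            # player 1 holds four cells on the bottom row

horizontal = is_winning_move(board, 1, 1)   # line runs through column 1
other_player = is_winning_move(board, 1, 2)  # the piece is not player 2's
```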

Usage Example

import numpy as np
from RandomPlayer import Random_Player

# Initialise a blank 6×5 board
game = np.zeros((6, 5), dtype=int)

# Create a random player for player 1
player = Random_Player(player=1, r=6, c=5)
player.set_state(game)

game, end, result, action = player.take_action()
print(f"Random player chose column {action}, game ended: {end}")
# e.g. → Random player chose column 3, game ended: False

To simulate a full random game between two Random_Player instances:

import numpy as np
from RandomPlayer import Random_Player

game = np.zeros((6, 5), dtype=int)
p1 = Random_Player(player=1, r=6, c=5)
p2 = Random_Player(player=2, r=6, c=5)

turn = 1
while True:
    if turn == 1:
        p1.set_state(game)
        game, end, result, action = p1.take_action()
    else:
        p2.set_state(game)
        game, end, result, action = p2.take_action()

    if end:
        winner = f"Player {turn}" if result == "win" else "Draw"
        print(f"Game over — {winner}")
        break

    turn = 2 if turn == 1 else 1

Random_Player is not intended for standalone use as a competitive agent. It serves as the rollout policy inside MCTS simulations and can be used to benchmark the quality of MCTS and Q-Learning agents — if either agent cannot beat a random opponent reliably, something is wrong with the configuration or training procedure.
