
Random_Player is a baseline agent that picks uniformly at random from all valid (non-full) columns on every turn. The MCTS agent uses two Random_Player instances internally during its simulation (playout) phase — one for each player — to play games out to completion from an expanded node. Because the rollout policy is intentionally simple, all strategic strength in MCTS comes from the tree search itself rather than from the simulations.

Class: Random_Player

class Random_Player:
    def __init__(self, player, r, c):
        ...

Constructor Parameters

player
int
required
Player token this instance represents — 1 or 2. Used when placing pieces and when checking whether a winning state belongs to this player.
r
int
required
Number of rows in the board. Pieces fall to the lowest empty row in each column (gravity), so r is needed to find the correct cell.
c
int
required
Number of columns in the board. Determines the action space size and bounds used in win-detection.
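The gravity rule described for r can be sketched as follows. This is a minimal illustration, not the class's actual code: drop_piece is a hypothetical helper, and it assumes row 0 is the top of the board (consistent with the state[0][i] check used by random_action()).

```python
def drop_piece(board, col, player):
    """Place `player`'s token in the lowest empty cell of `col`.

    Hypothetical helper. Returns the row index used, or None if the
    column is full. Row 0 is assumed to be the top of the board.
    """
    for row in range(len(board) - 1, -1, -1):  # scan from the bottom row up
        if board[row][col] == 0:
            board[row][col] = player  # in-place update, as the agent does
            return row
    return None


board = [[0] * 5 for _ in range(6)]  # 6 rows, 5 columns, all empty
first = drop_piece(board, 2, 1)      # lands on the bottom row
second = drop_piece(board, 2, 2)     # stacks on top of the first piece
```

The number of rows is what tells the scan where the bottom of the column is, which is why the constructor needs r.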

Methods

set_state(state)

Loads the current board into the agent. Must be called before take_action().
Parameters
state
list[list[int]]
A 2D board of shape [r][c], given as a nested list or, as in the usage examples on this page, a NumPy array. Cells hold 0 (empty), 1 (player 1’s piece), or 2 (player 2’s piece). The agent modifies this object in place when a piece is placed.
Returns: None

take_action() -> tuple

Selects a column uniformly at random from all non-full columns, drops the agent’s piece via gravity to the lowest empty row, then checks whether the game has ended.
Returns a 4-tuple:
state
list[list[int]]
The updated board after the piece has been placed. This is the same object that was passed to set_state(), modified in place.
end
bool
True if the game is now over (win or draw); False if play continues.
result
str
"win" if this agent just completed four in a row, "draw" if all columns are full with no winner, or ".." if the game is still in progress.
action
np.int64
Column index (0-based) that was chosen.

random_action() -> np.int64

Samples a single column index uniformly from all columns whose top cell (state[0][i]) is empty. Full columns receive a probability weight of 0 and are excluded.
Returns: np.int64, the chosen column index.
# Internally uses random.choices with a uniform distribution
# over valid columns. Full columns are given weight 0.
action = random.choices(actions, probabilities, k=1)
return np.int64(action[0])
random_action() is also used directly by MCTS.take_action() when play_outs=0. The logic is identical in both classes.
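A self-contained sketch of this sampling scheme is shown below. It is an approximation of the behaviour described above rather than the source itself: random_valid_column is a hypothetical name, it returns a plain int where the class wraps the result in np.int64, and the explicit error for a full board is an added assumption.

```python
import random


def random_valid_column(state):
    """Pick uniformly among columns whose top cell state[0][i] is empty.

    Hypothetical stand-in for random_action(); returns a plain int.
    """
    actions = list(range(len(state[0])))
    open_cols = [a for a in actions if state[0][a] == 0]
    if not open_cols:
        # Assumption: the caller is expected to detect a draw before this.
        raise ValueError("no valid columns left")
    # Uniform probability over open columns, weight 0 for full ones.
    probabilities = [
        1.0 / len(open_cols) if state[0][a] == 0 else 0.0 for a in actions
    ]
    action = random.choices(actions, probabilities, k=1)
    return action[0]


board = [[0] * 5 for _ in range(6)]
board[0][0] = board[0][1] = board[0][3] = 1  # columns 0, 1 and 3 are full
picks = {random_valid_column(board) for _ in range(200)}
```

Because full columns carry weight 0, picks can only ever contain columns 2 and 4 here.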

is_terminal_state(next_state, action) -> tuple[bool, str]

Checks whether the board has reached a terminal condition after the most recent move.
Parameters
next_state
list[list[int]]
Board state to evaluate.
action
int
Column index of the piece just placed. Used to anchor the win-check scan.
Returns:
  • (True, "win") — the piece just placed completed four in a row.
  • (True, "draw") — all columns are full with no winner.
  • (False, "..") — the game is still in progress.
The draw-detection loop in the source hardcodes for y in range(5) instead of for y in range(self.c), so the draw check always scans exactly 5 columns regardless of the value passed as c to the constructor. On any board that does not have 5 columns the draw detection is therefore incorrect: with more than 5 columns the extra columns are never inspected, and with fewer than 5 the loop will most likely index past the last column. MCTS and Q_Learning use their own is_terminal_state implementations, which correctly use self.c, so this limitation only affects Random_Player when it is used standalone on non-5-column boards.
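The effect of the hardcoded range can be illustrated with a small sketch. is_draw is a hypothetical helper that takes the column count as a parameter; the range(5) line only mimics the behaviour described above and is not copied from the source. Row 0 is assumed to be the top row.

```python
def is_draw(state, n_cols):
    """Return True when the top cell of every column is occupied.

    Hypothetical helper; assumes a win has already been ruled out.
    """
    return all(state[0][y] != 0 for y in range(n_cols))


# A 6x7 board whose first five columns are full and whose last two are open.
board = [[0] * 7 for _ in range(6)]
for row in board:
    row[:5] = [1, 2, 1, 2, 1]

scans_five = all(board[0][y] != 0 for y in range(5))  # mimics range(5)
scans_all = is_draw(board, n_cols=7)                  # uses the real width
```

Here scans_five is True even though two columns are still playable, while scans_all is False.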

is_winning_state(next_state, action) -> bool

Scans outward from the cell where the last piece landed in four directions (diagonal [1,1], anti-diagonal [1,-1], horizontal [0,1], and vertical [1,0]), counting consecutive tokens belonging to self.player.
Parameters
next_state
list[list[int]]
Board state to scan.
action
int
Column index of the last piece. The row is located automatically by scanning downward from the top until a non-empty cell is found.
Returns: bool. True if four consecutive tokens of self.player exist through the placed cell. Internally the check triggers when count == 3, that is, 3 additional aligned pieces beyond the placed piece (4 in a line in total).
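The scan can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the class's code: landing_row and is_winning_state are standalone hypothetical functions, the player is passed explicitly instead of read from self.player, and the test is count >= 3 rather than count == 3 so that lines longer than four also register.

```python
# Direction vectors as (row step, column step), matching the list above.
DIRECTIONS = [(1, 1), (1, -1), (0, 1), (1, 0)]


def landing_row(state, col):
    """Scan down from the top of `col` to the first non-empty cell."""
    for row in range(len(state)):
        if state[row][col] != 0:
            return row
    return None  # empty column


def is_winning_state(state, col, player):
    """True if `player` has four in a line through the top piece of `col`."""
    n_rows, n_cols = len(state), len(state[0])
    row = landing_row(state, col)
    if row is None or state[row][col] != player:
        return False
    for dr, dc in DIRECTIONS:
        count = 0  # aligned pieces other than the placed one
        for sign in (1, -1):  # walk both ways along the same line
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < n_rows and 0 <= c < n_cols and state[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 3:
            return True
    return False


board = [[0] * 5 for _ in range(6)]
board[5][:4] = [1, 1, 1, 1]                 # four in a row on the bottom row
horizontal = is_winning_state(board, 1, 1)  # anchor in the middle of the line
wrong_player = is_winning_state(board, 1, 2)
```

Anchoring the scan at the last placed piece keeps the check local: only lines passing through that cell can have been completed by the move.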

Usage Example

import numpy as np
from RandomPlayer import Random_Player

# Initialise a blank 6×5 board
game = np.zeros((6, 5), dtype=int)

# Create a random player for player 1
player = Random_Player(player=1, r=6, c=5)
player.set_state(game)

game, end, result, action = player.take_action()
print(f"Random player chose column {action}, game ended: {end}")
# e.g. → Random player chose column 3, game ended: False
To simulate a full random game between two Random_Player instances:
import numpy as np
from RandomPlayer import Random_Player

game = np.zeros((6, 5), dtype=int)
p1 = Random_Player(player=1, r=6, c=5)
p2 = Random_Player(player=2, r=6, c=5)

turn = 1
while True:
    if turn == 1:
        p1.set_state(game)
        game, end, result, action = p1.take_action()
    else:
        p2.set_state(game)
        game, end, result, action = p2.take_action()

    if end:
        winner = f"Player {turn}" if result == "win" else "Draw"
        print(f"Game over — {winner}")
        break

    turn = 2 if turn == 1 else 1
Random_Player is not intended for standalone use as a competitive agent. It serves as the rollout policy inside MCTS simulations and can be used to benchmark the quality of MCTS and Q-Learning agents — if either agent cannot beat a random opponent reliably, something is wrong with the configuration or training procedure.
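One way to run such a benchmark is to tally outcomes over many games. The sketch below is only a harness: benchmark and play_game are hypothetical names, and the stand-in outcome sequence replaces a real game loop between an agent and a Random_Player.

```python
from collections import Counter


def benchmark(play_game, n_games=100):
    """Tally outcomes of repeated games against a random opponent.

    `play_game` is a hypothetical callable that plays one full game and
    returns "agent", "random", or "draw".
    """
    tally = Counter(play_game() for _ in range(n_games))
    return {key: tally[key] / n_games for key in ("agent", "random", "draw")}


# Stand-in for a real game loop: a fixed, repeating outcome sequence.
outcomes = iter(["agent", "agent", "random", "draw"] * 25)
rates = benchmark(lambda: next(outcomes), n_games=100)
```

With a real play_game, an agent win rate that stays close to the random player's rate suggests a configuration or training problem.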
