Random_Player is a baseline agent that picks uniformly at random from all valid (non-full) columns on every turn. The MCTS agent uses two Random_Player instances internally during its simulation (playout) phase — one for each player — to play games out to completion from an expanded node. Because the rollout policy is intentionally simple, all strategic strength in MCTS comes from the tree search itself rather than from the simulations.
Class: Random_Player
Constructor Parameters
player: Player token this instance represents, 1 or 2. Used when placing pieces and when checking whether a winning state belongs to this player.
r: Number of rows in the board. Pieces fall to the lowest empty row in each column (gravity), so r is needed to find the correct cell.
c: Number of columns in the board. Determines the size of the action space and the bounds used in win detection.
Methods
set_state(state)
Loads the current board into the agent. Must be called before take_action().
Parameters
state: A 2D list of shape [r][c]. Cells hold 0 (empty), 1 (player 1’s piece), or 2 (player 2’s piece). The agent modifies this list in place when a piece is placed.
Returns: None
take_action() -> tuple
Selects a column uniformly at random from all non-full columns, drops the agent’s piece via gravity to the lowest empty row, then checks whether the game has ended.
Returns: a 4-tuple, in this order:
1. The updated board after the piece has been placed. This is the same list that was passed to set_state(), modified in place.
2. True if the game is now over (win or draw); False if play continues.
3. "win" if this agent just completed four in a row, "draw" if all columns are full with no winner, or ".." if the game is still in progress.
4. Column index (0-based) that was chosen.
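The gravity drop described above can be sketched as follows. This is a minimal illustration rather than the repository's code: `drop_piece` is a hypothetical helper name, and the board is assumed to be a list of rows with row 0 at the top and 0 marking an empty cell.

```python
def drop_piece(state, column, player):
    """Place `player`'s token in the lowest empty cell of `column`, in place.

    Hypothetical helper: assumes state[0] is the top row and 0 means empty.
    Returns the row index where the piece landed.
    """
    for row in range(len(state) - 1, -1, -1):  # scan from the bottom row upward
        if state[row][column] == 0:
            state[row][column] = player
            return row
    raise ValueError(f"column {column} is full")


board = [[0] * 7 for _ in range(6)]  # 6 rows x 7 columns, all empty
first = drop_piece(board, 3, 1)      # lands on the bottom row
second = drop_piece(board, 3, 2)     # stacks directly above the first piece
```

Because the list is mutated in place, any other object holding a reference to the same board sees the move immediately; copy the board first if the caller needs to keep the previous position.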
random_action() -> np.int64
Samples a single column index uniformly from all columns whose top cell (state[0][i]) is empty. Full columns receive a probability weight of 0 and are excluded.
Returns: np.int64 — the chosen column index.
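One way to obtain this behaviour is to pass per-column weights to `numpy.random.choice`. The sketch below assumes that approach and is not a copy of the class method; in particular, the standalone function taking `state` as an argument is illustrative only.

```python
import numpy as np


def random_action(state):
    """Sample a column index uniformly from the non-full columns of `state`."""
    top_row = np.asarray(state[0])
    open_cols = (top_row == 0).astype(float)  # 1.0 for playable, 0.0 for full
    if open_cols.sum() == 0:
        raise ValueError("no valid columns: the board is full")
    probs = open_cols / open_cols.sum()       # uniform over playable columns
    # With an integer first argument, choice() samples from range(n)
    # and returns a NumPy integer scalar.
    return np.random.choice(len(top_row), p=probs)
```

Callers should check for a full board before sampling, since a weight vector of all zeros cannot be normalised into a probability distribution.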
random_action() is also used directly by MCTS.take_action() when play_outs=0. The logic is identical in both classes.
is_terminal_state(next_state, action) -> tuple[bool, str]
Checks whether the board has reached a terminal condition after the most recent move.
Parameters
next_state: Board state to evaluate.
action: Column index of the piece just placed. Used to anchor the win-check scan.
Returns
(True, "win"): the piece just placed completed four in a row.
(True, "draw"): all columns are full with no winner.
(False, ".."): the game is still in progress.
is_winning_state(next_state, action) -> bool
Scans outward from the cell where the last piece landed in four directions — diagonal [1,1], anti-diagonal [1,-1], horizontal [0,1], and vertical [1,0] — counting consecutive tokens belonging to self.player.
Parameters
next_state: Board state to scan.
action: Column index of the last piece. The row is located automatically by scanning downward from the top of that column until a non-empty cell is found.
Returns: bool. True if four consecutive tokens of self.player exist through the placed cell. Internally the check triggers when count == 3, meaning 3 additional aligned pieces beyond the placed piece (4 total in a line).
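The directional scan can be sketched roughly as below. This is an illustration under stated assumptions, not the repository's method: it is written as a standalone function that takes `player` as an argument, it walks each direction both forwards and backwards from the placed cell, and it tests `count >= 3` rather than `count == 3` so that a line longer than four also counts as a win.

```python
# (row step, column step): diagonal, anti-diagonal, horizontal, vertical
DIRECTIONS = [(1, 1), (1, -1), (0, 1), (1, 0)]


def is_winning_state(state, action, player):
    """Return True if the top piece in column `action` completes four in a row."""
    rows, cols = len(state), len(state[0])
    # Locate the landed piece: first non-empty cell from the top of the column.
    row = next((r for r in range(rows) if state[r][action] != 0), None)
    if row is None or state[row][action] != player:
        return False
    for dr, dc in DIRECTIONS:
        count = 0  # aligned pieces in addition to the placed one
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * dr, action + sign * dc
            while 0 <= r < rows and 0 <= c < cols and state[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 3:
            return True
    return False
```

Anchoring the scan at the last move keeps the check cheap: only the lines passing through one cell are examined instead of the whole board.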
Usage Example
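A typical use is to let two Random_Player instances play each other on a shared board, calling set_state() and then take_action() for each player in turn until take_action() reports that the game is over.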
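The loop below sketches a full game between two random players. To stay runnable on its own it uses a compact stand-in class, `RandomPlayerSketch`, that mirrors the interface documented on this page; it is not the repository's implementation, and both the constructor argument order `(player, r, c)` and the helper `play_random_game` are assumptions made for illustration.

```python
import random


class RandomPlayerSketch:
    """Stand-in with the interface documented above (not the repository class)."""

    def __init__(self, player, r, c):  # argument order is an assumption
        self.player, self.r, self.c = player, r, c
        self.state = None

    def set_state(self, state):
        self.state = state

    def random_action(self):
        open_cols = [i for i in range(self.c) if self.state[0][i] == 0]
        return random.choice(open_cols)

    def is_winning_state(self, state, action):
        row = next(i for i in range(self.r) if state[i][action] != 0)
        for dr, dc in [(1, 1), (1, -1), (0, 1), (1, 0)]:
            count = 0  # aligned pieces besides the one just placed
            for sign in (1, -1):
                y, x = row + sign * dr, action + sign * dc
                while (0 <= y < self.r and 0 <= x < self.c
                       and state[y][x] == self.player):
                    count += 1
                    y, x = y + sign * dr, x + sign * dc
            if count >= 3:
                return True
        return False

    def is_terminal_state(self, state, action):
        if self.is_winning_state(state, action):
            return True, "win"
        if all(cell != 0 for cell in state[0]):
            return True, "draw"
        return False, ".."

    def take_action(self):
        action = self.random_action()
        # Lowest empty row in the chosen column (row 0 is the top).
        row = max(i for i in range(self.r) if self.state[i][action] == 0)
        self.state[row][action] = self.player
        done, result = self.is_terminal_state(self.state, action)
        return self.state, done, result, action


def play_random_game(r=6, c=7, seed=None):
    """Play one game between two random players; return (result, winner, board)."""
    if seed is not None:
        random.seed(seed)
    board = [[0] * c for _ in range(r)]
    players = [RandomPlayerSketch(1, r, c), RandomPlayerSketch(2, r, c)]
    turn = 0
    while True:
        current = players[turn % 2]
        current.set_state(board)  # must precede take_action()
        board, done, result, _ = current.take_action()
        if done:
            winner = current.player if result == "win" else None
            return result, winner, board
        turn += 1


result, winner, final_board = play_random_game(seed=0)
```

Because "win" is always reported from the perspective of the player who just moved, the caller has to track whose turn it was in order to identify the winner.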
Random_Player is not intended for standalone use as a competitive agent. It
serves as the rollout policy inside MCTS simulations and can be used to
benchmark the quality of MCTS and Q-Learning agents — if either agent cannot
beat a random opponent reliably, something is wrong with the configuration or
training procedure.