This project provides two reinforcement learning agents for Connect 4, built on fundamentally different strategies: Monte Carlo Tree Search (MCTS), which looks ahead by simulating random game continuations, and Q-Learning, which learns a value function by playing thousands of training games against an MCTS opponent. Both agents operate on a configurable grid, expose a consistent action interface, and can be matched against each other or benchmarked against a random baseline player.

Quickstart

Run your first agent matchup in under five minutes

MCTS Agent

Full API reference for the Monte Carlo Tree Search agent

Q-Learning Agent

Full API reference for the Q-Learning agent

Training Guide

Train the Q-Learning agent against an MCTS opponent

What’s inside

Connect 4 Environment

Board representation, valid moves, and terminal-state detection

MCTS Concepts

Selection, expansion, simulation, and back-propagation explained

Q-Learning Concepts

Bellman updates, epsilon-greedy policy, and mirror symmetry

Evaluation

Run head-to-head matchups and interpret win/draw/loss results

Highlights

  • UCB1-guided tree search: MCTS balances exploration and exploitation using the UCB1 (Upper Confidence Bound) formula.
  • Mirror-state symmetry: Q-Learning roughly halves the state space by treating a board and its left-right mirror image as equivalent.
  • Pluggable board size: both agents accept arbitrary rows × cols dimensions at construction time.
  • Persistent Q-tables: trained value functions are saved as gzip-compressed pickle files and reloaded for evaluation.
  • Reward shaping: the Q-Learning agent uses shaped rewards (+50 win, −50 loss, −10 draw, −1 per step) for stable convergence.
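
As a rough illustration of the UCB1 highlight above, the sketch below scores each child node by its mean reward plus an exploration bonus that shrinks as the child is visited more often. The function names, the `(total_reward, visits)` tuple layout, and the exploration constant `c = sqrt(2)` are assumptions made for this example and are not taken from the project's MCTS agent.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node (hypothetical helper, not the project's API).

    Unvisited children score infinity so each one is tried at least once.
    """
    if visits == 0:
        return math.inf
    exploit = total_reward / visits  # mean reward observed so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # uncertainty bonus
    return exploit + explore


def select_child(children, parent_visits):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) tuples, an assumed layout.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1(children[i][0], children[i][1], parent_visits),
    )
```

With `c` set to 0 the score reduces to the plain mean reward, so larger values of `c` push the search toward less-visited moves.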
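
The mirror-state symmetry idea can be sketched as a canonicalization step: before looking up a state, pick one representative of the board and its left-right reflection, and flip the column index when the reflected form was chosen. The board encoding (a tuple of row tuples) and the helper names here are hypothetical; the project's Q-Learning agent may represent states differently.

```python
def mirror(board):
    """Reflect a board left-to-right; `board` is a tuple of row tuples (assumed encoding)."""
    return tuple(tuple(reversed(row)) for row in board)


def canonical(board):
    """Return (representative, mirrored) for a board and its reflection.

    The lexicographically smaller of the two encodings is used as the shared key,
    so both orientations map to the same Q-table entry.
    """
    b = tuple(tuple(row) for row in board)
    m = mirror(b)
    if m < b:
        return m, True
    return b, False


def canonical_action(col, n_cols, mirrored):
    """Translate a column index into the canonical orientation (and back)."""
    return n_cols - 1 - col if mirrored else col
```

Boards that are already left-right symmetric map to themselves, which is why the saving is close to, but not exactly, a factor of two.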
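
Saving and reloading a Q-table as a gzip-compressed pickle needs only the Python standard library. The following is a minimal sketch; the function names and the dictionary-shaped Q-table are assumptions for illustration, not the project's actual persistence code.

```python
import gzip
import pickle


def save_q_table(q_table, path):
    """Write a Q-table (assumed to be a picklable dict) to a gzip-compressed file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(q_table, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_q_table(path):
    """Read a Q-table back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Because unpickling can execute arbitrary code, Q-table files should only be loaded from sources you trust.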
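
To show how the shaped rewards listed above could enter a tabular Bellman update, here is a hedged sketch of a single Q-learning step. The reward values come from the highlights; the learning rate `alpha`, discount `gamma`, the outcome labels, and the `(state, action)` key layout are assumptions chosen for this example, and `next_state` is taken to be the position the agent sees after the opponent has replied.

```python
from collections import defaultdict

# Reward values as listed in the highlights; the outcome labels are illustrative.
REWARDS = {"win": 50.0, "loss": -50.0, "draw": -10.0, "step": -1.0}


def q_update(q, state, action, outcome, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    `q` maps (state, action) pairs to values; alpha and gamma are assumed defaults.
    """
    reward = REWARDS[outcome]
    terminal = outcome != "step"
    if terminal or not next_actions:
        future = 0.0  # no bootstrapping from a finished game
    else:
        future = max(q[(next_state, a)] for a in next_actions)
    q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

The per-step penalty of −1 makes longer games slightly less attractive than shorter ones with the same final outcome.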
