This project implements reinforcement learning agents that play Connect 4 using two fundamentally different strategies: Monte Carlo Tree Search (MCTS), which looks ahead by simulating random game continuations, and Q-Learning, which learns a value function by playing thousands of training games against an MCTS opponent. Both agents operate on a configurable grid, expose a consistent action interface, and can be matched against each other or benchmarked against a random baseline player.
Quickstart
Run your first agent matchup in under five minutes
MCTS Agent
Full API reference for the Monte Carlo Tree Search agent
Q-Learning Agent
Full API reference for the Q-Learning agent
Training Guide
Train the Q-Learning agent against an MCTS opponent
What’s inside
Connect 4 Environment
Board representation, valid moves, and terminal-state detection
MCTS Concepts
Selection, expansion, simulation, and back-propagation explained
Q-Learning Concepts
Bellman updates, epsilon-greedy policy, and mirror symmetry
Evaluation
Run head-to-head matchups and interpret win/draw/loss results
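The Q-Learning concepts listed above (Bellman updates and an epsilon-greedy policy) can be illustrated with a minimal tabular sketch. The function names `choose_action` and `q_update`, the dictionary-based Q-table, and the default values of `alpha`, `gamma`, and `epsilon` are assumptions made for illustration; they are not taken from the project's API.

```python
import random
from collections import defaultdict

# Hypothetical Q-table: maps (state, action) -> estimated value, defaulting to 0.0.
Q = defaultdict(float)

def choose_action(state, valid_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    # Greedy choice: the valid action with the highest current estimate.
    return max(valid_actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_valid_actions,
             alpha=0.1, gamma=0.9):
    """One Bellman update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    # A terminal next state has no valid actions, so its future value is 0.
    best_next = max((Q[(next_state, a)] for a in next_valid_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]
```

In a two-player game such as Connect 4, `next_state` would typically be the board the learner sees after the opponent has replied, so that each update spans one full round of play. Any hashable board encoding can serve as `state` in this sketch.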
Highlights
- UCB1-guided tree search: MCTS balances exploration and exploitation using the Upper Confidence Bound (UCB1) formula.
- Mirror-state symmetry: Q-Learning roughly halves the state space by treating a board and its horizontal reflection as equivalent.
- Pluggable board size: both agents accept arbitrary rows × cols dimensions at construction time.
- Persistent Q-tables: trained value functions are saved as gzip-compressed pickle files and reloaded for evaluation.
- Reward shaping: the Q-Learning agent uses shaped rewards (+50 win, −50 loss, −10 draw, −1 per step) for stable convergence.
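
Several of the highlights above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the names `ucb1`, `canonical_key`, `save_q_table`, `load_q_table`, and `REWARDS` are hypothetical, the exploration constant `c = sqrt(2)` is the textbook UCB1 choice rather than a value confirmed by the source, and the board is assumed to be a tuple of row tuples.

```python
import gzip
import math
import os
import pickle
import tempfile

# Shaped rewards as listed in the highlights; the dictionary keys are hypothetical.
REWARDS = {"win": 50, "loss": -50, "draw": -10, "step": -1}

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus that shrinks with visits."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def canonical_key(board):
    """Return one shared key for a board and its horizontal mirror image."""
    mirrored = tuple(row[::-1] for row in board)
    # Picking the smaller of the two tuples gives both boards the same key.
    return min(board, mirrored)

def save_q_table(q_table, path):
    """Write the Q-table as a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(q_table, f)

def load_q_table(path):
    """Read a Q-table written by save_q_table."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

When a mirrored board is looked up under its canonical key, the column index of any stored action has to be mirrored as well (column `j` becomes `cols - 1 - j`); the sketch leaves that bookkeeping out. As with any pickle file, a saved Q-table should only be loaded from a trusted source.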