Reinforcement learning (RL) frames intelligence as an agent interacting with an environment: at each step the agent observes a state, selects an action, and receives a reward signal. Over many episodes the agent learns a policy that maximises cumulative reward, without any labelled training data. This repository implements six RL projects spanning classic arcade games, procedurally generated mazes, and open-ended task automation, giving a practical progression from simple Q-learning grids to pixel-based deep Q-networks running inside real game emulators.
## Project 80 – Flappy Bird Agent
- **Objective:** Train an agent to play Flappy Bird autonomously by learning when to flap to navigate through pipe gaps, maximising the distance travelled.
- **Algorithm:** Deep Q-Network (DQN). The agent observes a compact state vector (bird y-position, vertical velocity, distance to next pipe, gap position) and outputs a binary action: flap or do nothing.
- **Environment:** Custom Flappy Bird simulation (e.g., pygame-based or flappy-bird-gym).
- **Framework:** PyTorch or TensorFlow with a replay buffer and target network for stable Q-value updates.
- **Key Technique:** Experience replay + epsilon-greedy exploration annealing.
- **How to Run:** see the setup sketch after this list and the DQN training loop section below.
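A minimal setup sketch, assuming the flappy-bird-gym package and its `FlappyBird-v0` environment ID; the hidden-layer sizes are illustrative and the repo's own script may differ:

```python
# Minimal setup sketch. Assumptions: the flappy-bird-gym package and its
# "FlappyBird-v0" env ID; hidden-layer sizes are illustrative, not tuned.
import torch
import torch.nn as nn
import flappy_bird_gym

class QNet(nn.Module):
    """Small MLP mapping the compact state vector to Q-values for [idle, flap]."""
    def __init__(self, state_dim: int, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

env = flappy_bird_gym.make("FlappyBird-v0")
q_net = QNet(state_dim=env.observation_space.shape[0])

obs = env.reset()                                   # old gym API: returns obs only
q_values = q_net(torch.as_tensor(obs, dtype=torch.float32))
action = q_values.argmax().item()                   # greedy action: 0 = idle, 1 = flap
```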
## Project 81 – Mario Playing RL Agent
- **Objective:** Train an agent to complete levels of Super Mario Bros by moving right, jumping over enemies, and collecting rewards.
- **Algorithm:** Proximal Policy Optimization (PPO) or DQN operating on raw pixel frames pre-processed into grayscale stacks.
- **Environment:** gym-super-mario-bros wrapping the NES emulator via nes-py. The observation space is an 84×84×4 stacked grayscale frame tensor.
- **Framework:** stable-baselines3 (PPO) or a custom PyTorch DQN with a convolutional feature extractor.
- **Key Technique:** Frame stacking (4 consecutive frames) to encode motion; reward shaping based on x-position delta and a time penalty.
- **How to Run:** see the sketch after this list.
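A minimal PPO setup sketch. It assumes stable-baselines3 1.x with the old gym API (newer SB3 releases target gymnasium and need a compatibility shim), default untuned hyper-parameters, and an illustrative checkpoint name:

```python
# Minimal PPO setup sketch. Assumptions: SB3 1.x + old gym API; default
# hyper-parameters; "mario_ppo" is an illustrative checkpoint name.
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace
from gym.wrappers import GrayScaleObservation, ResizeObservation
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

base_env = gym_super_mario_bros.make("SuperMarioBros-v0")
base_env = JoypadSpace(base_env, SIMPLE_MOVEMENT)         # small discrete action set
base_env = GrayScaleObservation(base_env, keep_dim=True)  # RGB -> 1-channel grayscale
base_env = ResizeObservation(base_env, shape=(84, 84))    # downsample to 84x84

vec_env = DummyVecEnv([lambda: base_env])
vec_env = VecFrameStack(vec_env, 4, channels_order="last")  # 84x84x4 motion stack

model = PPO("CnnPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1_000_000)   # expect many hours on a single GPU
model.save("mario_ppo")
```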
## Project 83 – Pong with Double DQN
- **Objective:** Train an agent to defeat the built-in opponent in Atari Pong using a Double DQN to reduce Q-value overestimation.
- **Algorithm:** Double DQN (DDQN). Unlike standard DQN, action selection and Q-value evaluation use separate networks (online and target), decoupling these two correlated operations and improving convergence stability.
- **Environment:** ALE/Pong-v5 via gymnasium[atari]. Observations are 210×160×3 RGB frames, pre-processed to 84×84 grayscale stacks of 4.
- **Framework:** PyTorch. The replay buffer stores (state, action, reward, next_state, done) tuples; target network weights are synced every N steps.
- **Key Technique:** Double Q-learning update rule (sketched after this list); prioritised or uniform experience replay.
- **How to Run:** see the DQN training loop section below.
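The heart of Double DQN is one extra line of indexing: the online network selects the greedy next action while the target network evaluates it. A minimal PyTorch sketch; the function name and tensor shapes are our assumptions:

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(online_net, target_net, states, actions, rewards,
                    next_states, dones, gamma=0.99):
    """Double DQN TD loss (sketch). Assumes both nets map (batch, obs) ->
    (batch, n_actions); actions is int64 (batch,); rewards/dones are float32."""
    with torch.no_grad():
        # 1. Online network SELECTS the greedy next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # 2. ...while the target network EVALUATES it (decoupled estimates).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # 3. Bellman target; (1 - dones) stops bootstrapping at episode end.
        td_target = rewards + gamma * next_q * (1.0 - dones)
    # Online network's current estimate for the actions actually taken.
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.smooth_l1_loss(q_taken, td_target)
```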
## Project 84 – Breakout with DQN
- **Objective:** Train an agent to play Atari Breakout, learning to bounce the ball to break bricks and maximise the score across multiple lives.
- **Algorithm:** DQN with a convolutional neural network (CNN) as the Q-function approximator, the canonical architecture from DeepMind's 2015 Nature paper (sketched after this list).
- **Environment:** ALE/Breakout-v5 via gymnasium[atari]. Four-frame grayscale stacks at 84×84 resolution.
- **Framework:** PyTorch. Replay memory of 100k–1M transitions; epsilon decays from 1.0 to 0.01 over the first million steps.
- **Key Technique:** CNN feature extraction (3 conv layers + 2 FC layers), frame skipping (each action repeated for 4 frames), reward clipping to ±1.
- **How to Run:** see the DQN training loop section below.
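A PyTorch sketch of that architecture; the layer sizes follow the Nature paper, while the class name and input scaling are our choices:

```python
import torch.nn as nn

class AtariDQN(nn.Module):
    """Nature-2015 DQN: 3 conv layers + 2 fully connected layers.
    Input: (batch, 4, 84, 84) stacked grayscale frames as uint8-range floats."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> 32x20x20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 64x9x9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 64x7x7
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.head(self.features(x / 255.0))  # scale pixels to [0, 1]
```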
## Project 85 – Maze Solver RL
- **Objective:** Train an agent to navigate from a start cell to a goal cell in a grid maze using the shortest possible path, without being given the maze layout in advance.
- **Algorithm:** Tabular Q-learning for small discrete mazes; DQN for larger or procedurally generated mazes where the state space is too large for a Q-table.
- **Environment:** Custom grid-world environment. States are (row, col) coordinates; actions are the four moves up, down, left, right. Reward: +10 on reaching the goal, −1 per step, −5 for hitting a wall.
- **Framework:** NumPy (tabular) or PyTorch (DQN variant).
- **Key Technique:** Epsilon-greedy exploration; for the DQN variant, the state is encoded as a flattened one-hot grid or a 2-D occupancy map passed through a small CNN.
- **How to Run:** see the sketch after this list.
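A self-contained tabular Q-learning sketch; the tiny `MazeEnv` below is illustrative, not the repo's environment, and the hyper-parameters are untuned:

```python
import numpy as np

# Illustrative 5x5 maze (not the repo's implementation); reward scheme matches
# the spec above: +10 goal, -1 per step, -5 for hitting a wall or boundary.
class MazeEnv:
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    WALLS = {(1, 1), (2, 3), (3, 1)}

    def reset(self):
        self.pos = (0, 0)                        # start cell
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not (0 <= r < 5 and 0 <= c < 5) or (r, c) in self.WALLS:
            return self.pos, -5.0, False         # blocked: wall penalty
        self.pos = (r, c)
        if self.pos == (4, 4):
            return self.pos, +10.0, True         # goal reached
        return self.pos, -1.0, False             # step penalty

env = MazeEnv()
Q = np.zeros((5, 5, 4))                          # one Q-value per (row, col, action)
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(2000):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy exploration.
        action = np.random.randint(4) if np.random.rand() < eps else int(np.argmax(Q[state]))
        next_state, reward, done = env.step(action)
        # Tabular Q-learning update: bootstrap from the best next action.
        best_next = np.max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state
```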
## Project 86 – AI Personal Agent
- **Objective:** Build an autonomous agent that can break down a high-level user goal into sub-tasks, call tools (web search, file I/O, code execution), and iterate until the goal is complete.
- **Algorithm:** LLM-based policy (e.g., GPT-4 or an open-source equivalent) wrapped in a ReAct (Reasoning + Acting) loop. The agent alternates between a Thought step (chain-of-thought reasoning), an Action step (tool call), and an Observation step (tool result) until it outputs a final answer.
- **Environment:** Open-ended task space defined by the user's prompt. Tools available to the agent may include web search, a Python REPL, a file reader, and API callers.
- **Framework:** LangChain or a custom agent loop; tool results are appended to the context window at each step.
- **Key Technique:** ReAct prompting, tool use via function calling, memory management to stay within context limits.
- **How to Run:** see the sketch after this list.
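The ReAct loop itself is framework-independent. A skeletal sketch, where `call_llm` and the `tools` registry are hypothetical stand-ins rather than any real library API:

```python
# Skeletal ReAct loop (sketch). call_llm and tools are hypothetical stand-ins
# for a real LLM client and real tool implementations.
import re

def call_llm(transcript: str) -> str:
    """Placeholder: send the running transcript to an LLM, return its reply."""
    raise NotImplementedError

tools = {
    "search": lambda q: f"(search results for {q!r})",   # toy web-search stub
    "python": lambda code: str(eval(code)),              # toy REPL stub
}

def react_agent(goal: str, max_steps: int = 10) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)                     # Thought + Action
        transcript += reply + "\n"
        if "Final Answer:" in reply:                     # agent declares done
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse "Action: tool[input]" from the reply and execute the tool.
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", reply)
        if match:
            name, arg = match.group(1), match.group(2)
            result = tools.get(name, lambda _: "unknown tool")(arg)
            transcript += f"Observation: {result}\n"     # feed result back
    return "Stopped: step budget exhausted."
```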
## DQN training loop
The following snippet shows a standard DQN training loop, the core pattern shared by Projects 80, 83, and 84. It covers environment stepping, replay buffer sampling, the Bellman update, and target network synchronisation.
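This is a minimal sketch: CartPole-v1 stands in for the project environments, and all hyper-parameters are illustrative; swap in the pre-processed game env and a convolutional network for the pixel-based projects.

```python
# Minimal DQN training loop sketch (illustrative env and hyper-parameters).
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

GAMMA, BATCH, SYNC_EVERY = 0.99, 32, 1_000
EPS_START, EPS_END, EPS_DECAY_STEPS = 1.0, 0.01, 100_000

env = gym.make("CartPole-v1")            # stand-in; swap in the project env
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

def make_net() -> nn.Module:
    # For the Atari projects, replace this MLP with the convolutional network.
    return nn.Sequential(nn.Linear(n_obs, 128), nn.ReLU(), nn.Linear(128, n_act))

online_net, target_net = make_net(), make_net()
target_net.load_state_dict(online_net.state_dict())
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-3)
replay = deque(maxlen=50_000)

state, _ = env.reset()
for step in range(1, 200_001):
    # Epsilon-greedy action selection with linear annealing.
    eps = max(EPS_END, EPS_START - (EPS_START - EPS_END) * step / EPS_DECAY_STEPS)
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = online_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

    # Environment stepping; store the transition in the replay buffer.
    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    replay.append((state, action, reward, next_state, float(done)))
    state = env.reset()[0] if done else next_state

    # Replay buffer sampling + Bellman update on a random minibatch.
    if len(replay) >= BATCH:
        s, a, r, s2, d = map(np.array, zip(*random.sample(replay, BATCH)))
        s = torch.as_tensor(s, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64)
        r = torch.as_tensor(r, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)

        q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():   # standard DQN target (see Project 83 for the DDQN variant)
            target = r + GAMMA * target_net(s2).max(dim=1).values * (1 - d)
        loss = nn.functional.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Target network synchronisation every SYNC_EVERY steps.
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(online_net.state_dict())
```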
## Project comparison

| Project | Algorithm | Environment | Framework | Key Technique |
|---|---|---|---|---|
| 80 – Flappy Bird | DQN | Custom pygame / flappy-bird-gym | PyTorch | Replay buffer, target network |
| 81 – Mario RL Agent | PPO / DQN | gym-super-mario-bros | stable-baselines3 / PyTorch | Frame stacking, reward shaping |
| 83 – Pong DDQN | Double DQN | ALE/Pong-v5 (Gymnasium) | PyTorch | Decoupled action selection & evaluation |
| 84 – Breakout DQN | DQN (CNN) | ALE/Breakout-v5 (Gymnasium) | PyTorch | Conv feature extractor, frame skip |
| 85 – Maze Solver | Q-learning / DQN | Custom grid-world | NumPy / PyTorch | Tabular Q-table or DQN, step penalty |
| 86 – AI Personal Agent | ReAct (LLM policy) | Open-ended task space | LangChain | Tool-use, chain-of-thought reasoning |
The game-playing projects (80, 81, 83, 84) depend on specific environment packages; install them before running (a sketch follows below). Atari environments additionally require the Atari ROM files: follow the ale-py documentation to import ROMs legally using ale-import-roms.
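A plausible one-shot install; the package names follow the table above, and exact versions are not pinned here:

```bash
pip install "gymnasium[atari]" ale-py stable-baselines3 torch \
            gym-super-mario-bros nes-py flappy-bird-gym

# Import the Atari ROMs you legally own (ale-py's CLI, per its documentation):
ale-import-roms /path/to/your/roms
```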