Simple Reinforcement Learning is a hands-on notebook series that takes you from the very basics of reinforcement learning (stateless bandit problems) all the way through state-of-the-art deep RL algorithms like PPO, DDPG, and SAC. Every topic is a self-contained Jupyter notebook with clean, minimal Python code built on PyTorch and OpenAI Gym.
Get Started
Understand what this course covers and how to navigate the notebooks.
Environment Setup
Install Python 3.9, PyTorch 1.12.1, and Gym 0.26.2 to run every notebook.
OpenAI Gym Basics
Learn how to create, reset, step through, and render Gym environments.
Bandit Algorithms
Explore Greedy, UCB, and Thompson Sampling on the multi-armed bandit problem.
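As a taste of the bandit material, here is a minimal sketch of UCB1 on a Bernoulli bandit. It is not taken from the notebooks: the function name `run_ucb`, the arm probabilities, and the step count are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random


def run_ucb(probs, steps=2000, seed=0):
    """Play a Bernoulli bandit with UCB1 (hypothetical helper, not from the notebooks)."""
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # how many times each arm was pulled
    values = [0.0] * n_arms  # running mean reward of each arm

    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so each one has an initial estimate.
            arm = t - 1
        else:
            # Choose the arm with the largest mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


# Assumed success probabilities for three arms; the last arm is the best one.
counts, values = run_ucb([0.2, 0.5, 0.8])
print(counts)
```

The exploration bonus shrinks as an arm is pulled more often, so pulls should concentrate on the best arm over time while the weaker arms are still sampled occasionally.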
What You’ll Learn
This series covers the full spectrum of modern RL, organized into four progressive sections:
Foundations
Gym environments, Markov Decision Processes, Monte Carlo methods, Bellman equations, and dynamic programming.
Tabular & Model-Based Methods
Sarsa, N-step Sarsa, Q-Learning, and DynaQ — classic tabular and model-assisted planning algorithms.
Deep RL Algorithms
DQN, Double DQN, Dueling DQN, REINFORCE, Actor-Critic, PPO, DDPG, and SAC using PyTorch neural networks.
Advanced Topics
Imitation Learning, Offline RL, Model Predictive Control, MBPO, Goal-conditioned RL, and Multi-agent systems.
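To give a flavor of the Bellman-equation and dynamic-programming material in the Foundations section, the following is a rough sketch of value iteration. The environment is a hypothetical deterministic chain invented for this example (it is not one of the course's Gym environments), and the discount factor and tolerance are assumed values.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-8):
    """Value iteration on a hypothetical deterministic chain.

    States are 0..n_states-1 and the last state is terminal. Actions move
    one step left (-1) or right (+1); entering the terminal state pays 1.
    """
    goal = n_states - 1
    values = [0.0] * n_states

    def backup(s, a):
        # One-step lookahead: immediate reward plus discounted next-state value.
        nxt = min(max(s + a, 0), goal)
        reward = 1.0 if nxt == goal else 0.0
        return reward + gamma * values[nxt]

    while True:
        delta = 0.0
        for s in range(goal):  # the terminal state keeps value 0
            best = max(backup(s, a) for a in (-1, +1))
            delta = max(delta, abs(best - values[s]))
            values[s] = best   # in-place update
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max((-1, +1), key=lambda a: backup(s, a)) for s in range(goal)]
    return values, policy


values, policy = value_iteration()
print(values, policy)
```

Each sweep applies the Bellman optimality backup to every non-terminal state, so the value of the goal propagates backward one state per sweep until nothing changes.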
Algorithm Coverage
| Section | Algorithms |
|---|---|
| Stateless Bandits | Greedy, Decaying Greedy, UCB, Thompson Sampling |
| MDP Foundations | Monte Carlo, Bellman Equation |
| Dynamic Programming | Policy Iteration, Value Iteration |
| Temporal Difference | Sarsa, N-step Sarsa, Q-Learning |
| Model-Based | DynaQ, MPC, MBPO |
| Deep Value-Based | DQN, Double DQN, Dueling DQN |
| Policy Gradient | REINFORCE, Actor-Critic, PPO |
| Continuous Action | DDPG, SAC |
| Advanced | Imitation Learning, Offline RL, Goal-conditioned RL, Multi-agent |
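For the Temporal Difference row, a minimal tabular Q-learning sketch may help make the update rule concrete. It assumes a hypothetical five-state chain with a reward of 1 at the right end; the function name `q_learning` and all hyperparameters are illustrative and are not the notebooks' settings.

```python
import random


def q_learning(episodes=500, n_states=5, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain (start at 0, goal at the right end)."""
    rng = random.Random(seed)
    goal = n_states - 1
    actions = (-1, +1)  # index 0 moves left, index 1 moves right
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            nxt = min(max(s + actions[a], 0), goal)
            reward = 1.0 if nxt == goal else 0.0
            # Off-policy target: bootstrap from the greedy value of the next state.
            target = reward + gamma * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
    return q


q = q_learning()
greedy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(greedy)
```

The `max(q[nxt])` term is what distinguishes this from Sarsa, which would bootstrap from the action actually taken in the next state instead.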
Prerequisites
You should be comfortable with Python and have a basic understanding of neural networks. No prior RL experience is required — the course builds all concepts from scratch.
- Python: familiarity with NumPy and basic Python scripting
- PyTorch: basic tensor operations and `nn.Sequential` models
- Math: high-school probability and linear algebra are sufficient