Simple Reinforcement Learning: Hands-On RL from Scratch

Simple Reinforcement Learning is a hands-on notebook series that takes you from the basics of reinforcement learning, starting with stateless bandit problems, through widely used deep RL algorithms such as PPO, DDPG, and SAC. Every topic is a self-contained Jupyter notebook with minimal Python code built on PyTorch and OpenAI Gym.

Get Started

Understand what this course covers and how to navigate the notebooks.

Environment Setup

Install Python 3.9, PyTorch 1.12.1, and Gym 0.26.2 to run every notebook.

OpenAI Gym Basics

Learn how to create, reset, step through, and render Gym environments.
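As a rough illustration of the reset/step cycle, the sketch below rolls out one episode using the Gym 0.26 call signatures: reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info). CoinFlipEnv and run_episode are hypothetical names invented for this example so that it runs without Gym installed; they are not part of the course notebooks.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in that mimics the Gym 0.26 reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.rng = random.Random()

    def reset(self, seed=None):
        # Gym 0.26 style: reset() returns (observation, info).
        if seed is not None:
            self.rng.seed(seed)
        self.t = 0
        return 0, {}

    def step(self, action):
        # Gym 0.26 style: step() returns
        # (observation, reward, terminated, truncated, info).
        self.t += 1
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        terminated = False                  # this toy task has no terminal state
        truncated = self.t >= self.horizon  # episode ends on the time limit
        return coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and return the total reward."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        done = terminated or truncated
    return total


if __name__ == "__main__":
    env = CoinFlipEnv(horizon=5)
    print(run_episode(env, policy=lambda obs: 0, seed=0))
```

With Gym 0.26.2 installed, an environment created with gym.make("CartPole-v1") should be usable in place of the stand-in, as long as the policy returns a valid action for that environment. In that Gym version the render mode is chosen when the environment is created, through the render_mode argument of gym.make.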

Bandit Algorithms

Explore Greedy, UCB, and Thompson Sampling on the multi-armed bandit problem.
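To give a feel for one of these strategies, here is a minimal sketch of the UCB1 rule on a Bernoulli bandit: play each arm once, then always pick the arm with the highest mean estimate plus exploration bonus. The function name ucb1, the arm probabilities, and the default parameters are assumptions made for this example and may differ from what the notebooks use.

```python
import math
import random


def ucb1(arm_probs, steps=5000, c=2.0, seed=0):
    """Run UCB1 on a Bernoulli bandit (hypothetical helper for illustration).

    arm_probs: success probability of each arm.
    Returns (pull counts, estimated mean rewards, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    total_reward = 0.0

    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each count is nonzero
        else:
            # Mean estimate plus a bonus that shrinks as an arm is pulled more.
            scores = [
                values[a] + math.sqrt(c * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = ucb1([0.1, 0.5, 0.9])
    print(counts, [round(v, 2) for v in values], total)
```

With c=2.0 the bonus term matches the usual UCB1 form sqrt(2 ln t / n). A greedy strategy would drop the bonus entirely, which is why it can lock onto a poor arm after a few unlucky draws.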

What You’ll Learn

This series is organized into four progressive sections, each building on the one before:

Foundations

Gym environments, Markov Decision Processes, Monte Carlo methods, Bellman equations, and dynamic programming.
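The link between Bellman equations and dynamic programming can be sketched with value iteration, which repeatedly applies the Bellman optimality backup V(s) = max_a [r + gamma * V(s')] until the values stop changing. The chain MDP below (move left or right, reward 1 for reaching the last state) is a hypothetical toy problem chosen for this example, not an environment from the course.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-8):
    """Value iteration on a hypothetical deterministic chain MDP.

    States are 0..n_states-1. Actions are -1 (left) and +1 (right).
    Entering the last state gives reward 1 and ends the episode.
    """
    goal = n_states - 1
    actions = (-1, +1)

    def model(state, action):
        # Known dynamics: clamp the move to the chain, reward on reaching goal.
        next_state = min(max(state + action, 0), goal)
        reward = 1.0 if next_state == goal else 0.0
        return next_state, reward

    def backup(state, action, values):
        next_state, reward = model(state, action)
        return reward + gamma * values[next_state]

    values = [0.0] * n_states  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(goal):
            best = max(backup(s, a, values) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(actions, key=lambda a: backup(s, a, values)) for s in range(goal)]
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print([round(v, 3) for v in values], policy)
```

On this chain the optimal values simply decay by gamma with each step away from the goal, and the greedy policy moves right everywhere.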

Tabular & Model-Based Methods

Sarsa, N-step Sarsa, and Q-Learning for tabular temporal-difference control, plus DynaQ, which adds planning from a learned model.
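As a small example of the tabular setting, the sketch below runs Q-Learning with an epsilon-greedy behavior policy on a toy chain problem. The environment, the function name q_learning, and all hyperparameters are assumptions made for illustration and are not taken from the notebooks.

```python
import random


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-Learning on a hypothetical chain (illustrative only).

    Action 0 moves left, action 1 moves right. Entering the last state
    gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    moves = (-1, +1)
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])

            next_state = min(max(state + moves[action], 0), goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Off-policy target: bootstrap from the best next action,
            # and do not bootstrap past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

Sarsa differs only in the target: it bootstraps from the action the behavior policy actually takes next, rather than from the maximum over next actions.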

Deep RL Algorithms

DQN, Double DQN, Dueling DQN, REINFORCE, Actor-Critic, PPO, DDPG, and SAC using PyTorch neural networks.
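One building block behind policy-gradient methods such as REINFORCE is the discounted return from each time step, G_t = r_t + gamma * G_{t+1}. The helper below is a plain-Python sketch of that recursion; the name discounted_returns is hypothetical, and the notebooks may compute this with PyTorch tensors instead.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In REINFORCE these returns weight the log-probabilities of the actions taken, so actions followed by higher returns are made more likely.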

Advanced Topics

Imitation Learning, Offline RL, Model Predictive Control, MBPO, Goal-conditioned RL, and Multi-agent systems.

Algorithm Coverage

Section	Algorithms
Stateless Bandits	Greedy, Decaying Greedy, UCB, Thompson Sampling
MDP Foundations	Monte Carlo, Bellman Equation
Dynamic Programming	Policy Iteration, Value Iteration
Temporal Difference	Sarsa, N-step Sarsa, Q-Learning
Model-Based	DynaQ, MPC, MBPO
Deep Value-Based	DQN, Double DQN, Dueling DQN
Policy Gradient	REINFORCE, Actor-Critic, PPO
Continuous Action	DDPG, SAC
Advanced	Imitation Learning, Offline RL, Goal-conditioned RL, Multi-agent

Prerequisites

You should be comfortable with Python and have a basic understanding of neural networks. No prior RL experience is required — the course builds all concepts from scratch.

Python — familiarity with NumPy and basic Python scripting
PyTorch — basic tensor operations and nn.Sequential models
Math — high-school probability and linear algebra are sufficient

Quick Setup

Install Python 3.9

Use Anaconda or pyenv to create an isolated Python 3.9 environment.

Install dependencies

pip install torch==1.12.1 gym==0.26.2 matplotlib numpy

Clone the repository

git clone https://github.com/lansinuote/Simple_Reinforcement_Learning.git
cd Simple_Reinforcement_Learning

Open a notebook

jupyter notebook

Navigate to any numbered folder and open the first notebook to begin.
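Before opening a notebook, it may help to confirm that the pinned versions from the install step are the ones actually installed. The snippet below is one way to do that with the standard library; check_versions and PINNED are hypothetical names for this sketch, not utilities shipped with the repository.

```python
from importlib import metadata

# Versions pinned in the install step above.
PINNED = {"torch": "1.12.1", "gym": "0.26.2"}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Some torch wheels carry a local suffix such as "+cu113";
        # compare only the part before the "+".
        base = installed.split("+")[0] if installed else None
        if base != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    issues = check_versions()
    if issues:
        for name, (expected, installed) in issues.items():
            print(f"{name}: expected {expected}, found {installed}")
    else:
        print("All pinned packages match.")
```

An empty result means both packages match the pinned versions. A value of None in the second position means the package is not installed in the active environment.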
