OpenAI Gym provides a standardized collection of environments for reinforcement learning research. Each environment exposes a common API — reset(), step(), and render() — so that algorithms can be developed and tested without modification across a wide range of tasks.

What is Gym?

Gym environments simulate a world in which an agent can act. The agent receives observations of the world’s state, takes actions, and collects scalar rewards. Your goal as a researcher is to find a policy (a mapping from states to actions) that maximizes cumulative reward. Common environments include:

| Environment | Task |
| --- | --- |
| CartPole-v0 | Balance a pole on a moving cart |
| LunarLander-v2 | Land a spacecraft on a pad |
| FrozenLake-v1 | Navigate a grid without falling into holes |
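
The agent/environment loop described above can be sketched without Gym installed. The snippet below is only an illustration: `ToyCorridor`, `run_episode`, `always_right`, and `random_policy` are hypothetical names invented for this example (none of them are part of Gym), and the toy environment merely mimics the `reset()` and `step()` return shapes of Gym 0.26. It shows a policy as a plain function from state to action and the cumulative reward as the sum of per-step rewards.

```python
import random


class ToyCorridor:
    """Hypothetical stand-in for a Gym env: walk right from cell 0 to the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos, {}  # (state, info), the shape Gym 0.26 returns

    def step(self, action):
        # action 0 = move left, action 1 = move right
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.pos == self.length - 1  # goal reached
        truncated = (not terminated) and self.steps >= self.max_steps  # time limit
        reward = 1.0 if terminated else -0.1
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state, _ = env.reset()
    total = 0.0
    over = False
    while not over:
        state, reward, terminated, truncated, _ = env.step(policy(state))
        total += reward
        over = terminated or truncated
    return total


def always_right(state):
    # A deterministic policy: the same action in every state
    return 1


def random_policy(state):
    # A policy that ignores the state entirely
    return random.choice([0, 1])


env = ToyCorridor()
best = run_episode(env, always_right)
sampled = run_episode(env, random_policy)
print(f"always_right return: {best:.1f}")
print(f"random return:       {sampled:.1f}")
```

In this toy task the shortest path takes four steps, so no policy can collect more reward than `always_right`; a random policy pays the per-step penalty more often and scores the same or lower.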

Installation

pip install gym==0.26
If you encounter errors, install any additional packages that the error message mentions (e.g., box2d-py for LunarLander).

Creating and Rendering Environments

1. Create the environment

Use gym.make() with render_mode='rgb_array' so you can capture frames programmatically.
import gym
from matplotlib import pyplot as plt
%matplotlib inline

# Create the environment
env = gym.make('CartPole-v0', render_mode='rgb_array')

# Reset to the initial state
env.reset()

# Render the current frame as an RGB array and display it
plt.imshow(env.render())
plt.show()

# Release the environment's resources
env.close()
2. Play random actions in LunarLander

The step() method returns (state, reward, terminated, truncated, info). Combine terminated and truncated into a single over flag as shown below.
import gym
from IPython import display
from matplotlib import pyplot as plt

# Create the LunarLander environment (needs the Box2D dependency)
env = gym.make('LunarLander-v2', render_mode='rgb_array')

# Reset to the initial state
state, info = env.reset()

# Play 300 random actions
for i in range(300):
    action = env.action_space.sample()
    state, reward, terminated, truncated, info = env.step(action)
    over = terminated or truncated

    # Redraw only every 5th step to keep the animation fast
    if i % 5 == 0:
        display.clear_output(wait=True)
        plt.imshow(env.render())
        plt.show()

    # Start a new episode when the current one ends
    if over:
        state, info = env.reset()

# Release the environment's resources
env.close()

Inspecting the Action and Observation Spaces

After creating an environment you can query its spaces to understand how many actions are available and the range of valid observations.
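# Action space of the LunarLander-v2 environment created above
env.action_space
# Discrete(4)
Discrete(4) means the agent may choose one of 4 integer actions: 0, 1, 2, or 3. Continuous spaces use Box instead; for a Box, env.observation_space.low and env.observation_space.high give the per-dimension bounds.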
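
To make the two kinds of space concrete, here is a minimal sketch that runs without Gym. `ToyDiscrete` and `ToyBox` are hypothetical stand-ins written for this example; they are not Gym's implementation, and they only imitate the attributes and methods used here (`n`, `low`, `high`, `sample()`, `contains()`).

```python
import random


class ToyDiscrete:
    """Stand-in for a discrete space: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Pick one valid action uniformly at random
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class ToyBox:
    """Stand-in for a continuous space: one [low, high] interval per dimension."""

    def __init__(self, low, high):
        self.low = list(low)
        self.high = list(high)

    def sample(self):
        # Draw each dimension independently within its own bounds
        return [random.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for v, lo, hi in zip(x, self.low, self.high)
        )


actions = ToyDiscrete(4)
print(actions.n)            # 4
print(actions.contains(3))  # True: 3 is the last valid action
print(actions.contains(4))  # False: valid actions stop at n - 1

controls = ToyBox(low=[-1.0, -1.0], high=[1.0, 1.0])
print(controls.contains(controls.sample()))  # True: samples respect the bounds
```

The practical difference is that a discrete space is fully described by a count, while a box needs a lower and an upper bound for every dimension.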

The MyWrapper Pattern

Later notebooks (e.g., the DQN series) use a thin wrapper around the environment that standardises the step() return signature to (state, reward, done, info). It collapses terminated and truncated into a single Boolean and caps each episode at a fixed number of steps (200 in the code below).
import gym


# Wrap the environment: 4-tuple step() and a 200-step episode cap
class MyWrapper(gym.Wrapper):

    def __init__(self):
        env = gym.make('CartPole-v1', render_mode='rgb_array')
        super().__init__(env)
        self.env = env
        self.step_n = 0

    def reset(self):
        state, _ = self.env.reset()
        self.step_n = 0
        return state

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated
        self.step_n += 1
        if self.step_n >= 200:
            done = True
        return state, reward, done, info


env = MyWrapper()
env.reset()
Wrapping the environment hides the terminated/truncated distinction introduced in Gym 0.26 and enforces a maximum episode length. The training loop then only has to check a single done flag, and every episode is bounded to at most 200 steps.
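
The effect of the step cap is easiest to see on an environment that never ends on its own. The sketch below runs without Gym: `EndlessEnv` and `StepLimitWrapper` are hypothetical names invented for this example, and the wrapper is a plain class that follows the same logic as MyWrapper rather than subclassing gym.Wrapper.

```python
class EndlessEnv:
    """Hypothetical env with Gym 0.26-style returns that never ends on its own."""

    def reset(self):
        self.t = 0
        return self.t, {}  # (state, info)

    def step(self, action):
        self.t += 1
        # state, reward, terminated, truncated, info
        return self.t, 1.0, False, False, {}


class StepLimitWrapper:
    """Same logic as MyWrapper, written against a plain object for illustration."""

    def __init__(self, env, max_steps=200):
        self.env = env
        self.max_steps = max_steps
        self.step_n = 0

    def reset(self):
        state, _ = self.env.reset()
        self.step_n = 0
        return state  # drop the info dict, as MyWrapper does

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        self.step_n += 1
        # One flag covers natural termination, truncation, and the step cap
        done = terminated or truncated or self.step_n >= self.max_steps
        return state, reward, done, info


env = StepLimitWrapper(EndlessEnv(), max_steps=200)
state = env.reset()
done = False
episode_return = 0.0
while not done:
    state, reward, done, info = env.step(0)
    episode_return += reward

print(env.step_n, episode_return)  # 200 200.0
```

Without the cap this loop would never exit; with it, the loop body only needs to look at `done`.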

Playing Interactively

You can also play Gym games yourself using keyboard mappings — though this requires a graphical display:
import gym
import pygame
from gym.utils.play import play

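# Map keys to actions: left arrow -> 0 (push left), right arrow -> 1 (push right)
mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}

# Play the game directly (requires a graphical interface, so it is left commented out).
# play() renders from rgb_array frames, so the environment is created with that render mode.
# play(gym.make('CartPole-v0', render_mode='rgb_array'), keys_to_action=mapping)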

Summary

| Concept | API |
| --- | --- |
| Create environment | gym.make('EnvName-v0', render_mode='rgb_array') |
| Reset episode | env.reset() → (state, info) |
| Take action | env.step(action) → (state, reward, terminated, truncated, info) |
| Random action | env.action_space.sample() |
| Action count | env.action_space.n (Discrete spaces) |
| State bounds | env.observation_space.high / .low |
| Cleanup | env.close() |
