OpenAI Gym provides a standardized collection of environments for reinforcement learning research. Each environment exposes a common API (reset(), step(), and render()) so that algorithms can be developed and tested without modification across a wide range of tasks.
What is Gym?
Gym environments simulate a world in which an agent can act. The agent receives observations of the world's state, takes actions, and collects scalar rewards. Your goal as a researcher is to find a policy (a mapping from states to actions) that maximizes cumulative reward. Common environments include:

| Environment | Task |
|---|---|
| CartPole-v0 | Balance a pole on a moving cart |
| LunarLander-v2 | Land a spacecraft on a pad |
| FrozenLake-v1 | Navigate a grid without falling into holes |
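
The idea of a policy and its cumulative reward can be sketched without Gym itself. The example below is a minimal illustration, not code from this guide: ToyCorridor is a hypothetical stand-in environment that only mimics the reset() and step() signatures used on this page, and always_right, random_policy, and run_episode are illustrative names.

```python
import random


class ToyCorridor:
    """Hypothetical 1-D corridor that mimics Gym-style reset()/step().

    The agent starts in cell 0 and the episode terminates when it reaches
    the last cell. Every step costs -1, so shorter paths score higher.
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (state, info)

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the move.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1
        truncated = self.steps >= self.max_steps
        return self.position, -1.0, terminated, truncated, {}


def always_right(state):
    """A policy is a mapping from state to action; this one ignores the state."""
    return 1


def random_policy(state):
    """Pick either action with equal probability."""
    return random.choice([0, 1])


def run_episode(env, policy):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state, info = env.reset()
    total_reward = 0.0
    over = False
    while not over:
        state, reward, terminated, truncated, info = env.step(policy(state))
        total_reward += reward
        over = terminated or truncated
    return total_reward


env = ToyCorridor()
print("always_right:", run_episode(env, always_right))
print("random_policy:", run_episode(env, random_policy))
```

In this toy setting the always_right policy reaches the goal in four steps, so it collects a higher cumulative reward than a random policy usually does. The same run_episode loop should work with a real Gym environment that follows the five-value step() convention.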
Installation
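Install Gym with pip (pip install gym). The examples on this page use the render_mode argument and the five-value step() return introduced in Gym 0.26. If you encounter errors, install any additional packages that the error message mentions (e.g., box2d-py for LunarLander).
Creating and Rendering Environments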
import gym
from matplotlib import pyplot as plt
%matplotlib inline
# Create a game environment
env = gym.make('CartPole-v0', render_mode='rgb_array')
# Initialize the game
env.reset()
# Display the game
plt.imshow(env.render())
plt.show()
# Close the game
env.close()
The step() method returns (state, reward, terminated, truncated, info). Combine terminated and truncated into a single over flag, as in the loop below.
from IPython import display
import time
# Create the lunar lander
env = gym.make('LunarLander-v2', render_mode='rgb_array')
# Initialize the game
state, info = env.reset()
# Play N random actions
for i in range(300):
    action = env.action_space.sample()
    state, reward, terminated, truncated, info = env.step(action)
    over = terminated or truncated
    if i % 5 == 0:  # Skip frames
        # Redraw the animation
        display.clear_output(wait=True)
        plt.imshow(env.render())
        plt.show()
    # Reset when the game ends
    if over:
        state, info = env.reset()
# Close the game
env.close()
Inspecting the Action and Observation Spaces
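As a rough sketch, querying the spaces might look like the following. CartPole-v0 is used here only because it needs no extra dependencies; it reports Discrete(2), whereas an environment with four actions, such as LunarLander-v2, reports Discrete(4).

```python
import gym

# CartPole needs no optional packages, which keeps the sketch easy to run.
env = gym.make('CartPole-v0')

print(env.action_space)            # Discrete(2): the valid actions are 0 and 1
print(env.action_space.n)          # number of discrete actions
print(env.observation_space)       # a Box with one entry per state variable
print(env.observation_space.low)   # per-variable lower bounds
print(env.observation_space.high)  # per-variable upper bounds

env.close()
```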
After creating an environment you can query its spaces to understand how many actions are available and the range of valid observations.
An action space reported as Discrete(4) means the agent may choose one of 4 discrete integer actions: 0, 1, 2, or 3. Continuous action spaces use Box instead.
The MyWrapper Pattern
Later notebooks (e.g., the DQN series) use a thin wrapper around the environment that standardises the step() return signature to (state, reward, done, info) — collapsing terminated and truncated into a single Boolean and adding an optional step limit.
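The wrapper described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebooks' actual code: max_steps is a hypothetical name for the optional step limit, reset() returning only the state is an assumption, and CountingEnv is a made-up stand-in used to exercise the wrapper without Gym. A real implementation might instead subclass gym.Wrapper.

```python
class MyWrapper:
    """Sketch of a wrapper that collapses a 5-value step() into 4 values.

    Assumption: `env` is any object with Gym-style reset() and step().
    `max_steps` is a hypothetical name for the optional step limit.
    """

    def __init__(self, env, max_steps=None):
        self.env = env
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self):
        # Assumption: callers only need the state, so info is dropped.
        state, info = self.env.reset()
        self.step_count = 0
        return state

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        self.step_count += 1
        # Collapse the two end-of-episode flags into a single Boolean.
        done = bool(terminated or truncated)
        # Enforce the optional step limit on top of the environment's own.
        if self.max_steps is not None and self.step_count >= self.max_steps:
            done = True
        return state, reward, done, info


class CountingEnv:
    """Hypothetical stand-in environment that terminates after 10 steps."""

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, False, {}


env = MyWrapper(CountingEnv(), max_steps=3)
state = env.reset()
done = False
steps_taken = 0
while not done:
    state, reward, done, info = env.step(0)
    steps_taken += 1
print(steps_taken)  # 3: the step limit ends the episode before the env does
```

Keeping the limit in the wrapper means training loops only ever check one done flag, whichever condition ended the episode.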
Playing Interactively
You can also play Gym games yourself using keyboard mappings, though this requires a graphical display.
Summary
| Concept | API |
|---|---|
| Create environment | gym.make('EnvName-v0', render_mode='rgb_array') |
| Reset episode | env.reset() → (state, info) |
| Take action | env.step(action) → (state, reward, terminated, truncated, info) |
| Random action | env.action_space.sample() |
| Action count | env.action_space.n (Discrete spaces) |
| State bounds | env.observation_space.high / .low |
| Cleanup | env.close() |