Running reinforcement learning experiments directly on a development machine or shared server creates environment conflicts, resource contention, and reproducibility problems. OpenSandbox solves this by provisioning a clean container for each training run, installing RL dependencies from a requirements.txt at runtime, executing the training script, and making the model checkpoint and JSON summary available through the sandbox file API when training completes.
Prerequisites
Environment Variables
| Variable | Default | Description |
|---|---|---|
| SANDBOX_DOMAIN | localhost:8080 | Sandbox service address |
| SANDBOX_API_KEY | (optional) | API key if your server requires authentication |
| SANDBOX_IMAGE | sandbox-registry…/code-interpreter:v1.1.0 | Docker image to use |
| RL_TIMESTEPS | 5000 | Number of training timesteps to run |
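As an illustration of how a host script might consume these variables, here is a minimal sketch. The `RunConfig` class and `load_config` helper are hypothetical names introduced for this example, not part of OpenSandbox. Because the default image path is abbreviated in the table, the sketch does not hard-code one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings listed in the table."""
    domain: str
    api_key: Optional[str]
    image: Optional[str]
    timesteps: int


def load_config(env: Mapping[str, str] = os.environ) -> RunConfig:
    """Read the four variables, applying the documented defaults where known."""
    return RunConfig(
        domain=env.get("SANDBOX_DOMAIN", "localhost:8080"),
        # Optional: only needed when the server enforces authentication.
        api_key=env.get("SANDBOX_API_KEY") or None,
        # The default image path is abbreviated in the table, so none is
        # hard-coded here; None means the caller must supply one.
        image=env.get("SANDBOX_IMAGE") or None,
        # Environment values are strings, so convert explicitly.
        timesteps=int(env.get("RL_TIMESTEPS", "5000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation, for example `load_config({"RL_TIMESTEPS": "200"})`.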
RL Dependencies
The RL packages are installed inside the sandbox at runtime from requirements.txt.
Full Example
The script writes a requirements.txt and the training script (train.py) into the sandbox, installs dependencies, runs training, and reads training_summary.json back to the host. The training script itself is generated as an inline string, so no external files need to be present on the host.
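The sequence described above (write inputs, install, train, read the summary) can be sketched as follows. This is not the OpenSandbox SDK: `SandboxLike`, `write_file`, `run`, and `read_file` are hypothetical stand-ins for whatever file and command calls the real client exposes, and the two payload strings are placeholders rather than the actual package list or training script. Passing the timestep count through an `RL_TIMESTEPS` environment prefix is also an assumption.

```python
import json
from typing import Protocol


class SandboxLike(Protocol):
    """Hypothetical sandbox client interface; NOT the OpenSandbox SDK API."""

    def write_file(self, path: str, data: str) -> None: ...
    def run(self, command: str) -> int: ...
    def read_file(self, path: str) -> str: ...


# Placeholder payloads that only illustrate the inline-string approach.
REQUIREMENTS_TXT = "# RL dependencies go here, one per line\n"
TRAIN_PY = "# training script source goes here\n"


def run_training(sandbox: SandboxLike, timesteps: int) -> dict:
    """Write inputs, install dependencies, train, and fetch the JSON summary."""
    sandbox.write_file("requirements.txt", REQUIREMENTS_TXT)
    sandbox.write_file("train.py", TRAIN_PY)

    steps = [
        "pip install -r requirements.txt",
        # Assumption: the training script reads RL_TIMESTEPS from its environment.
        f"RL_TIMESTEPS={timesteps} python train.py",
    ]
    for command in steps:
        exit_code = sandbox.run(command)
        if exit_code != 0:
            # Stop early so a failed install never leads to a confusing
            # "summary file not found" error later.
            raise RuntimeError(f"command failed ({exit_code}): {command}")

    return json.loads(sandbox.read_file("training_summary.json"))
```

Keeping the orchestration behind a small interface means the flow can be exercised with an in-memory fake before it is pointed at a real sandbox.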
How Checkpoints Are Saved and Retrieved
Training
stable_baselines3.DQN.learn() trains a policy for RL_TIMESTEPS steps on the CartPole-v1 environment. TensorBoard event files are written to the runs/ directory inside the sandbox.
Checkpoint save
After training, model.save("checkpoints/cartpole_dqn") writes the model weights to checkpoints/cartpole_dqn.zip inside the sandbox working directory.
Evaluation and summary
evaluate_policy() runs 5 evaluation episodes and records mean_reward and std_reward. The summary (including the checkpoint path) is written to training_summary.json.
TensorBoard
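The training script logs to runs/ inside the sandbox. To inspect training metrics, open a shell in the sandbox and start TensorBoard with runs/ as its log directory, then retrieve the address for port 6006 (TensorBoard's default port) with sandbox.get_endpoint(6006).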
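The Training, Checkpoint save, and Evaluation and summary steps described earlier can be sketched as one script. This is a minimal illustration, not the exact script the example generates: it assumes the gymnasium package provides CartPole-v1, and the `build_summary` helper and the summary keys other than mean_reward and std_reward are hypothetical choices.

```python
import json
from pathlib import Path


def build_summary(timesteps: int, mean_reward: float, std_reward: float,
                  checkpoint_path: str) -> dict:
    """Assemble the fields written to training_summary.json (key names assumed)."""
    return {
        "timesteps": timesteps,
        "mean_reward": float(mean_reward),
        "std_reward": float(std_reward),
        "checkpoint_path": checkpoint_path,
    }


def train(timesteps: int) -> dict:
    # Imported lazily so build_summary stays usable without the RL stack installed.
    import gymnasium as gym  # assumption: gymnasium supplies CartPole-v1
    from stable_baselines3 import DQN
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("CartPole-v1")
    # tensorboard_log makes learn() write event files under runs/.
    model = DQN("MlpPolicy", env, tensorboard_log="runs", verbose=0)
    model.learn(total_timesteps=timesteps)

    Path("checkpoints").mkdir(exist_ok=True)
    # save() appends the .zip suffix, producing checkpoints/cartpole_dqn.zip.
    model.save("checkpoints/cartpole_dqn")

    # Five evaluation episodes, matching the description above.
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=5)
    summary = build_summary(timesteps, mean_reward, std_reward,
                            "checkpoints/cartpole_dqn.zip")
    Path("training_summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```

Inside the sandbox, the script would call `train()` with the value of RL_TIMESTEPS converted to an integer. Converting the rewards with `float()` keeps NumPy scalar types out of the summary, which the standard `json` module cannot serialize.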