
Overview

Neurenix provides implementations of state-of-the-art reinforcement learning algorithms for both discrete and continuous action spaces.

DQN - Deep Q-Network

class DQN:
    def __init__(
        self,
        observation_space: Dict[str, Any],
        action_space: Dict[str, Any],
        hidden_dims: List[int] = [64, 64],
        learning_rate: float = 0.001,
        gamma: float = 0.99,
        epsilon_start: float = 1.0,
        epsilon_end: float = 0.01,
        epsilon_decay: float = 0.995,
        buffer_size: int = 10000,
        batch_size: int = 64,
        update_target_every: int = 100,
        double_q: bool = False,
        dueling: bool = False,
        name: str = "DQN",
    )

Parameters

observation_space (Dict[str, Any], required): Observation space specification.
action_space (Dict[str, Any], required): Action space specification.
hidden_dims (List[int], default [64, 64]): Hidden layer dimensions for the Q-network.
learning_rate (float, default 0.001): Learning rate for the optimizer.
gamma (float, default 0.99): Discount factor for future rewards.
epsilon_start (float, default 1.0): Initial exploration rate for epsilon-greedy action selection.
epsilon_end (float, default 0.01): Final (minimum) exploration rate.
epsilon_decay (float, default 0.995): Decay factor applied to the exploration rate each episode.
buffer_size (int, default 10000): Capacity of the experience replay buffer.
batch_size (int, default 64): Mini-batch size sampled from the replay buffer for each update.
update_target_every (int, default 100): Number of steps between target network updates.
double_q (bool, default False): Whether to use Double DQN.
dueling (bool, default False): Whether to use the Dueling DQN architecture.
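To show how the exploration and target parameters fit together, here is a minimal NumPy sketch. It assumes epsilon_decay is a multiplicative factor applied once per episode and floored at epsilon_end, and that double_q and dueling follow the standard Double DQN and Dueling DQN formulations. The helper names (epsilon_at, episodes_to_floor, td_targets, dueling_q) are hypothetical and not part of the Neurenix API.

```python
import math

import numpy as np


def epsilon_at(episode: int, start: float = 1.0, end: float = 0.01, decay: float = 0.995) -> float:
    """Exploration rate after `episode` episodes (assumed multiplicative decay, floored at `end`)."""
    return max(end, start * decay ** episode)


def episodes_to_floor(start: float = 1.0, end: float = 0.01, decay: float = 0.995) -> int:
    """Smallest episode count n such that start * decay**n <= end."""
    return math.ceil(math.log(end / start) / math.log(decay))


def td_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99, double_q=False):
    """One-step TD targets for a batch.

    next_q_online / next_q_target have shape (batch, n_actions) and hold the
    next-state Q-values from the online and target networks.
    """
    if double_q:
        # Double DQN: the online network selects the action, the target network evaluates it.
        best = np.argmax(next_q_online, axis=1)
        next_values = next_q_target[np.arange(len(best)), best]
    else:
        # Standard DQN: the target network both selects and evaluates the action.
        next_values = np.max(next_q_target, axis=1)
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values


def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)
```

Under these assumptions, the default schedule reaches epsilon_end after episodes_to_floor() = 919 episodes, so a run much shorter than that never fully switches to exploitation.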

Methods

train

def train(
    self,
    env,
    episodes: int = 1000,
    max_steps: int = 1000,
    render: bool = False,
    verbose: bool = True,
    callback: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> Dict[str, List[float]]
Train the DQN agent on an environment.
Returns (Dict[str, List[float]]): Dictionary of training metrics, including episode rewards (key 'episode_rewards') and losses.

A2C - Advantage Actor-Critic

class A2C:
    def __init__(
        self,
        observation_space: Dict[str, Any],
        action_space: Dict[str, Any],
        actor_hidden_dims: List[int] = [64, 64],
        critic_hidden_dims: List[int] = [64, 64],
        actor_learning_rate: float = 0.0003,
        critic_learning_rate: float = 0.001,
        gamma: float = 0.99,
        entropy_coef: float = 0.01,
        value_coef: float = 0.5,
        max_grad_norm: float = 0.5,
        name: str = "A2C",
    )

Parameters

observation_space (Dict[str, Any], required): Observation space specification.
action_space (Dict[str, Any], required): Action space specification.
actor_hidden_dims (List[int], default [64, 64]): Hidden layer dimensions of the actor network.
critic_hidden_dims (List[int], default [64, 64]): Hidden layer dimensions of the critic network.
actor_learning_rate (float, default 0.0003): Learning rate for the actor.
critic_learning_rate (float, default 0.001): Learning rate for the critic.
gamma (float, default 0.99): Discount factor.
entropy_coef (float, default 0.01): Coefficient of the entropy bonus, which encourages exploration.
value_coef (float, default 0.5): Coefficient of the value loss.
max_grad_norm (float, default 0.5): Maximum gradient norm used for gradient clipping.
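As a rough guide to how gamma, value_coef, entropy_coef, and max_grad_norm interact, here is a minimal NumPy sketch of the standard A2C objective: discounted returns, an advantage-weighted policy loss, a squared-error value loss, an entropy bonus, and global-norm gradient clipping. It assumes Neurenix follows this common formulation; the function names are hypothetical. Because this A2C has separate actor and critic learning rates, the library may apply the value term through the critic's optimizer rather than summing everything into a single loss.

```python
import numpy as np


def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """Bootstrapped discounted returns for one rollout, computed backwards in time."""
    returns = np.zeros(len(rewards))
    running = last_value  # critic's estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        # A done flag at step t stops bootstrapping from later steps.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns


def a2c_loss(log_probs, values, returns, entropies, value_coef=0.5, entropy_coef=0.01):
    """Combined actor-critic loss (lower is better)."""
    advantages = returns - values  # treated as constants for the policy term
    policy_loss = -np.mean(log_probs * advantages)
    value_loss = np.mean((returns - values) ** 2)
    entropy = np.mean(entropies)
    # Subtracting the entropy term rewards more random (exploratory) policies.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


def clip_by_global_norm(grads, max_grad_norm=0.5):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_grad_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_grad_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm
```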

PPO - Proximal Policy Optimization

class PPO:
    def __init__(
        self,
        observation_space: Dict[str, Any],
        action_space: Dict[str, Any],
        actor_hidden_dims: List[int] = [64, 64],
        critic_hidden_dims: List[int] = [64, 64],
        learning_rate: float = 0.0003,
        gamma: float = 0.99,
        gae_lambda: float = 0.95,
        clip_epsilon: float = 0.2,
        entropy_coef: float = 0.01,
        value_coef: float = 0.5,
        max_grad_norm: float = 0.5,
        n_epochs: int = 10,
        batch_size: int = 64,
        name: str = "PPO",
    )
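PPO is one of the most widely used RL algorithms. Its clipped surrogate objective (controlled by clip_epsilon) limits how far each update can move the policy, which keeps training stable, and each batch of collected experience is reused for n_epochs optimization passes.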
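The two PPO-specific pieces, Generalized Advantage Estimation (gamma, gae_lambda) and the clipped surrogate loss (clip_epsilon), can be sketched in NumPy as below. This is a minimal sketch of the standard formulation, assuming Neurenix computes advantages and the policy loss this way; the function names are hypothetical.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation for one rollout; returns (advantages, returns)."""
    advantages = np.zeros(len(rewards))
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors (weight gamma * lambda per step).
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values  # regression targets for the critic
    return advantages, returns


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_epsilon=0.2):
    """PPO-Clip policy loss (negated so that lower is better)."""
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    # Taking the minimum gives a pessimistic bound that removes the incentive
    # to push the ratio outside [1 - eps, 1 + eps].
    return -np.mean(np.minimum(unclipped, clipped))
```

Setting gae_lambda to 0 reduces the advantages to one-step TD errors, while 1 gives full Monte Carlo returns minus the value baseline; the default 0.95 sits between the two.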

DDPG - Deep Deterministic Policy Gradient

class DDPG:
    def __init__(
        self,
        observation_space: Dict[str, Any],
        action_space: Dict[str, Any],
        actor_hidden_dims: List[int] = [400, 300],
        critic_hidden_dims: List[int] = [400, 300],
        actor_learning_rate: float = 0.0001,
        critic_learning_rate: float = 0.001,
        gamma: float = 0.99,
        tau: float = 0.005,
        buffer_size: int = 1000000,
        batch_size: int = 64,
        noise_stddev: float = 0.1,
        name: str = "DDPG",
    )
DDPG is designed for continuous action spaces.
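Two DDPG parameters have a direct mechanical meaning: tau sets how quickly target networks track the online networks, and noise_stddev sets the scale of exploration noise added to the deterministic action. The NumPy sketch below assumes tau is a Polyak averaging coefficient and that exploration noise is zero-mean Gaussian clipped to the action bounds ('low' and 'high', as in the continuous action space of the example below). The original DDPG paper used Ornstein-Uhlenbeck noise, and which noise process Neurenix uses is not documented here; the function names are hypothetical.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, per parameter array."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_params, online_params)]


def noisy_action(action, noise_stddev=0.1, low=-1.0, high=1.0, rng=None):
    """Add zero-mean Gaussian exploration noise, then clip to the action bounds."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, noise_stddev, size=np.shape(action))
    return np.clip(action + noise, low, high)
```

With tau = 0.005, a target parameter closes about 0.5% of the gap to its online counterpart per update, so roughly 1,000 updates are needed before the target reflects a change almost completely.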

SAC - Soft Actor-Critic

class SAC:
    def __init__(
        self,
        observation_space: Dict[str, Any],
        action_space: Dict[str, Any],
        actor_hidden_dims: List[int] = [256, 256],
        critic_hidden_dims: List[int] = [256, 256],
        learning_rate: float = 0.0003,
        gamma: float = 0.99,
        tau: float = 0.005,
        alpha: float = 0.2,
        auto_alpha: bool = True,
        buffer_size: int = 1000000,
        batch_size: int = 256,
        name: str = "SAC",
    )
SAC is an off-policy algorithm for continuous control with entropy regularization.
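The entropy regularization shows up in the critic target, where alpha weights an entropy bonus, and, with auto_alpha, in a separate loss that tunes alpha toward a target entropy. Below is a minimal NumPy sketch of the standard SAC formulation, assuming twin critics (taking the minimum of two Q estimates) and the commonly used target entropy of minus the action dimension; neither detail is confirmed for Neurenix, and the function names are hypothetical.

```python
import numpy as np


def sac_target(rewards, dones, next_q1, next_q2, next_log_probs, gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi(a'|s'))."""
    # next_log_probs are log-probabilities of actions sampled from the current policy at s'.
    soft_value = np.minimum(next_q1, next_q2) - alpha * next_log_probs
    return rewards + gamma * (1.0 - dones) * soft_value


def default_target_entropy(action_shape):
    """Common heuristic for automatic alpha tuning: -dim(action space)."""
    return -float(np.prod(action_shape))


def alpha_loss(log_alpha, log_probs, target_entropy):
    """Temperature loss. Minimizing it with respect to log_alpha raises alpha when the
    policy entropy (-log_probs) is below target_entropy and lowers it otherwise."""
    return -np.mean(log_alpha * (log_probs + target_entropy))
```

For the 2-D continuous action space in the example below, this heuristic would give a target entropy of -2.0.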

Example Usage

import numpy as np
import neurenix as nx
from neurenix.rl import DQN, PPO, SAC
from neurenix.agent import Environment

# Create environment (YourEnvironment is a placeholder for your own environment class)
env = YourEnvironment()

observation_space = {
    'shape': (4,),
    'type': 'continuous'
}

action_space = {
    'n': 2,
    'type': 'discrete'
}

# DQN for discrete actions
dqn = DQN(
    observation_space=observation_space,
    action_space=action_space,
    hidden_dims=[128, 128],
    learning_rate=0.001,
    double_q=True,
    dueling=True
)

# Train the agent
metrics = dqn.train(
    env=env,
    episodes=1000,
    max_steps=500,
    verbose=True
)

print(f"Average reward: {np.mean(metrics['episode_rewards'])}")

# Save trained model
dqn.save("dqn_agent.pth")

# PPO for more stable training
ppo = PPO(
    observation_space=observation_space,
    action_space=action_space,
    learning_rate=0.0003,
    gae_lambda=0.95,
    clip_epsilon=0.2
)

metrics = ppo.train(env=env, episodes=500)

# SAC for continuous control
continuous_action_space = {
    'shape': (2,),
    'type': 'continuous',
    'low': -1.0,
    'high': 1.0
}

sac = SAC(
    observation_space=observation_space,
    action_space=continuous_action_space,
    auto_alpha=True
)

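# Placeholder: an environment whose actions match continuous_action_space
continuous_env = YourContinuousEnvironment()

metrics = sac.train(env=continuous_env, episodes=1000)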

Algorithm Comparison

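| Algorithm | Action Space | Sample Efficiency | Stability | Best For |
|-----------|--------------|-------------------|-----------|----------|
| DQN | Discrete | Medium | Medium | Discrete control, Atari games |
| A2C | Both | Low | Medium | Fast training, simple tasks |
| PPO | Both | Medium | High | General purpose, stable training |
| DDPG | Continuous | High | Medium | Continuous control, robotics |
| SAC | Continuous | High | High | Complex continuous tasks |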

Tips

Hyperparameter tuning: Start with default hyperparameters and adjust based on your environment. Learning rate and discount factor (gamma) are often the most important.
Buffer size (DQN, DDPG, SAC): Larger replay buffers improve sample efficiency but require more memory. Use 10K-100K transitions for simple tasks and 1M+ for complex tasks.
Network architecture: Deeper networks (3-4 layers) work better for visual inputs, while 2-layer networks suffice for low-dimensional states.
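To make the buffer-size tradeoff concrete, here is a minimal sketch of a fixed-capacity FIFO replay buffer of the kind that buffer_size and batch_size typically configure in off-policy methods. It assumes vector observations, integer (discrete) actions, and uniform sampling; the ReplayBuffer class is hypothetical and is not Neurenix's internal implementation.

```python
import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO store of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity: int, obs_dim: int, seed: int = 0):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)
        self.pos = 0   # next slot to write
        self.size = 0  # number of valid transitions stored
        self.rng = np.random.default_rng(seed)

    def add(self, obs, action, reward, next_obs, done):
        i = self.pos
        self.obs[i] = obs
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_obs[i] = next_obs
        self.dones[i] = float(done)
        # Wrap around so the oldest transition is overwritten once the buffer is full.
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int):
        """Uniformly sample a mini-batch (with replacement) from the stored transitions."""
        idx = self.rng.integers(0, self.size, size=batch_size)
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])

    def nbytes(self) -> int:
        """Total memory used by the preallocated storage arrays."""
        arrays = (self.obs, self.next_obs, self.actions, self.rewards, self.dones)
        return sum(a.nbytes for a in arrays)
```

With 4-dimensional float32 observations, a 1M-transition buffer in this layout occupies 48,000,000 bytes (about 48 MB); storage grows linearly with observation size, so image observations cost far more per transition.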
