Overview
SlimeRouter provides a lightweight HTTP router for managing multiple SGLang inference workers with load balancing, health checking, and middleware support.
SlimeRouter
Initialization
Create a SlimeRouter instance to manage inference workers.
Arguments containing router configuration:
- sglang_router_ip: Router host address
- sglang_router_port: Router port
- slime_router_middleware_paths: List of middleware paths
- slime_router_timeout: HTTP request timeout
- slime_router_max_connections: Max concurrent connections
- slime_router_health_check_failure_threshold: Failures before marking worker dead
Enable verbose logging
slime/router/router.py:28
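A minimal sketch of assembling the configuration object, assuming the router reads these fields as attributes of an argparse-style namespace. All values are placeholders, and the commented-out constructor call (including the `verbose` keyword name) is an assumption, not a confirmed signature.

```python
from argparse import Namespace

# Placeholder values; the field names are the configuration fields listed above.
router_args = Namespace(
    sglang_router_ip="127.0.0.1",
    sglang_router_port=30000,
    slime_router_middleware_paths=[],
    slime_router_timeout=None,  # assumed to mean "no timeout"
    slime_router_max_connections=256,
    slime_router_health_check_failure_threshold=3,
)

# Assumed usage (not verified against slime/router/router.py):
# router = SlimeRouter(router_args, verbose=True)
```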
Router Methods
add_worker()
Add a new SGLang worker to the routing pool.
url or worker_url: Worker endpoint URL
slime/router/router.py:169
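As a rough illustration, a client could register a worker with a POST to /add_worker. The sketch below only builds the request and assumes the worker URL is passed as a `url` query parameter; the source does not say whether the router expects a query string or a request body, and both addresses are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

ROUTER_BASE = "http://127.0.0.1:30000"  # hypothetical router address
WORKER_URL = "http://10.0.0.5:10090"    # hypothetical worker address


def build_add_worker_request(router_base: str, worker_url: str) -> Request:
    """Build (but do not send) a POST to /add_worker.

    Assumption: the worker URL travels as the `url` query parameter.
    """
    query = urlencode({"url": worker_url})
    return Request(f"{router_base}/add_worker?{query}", data=b"", method="POST")


req = build_add_worker_request(ROUTER_BASE, WORKER_URL)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running router.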
list_workers()
List all registered workers.
slime/router/router.py:199
proxy()
Proxy requests to workers with load balancing.
- Selects worker with minimum active requests
- Automatically excludes dead workers
- Increments request count on selection
- Decrements on completion
slime/router/router.py:131
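The selection behavior described above can be sketched as a least-active-requests picker. This is an illustration under assumed data structures (a dict of in-flight counts and a set of dead worker URLs), not the code in slime/router/router.py; the function names are hypothetical.

```python
from typing import Dict, Set


def select_worker(active_requests: Dict[str, int], dead_workers: Set[str]) -> str:
    """Pick the live worker with the fewest in-flight requests and count the new one."""
    live = {url: n for url, n in active_requests.items() if url not in dead_workers}
    if not live:
        raise RuntimeError("no healthy workers available")
    url = min(live, key=live.get)
    active_requests[url] += 1  # increment on selection
    return url


def release_worker(active_requests: Dict[str, int], url: str) -> None:
    """Decrement the in-flight count once the proxied request completes."""
    active_requests[url] -= 1


counts = {"http://w0": 2, "http://w1": 0, "http://w2": 1}
# w1 is idle but marked dead, so the least-loaded live worker is chosen instead.
chosen = select_worker(counts, dead_workers={"http://w1"})
```

In a real proxy the release step would typically sit in a `finally` block so the count is decremented even when the upstream request fails.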
Health Checking
Background Health Check Loop
SlimeRouter continuously monitors worker health in the background.
Configuration:
rollout_health_check_interval: Interval in seconds between health checks
slime_router_health_check_failure_threshold: Number of consecutive failures before marking worker as dead
- Checks all workers every rollout_health_check_interval seconds
- Sends GET request to /health endpoint
- Increments failure count on timeout or non-200 response
- Marks worker as DEAD after threshold failures
- Removes from routing pool (but keeps in worker list)
slime/router/router.py:89
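The failure-counting part of this loop might look like the sketch below. It leaves out the periodic GET to /health and models only the bookkeeping. The class name is hypothetical, and resetting the count on a successful check is an assumption based on the word "consecutive". Whether a dead worker can later be revived is not stated in the source, so this sketch never revives one.

```python
from typing import Dict, Set


class HealthTracker:
    """Tracks consecutive health-check failures per worker (illustrative only)."""

    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self.failures: Dict[str, int] = {}
        self.dead: Set[str] = set()

    def record(self, url: str, healthy: bool) -> None:
        if healthy:
            self.failures[url] = 0  # assumption: a success resets the streak
            return
        self.failures[url] = self.failures.get(url, 0) + 1
        if self.failures[url] >= self.failure_threshold:
            self.dead.add(url)  # excluded from routing, still listed as a worker


tracker = HealthTracker(failure_threshold=3)
# Two failures, one success (streak resets), then three failures in a row.
for ok in (False, False, True, False, False, False):
    tracker.record("http://w0", ok)
```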
Middleware
Loading Middleware
Middleware can be loaded via configuration:
slime/router/router.py:60-64
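One common way to resolve a list of function paths is to import each one dynamically. The sketch below assumes each entry is a dotted path of the form `package.module.attribute`; the exact format SlimeRouter expects is not shown here, and `load_by_path` is a hypothetical helper. A standard-library function stands in for a real middleware.

```python
import importlib
from typing import Any, List


def load_by_path(dotted_path: str) -> Any:
    """Import `package.module.attr` and return the attribute it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Stand-in for slime_router_middleware_paths; a real entry would name a middleware.
middleware_paths: List[str] = ["json.dumps"]
loaded = [load_by_path(path) for path in middleware_paths]
```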
RouterArgs
Configuration arguments for router setup (from sglang_router).
Common Router Arguments
- sglang_router_ip: Router IP address
- sglang_router_port: Router port
- slime_router_timeout: HTTP request timeout in seconds (default: no timeout)
- slime_router_max_connections: Maximum concurrent HTTP connections. Default: sglang_server_concurrency * rollout_num_gpus // rollout_num_gpus_per_engine
- slime_router_middleware_paths: List of middleware function paths to load
slime/utils/arguments.py:1009-1041, slime/router/router.py:44-56
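The default connection limit is the per-server concurrency multiplied by the number of engines (total GPUs divided by GPUs per engine). A small worked example with illustrative values; the helper name is hypothetical:

```python
def default_max_connections(
    sglang_server_concurrency: int,
    rollout_num_gpus: int,
    rollout_num_gpus_per_engine: int,
) -> int:
    """Evaluate the documented default formula (left to right, integer division)."""
    return sglang_server_concurrency * rollout_num_gpus // rollout_num_gpus_per_engine


# 8 GPUs at 2 GPUs per engine gives 4 engines; 512 concurrent requests each.
print(default_max_connections(512, 8, 2))  # 2048
```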
Running the Router
Standalone Mode
Run SlimeRouter as a standalone service:
slime/router/router.py:17
Integrated Mode
When using --use-slime-router, the router is automatically initialized within the rollout manager.
Advanced Features
Worker State Management
Internal State:
Request Routing
Selection Algorithm:
slime/router/router.py:225-240
API Endpoints
Router-Specific Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /add_worker | POST | Add worker to pool |
| /list_workers | GET | List registered workers |
| /retrieve_from_text | POST | Get tokens from text (RadixTree) |
Proxied Endpoints
All other endpoints are proxied to SGLang workers:
| Endpoint | Method | Description |
|---|---|---|
| /generate | POST | Generate text |
| /health | GET | Worker health check |
| /get_model_info | GET | Model information |
| /abort_request | POST | Abort generation |
- Rollout API - Generation with router
- Arguments API - Router configuration