Overview

SlimeRouter provides a lightweight HTTP router for managing multiple SGLang inference workers with load balancing, health checking, and middleware support.

SlimeRouter

Initialization

Create a SlimeRouter instance to manage inference workers.
from slime.router.router import SlimeRouter

router = SlimeRouter(args, verbose=True)
args (Namespace, required)
Arguments containing router configuration:
  • sglang_router_ip: Router host address
  • sglang_router_port: Router port
  • slime_router_middleware_paths: List of middleware paths
  • slime_router_timeout: HTTP request timeout
  • slime_router_max_connections: Max concurrent connections
  • slime_router_health_check_failure_threshold: Failures before marking worker dead
verbose (bool, default: False)
Enable verbose logging
Source: slime/router/router.py:28

Router Methods

add_worker()

Add a new SGLang worker to the routing pool.
# Via HTTP request (worker URL as a query parameter)
# Note: httpx.post is a synchronous call; use httpx.AsyncClient for awaitable requests.
import httpx

response = httpx.post(
    "http://localhost:30000/add_worker?url=http://127.0.0.1:10001"
)

# Or with JSON body
response = httpx.post(
    "http://localhost:30000/add_worker",
    json={"url": "http://127.0.0.1:10001"}
)
Parameters:
  • url or worker_url: Worker endpoint URL
Returns:
{
  "status": "success",
  "worker_urls": {
    "http://127.0.0.1:10001": 0,
    "http://127.0.0.1:10002": 0
  }
}
Source: slime/router/router.py:169
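
When the worker address travels in the query string, percent-encoding it avoids ambiguity around the embedded `:` and `/` characters. The following is a minimal sketch using only the standard library; `build_add_worker_url` is a hypothetical helper for illustration and is not part of slime, and the addresses are the example values used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_add_worker_url(router_base: str, worker_url: str) -> str:
    """Return an /add_worker URL with the worker address percent-encoded.

    Hypothetical helper for illustration; not part of slime.
    """
    query = urlencode({"url": worker_url})
    return f"{router_base.rstrip('/')}/add_worker?{query}"


request_url = build_add_worker_url("http://localhost:30000", "http://127.0.0.1:10001")
print(request_url)

# Round trip: decoding the query string recovers the original worker URL.
decoded = parse_qs(urlsplit(request_url).query)["url"][0]
print(decoded)
```

The resulting string can be passed to `httpx.post` in place of the hand-written URL.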

list_workers()

List all registered workers.
response = httpx.get("http://localhost:30000/list_workers")
Returns:
{
  "urls": [
    "http://127.0.0.1:10001",
    "http://127.0.0.1:10002"
  ]
}
Source: slime/router/router.py:199

proxy()

Proxy requests to workers with load balancing.
# All other routes are proxied to SGLang workers
response = httpx.post(
    "http://localhost:30000/generate",
    json={
        "input_ids": [1, 2, 3],
        "sampling_params": {"temperature": 0.8}
    }
)
Load Balancing Strategy:
  • Selects worker with minimum active requests
  • Automatically excludes dead workers
  • Increments request count on selection
  • Decrements on completion
Source: slime/router/router.py:131
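
The increment-on-selection and decrement-on-completion steps listed above can be modeled with a context manager, which guarantees the counter is released even if the request raises. This is a toy sketch, not the router's actual code: `LeastBusyPool` and `lease` are hypothetical names, and raising `RuntimeError` when no worker is healthy is an assumption of this example.

```python
from contextlib import contextmanager


class LeastBusyPool:
    """Toy model of least-active-requests routing (illustrative only)."""

    def __init__(self, urls):
        self.active = {url: 0 for url in urls}  # URL -> in-flight request count
        self.dead = set()                       # URLs excluded from routing

    @contextmanager
    def lease(self):
        """Pick the least busy live worker and hold a slot on it."""
        live = [u for u in self.active if u not in self.dead]
        if not live:
            # Assumption of this sketch: fail loudly when nothing is routable.
            raise RuntimeError("no healthy workers available")
        url = min(live, key=self.active.get)
        self.active[url] += 1          # increment on selection
        try:
            yield url
        finally:
            self.active[url] -= 1      # decrement on completion, even on error


pool = LeastBusyPool(["http://127.0.0.1:10001", "http://127.0.0.1:10002"])
with pool.lease() as first, pool.lease() as second:
    # Two overlapping requests land on different workers.
    print(first, second)
print(pool.active)  # all counters are back to zero here
```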

Health Checking

Background Health Check Loop

SlimeRouter continuously monitors worker health in the background.
Configuration:
rollout_health_check_interval (float, default: 30.0)
Interval in seconds between health checks
slime_router_health_check_failure_threshold (int, default: 3)
Number of consecutive failures before marking worker as dead
Behavior:
  1. Checks all workers every rollout_health_check_interval seconds
  2. Sends a GET request to each worker's /health endpoint
  3. Increments the worker's failure count on a timeout or non-200 response
  4. Marks the worker as DEAD once the failure count reaches the threshold
  5. Removes the worker from the routing pool but keeps it in the worker list
Source: slime/router/router.py:89
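
The failure-counting rule can be simulated without any network calls. The sketch below is an approximation under stated assumptions: `HealthTracker` is a hypothetical class, a successful check is assumed to reset the counter (which is what "consecutive" suggests), and dead workers are assumed not to be probed again.

```python
class HealthTracker:
    """Toy model of consecutive-failure tracking (illustrative only)."""

    def __init__(self, urls, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = {url: 0 for url in urls}  # URL -> consecutive failures
        self.dead = set()                         # quarantined worker URLs

    def record(self, url, healthy):
        """Record the outcome of one health check for `url`."""
        if url in self.dead:
            return  # assumption: dead workers are not re-evaluated
        if healthy:
            self.failures[url] = 0  # assumption: success resets the streak
            return
        self.failures[url] += 1
        if self.failures[url] >= self.failure_threshold:
            self.dead.add(url)


worker = "http://127.0.0.1:10001"
tracker = HealthTracker([worker], failure_threshold=3)

# Two failures, one success, then three failures in a row.
for healthy in [False, False, True, False, False, False]:
    tracker.record(worker, healthy)

# Only the final run of three consecutive failures marks the worker dead.
print(worker in tracker.dead)
```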

Middleware

Loading Middleware

Middleware can be loaded via configuration:
python train.py \
  --use-slime-router \
  --slime-router-middleware-paths \
    slime.router.middleware_hub.radix_tree_middleware:RadixTreeMiddleware
Middleware Interface:
from starlette.middleware.base import BaseHTTPMiddleware

class CustomMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, router):
        super().__init__(app)
        self.router = router
    
    async def dispatch(self, request, call_next):
        # Pre-processing
        response = await call_next(request)
        # Post-processing
        return response
Source: slime/router/router.py:60-64
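
Loading middleware from a configured path amounts to importing a module and looking up a name in it. How slime resolves these paths internally is not shown on this page, so the following is only a sketch of the general technique: `load_object` is a hypothetical helper, and accepting both `module:Name` and `module.Name` forms is an assumption of this example.

```python
import importlib


def load_object(path: str):
    """Resolve 'pkg.module:Name' or 'pkg.module.Name' to the named object.

    Hypothetical helper for illustration; not part of slime.
    """
    separator = ":" if ":" in path else "."
    module_name, _, attribute = path.rpartition(separator)
    if not module_name or not attribute:
        raise ValueError(f"not an import path: {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Demonstrated with a standard-library class so the snippet runs anywhere.
cls = load_object("collections:OrderedDict")
print(cls.__name__)
```

A class loaded this way could then be registered on a Starlette application with `app.add_middleware(cls, router=router)`, which forwards the `router` keyword to the middleware's `__init__` as in the interface above.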

RouterArgs

Configuration arguments for router setup (from sglang_router).

Common Router Arguments

sglang_router_ip (str, default: "127.0.0.1")
Router IP address
sglang_router_port (int, default: 30000)
Router port
slime_router_timeout (float, default: no timeout)
HTTP request timeout in seconds
slime_router_max_connections (int, default: sglang_server_concurrency * rollout_num_gpus // rollout_num_gpus_per_engine)
Maximum concurrent HTTP connections
slime_router_middleware_paths (list[str], default: [])
List of middleware function paths to load
Source: slime/utils/arguments.py:1009-1041, slime/router/router.py:44-56
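
As a worked example of the slime_router_max_connections default, the expression can be evaluated directly. The input values below are made up for illustration; `default_max_connections` is a hypothetical helper that simply restates the formula given above.

```python
def default_max_connections(
    sglang_server_concurrency: int,
    rollout_num_gpus: int,
    rollout_num_gpus_per_engine: int,
) -> int:
    """Evaluate the documented default for slime_router_max_connections."""
    # `*` and `//` share precedence and associate left to right, so the
    # product is computed first and then floor-divided.
    return sglang_server_concurrency * rollout_num_gpus // rollout_num_gpus_per_engine


# Illustrative values: 512 concurrent requests per engine, 8 GPUs, 2 GPUs per engine.
print(default_max_connections(512, 8, 2))  # 4096 // 2 = 2048
```

In other words, the default scales the per-engine concurrency by the number of engines (GPUs divided by GPUs per engine).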

Running the Router

Standalone Mode

Run SlimeRouter as a standalone service:
from slime.router.router import run_router
from argparse import Namespace

args = Namespace(
    sglang_router_ip="0.0.0.0",
    sglang_router_port=30000,
    slime_router_middleware_paths=[],
    rollout_health_check_interval=30.0,
    slime_router_health_check_failure_threshold=3
)

run_router(args)
Source: slime/router/router.py:17

Integrated Mode

When using --use-slime-router, the router is automatically initialized within the rollout manager.
python train.py \
  --use-slime-router \
  --sglang-router-ip 0.0.0.0 \
  --sglang-router-port 30000

Advanced Features

Worker State Management

Internal State:
self.worker_request_counts: dict[str, int]  # URL -> active request count
self.worker_failure_counts: dict[str, int]  # URL -> consecutive failures
self.dead_workers: set[str]                  # Quarantined worker URLs

Request Routing

Selection Algorithm:
def _use_url(self):
    """Select worker with minimal active requests"""
    if not self.dead_workers:
        # Healthy path: all workers available
        url = min(self.worker_request_counts, key=self.worker_request_counts.get)
    else:
        # Degraded path: exclude dead workers
        valid_workers = (w for w in self.worker_request_counts 
                        if w not in self.dead_workers)
        url = min(valid_workers, key=self.worker_request_counts.get)
    
    self.worker_request_counts[url] += 1
    return url
Source: slime/router/router.py:225-240
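
Two properties of this selection rule are easy to check in isolation: ties go to the earliest-registered worker (because `min` returns the first minimal item and dicts preserve insertion order), and `min` raises `ValueError` when every worker is dead. The sketch below is a standalone function written for experimentation; `pick_worker` is a hypothetical name and not the router's method.

```python
def pick_worker(request_counts: dict[str, int], dead_workers: set[str]) -> str:
    """Standalone version of the least-active-requests rule (illustrative)."""
    candidates = [w for w in request_counts if w not in dead_workers]
    # min() raises ValueError if `candidates` is empty (all workers dead).
    url = min(candidates, key=request_counts.get)
    request_counts[url] += 1
    return url


counts = {"http://127.0.0.1:10001": 0, "http://127.0.0.1:10002": 0}

# With no completions in between, picks alternate between the two workers.
order = [pick_worker(counts, set()) for _ in range(4)]
print(order)

# With every worker quarantined there is nothing to select.
try:
    pick_worker(counts, set(counts))
except ValueError as exc:
    print("no routable worker:", exc)
```

A caller that wants a clearer failure mode could catch the `ValueError` and return an HTTP 503 or retry after the next health check; which behavior the router actually implements is not stated here.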

API Endpoints

Router-Specific Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /add_worker | POST | Add worker to pool |
| /list_workers | GET | List registered workers |
| /retrieve_from_text | POST | Get tokens from text (RadixTree) |

Proxied Endpoints

All other endpoints are proxied to SGLang workers:
| Endpoint | Method | Description |
|----------|--------|-------------|
| /generate | POST | Generate text |
| /health | GET | Worker health check |
| /get_model_info | GET | Model information |
| /abort_request | POST | Abort generation |