The server-client architecture lets you run GR00T inference on a remote GPU server while controlling robots from a separate client machine. Communication uses ZeroMQ for low latency.
Architecture overview
The PolicyServer runs on a GPU-equipped machine and handles inference requests, while PolicyClient runs on your robot controller or simulation environment:
# Server side (GPU machine)
from gr00t.policy.server_client import PolicyServer
from gr00t.policy.gr00t_policy import Gr00tPolicy
policy = Gr00tPolicy(
    embodiment_tag="GR1",
    model_path="nvidia/GR00T-N1.6-3B",
    device="cuda"
)
server = PolicyServer(policy, host="0.0.0.0", port=5555)
server.run()
# Client side (robot controller)
from gr00t.policy.server_client import PolicyClient
policy = PolicyClient(host="192.168.1.100", port=5555)
action, info = policy.get_action(observation)
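The observation passed to `get_action` is a dictionary of NumPy arrays plus text annotations. A minimal sketch, assuming hypothetical modality keys — the real keys for your robot come from the embodiment's modality config (see `get_modality_config` below):

```python
import numpy as np

# Illustrative keys only; query get_modality_config() for the real ones.
observation = {
    "video.ego_view": np.zeros((1, 256, 256, 3), dtype=np.uint8),  # (T, H, W, C)
    "state.left_arm": np.zeros((1, 7), dtype=np.float32),          # (T, DoF)
    "annotation.human.task_description": ["pick up the apple"],
}
```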
Starting the server
Use the provided server script to start inference:
uv run python gr00t/eval/run_gr00t_server.py \
--embodiment-tag GR1 \
--model-path nvidia/GR00T-N1.6-3B \
--device cuda:0 \
--host 0.0.0.0 \
--port 5555
Verify the server is running
Starting GR00T inference server...
Embodiment tag: GR1
Model path: nvidia/GR00T-N1.6-3B
Device: cuda
Host: 0.0.0.0
Port: 5555
Server is ready and listening on tcp://0.0.0.0:5555
Then confirm connectivity from the client:
from gr00t.policy.server_client import PolicyClient
policy = PolicyClient(host="192.168.1.100", port=5555)
if policy.ping():
    print("Connected successfully!")
else:
    print("Connection failed")
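If the server takes a while to load the model, a single ping right after startup may fail. A small polling helper (our own sketch, not part of gr00t) makes the check robust; pass it `policy.ping` as the callable:

```python
import time

def wait_for_server(ping_fn, attempts=5, delay_s=1.0):
    """Poll a zero-argument ping callable until it returns True
    or the attempts run out. Illustrative helper only."""
    for _ in range(attempts):
        if ping_fn():
            return True
        time.sleep(delay_s)
    return False

# Example with a stand-in ping that succeeds on the third try:
calls = iter([False, False, True])
assert wait_for_server(lambda: next(calls), attempts=5, delay_s=0.0)
```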
Server configuration
The PolicyServer class provides several configuration options:
Basic configuration
server = PolicyServer(
    policy=policy,
    host="*",         # Bind to all interfaces
    port=5555,        # Default port
    api_token=None    # Optional authentication
)
With authentication
server = PolicyServer(
    policy=policy,
    host="0.0.0.0",
    port=5555,
    api_token="your-secret-token"
)
On the client side:
policy = PolicyClient(
    host="192.168.1.100",
    port=5555,
    api_token="your-secret-token"
)
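The essence of server-side token checking can be sketched as below. This is not GR00T's actual implementation; it only illustrates why a constant-time comparison (`hmac.compare_digest`) is preferable to `==` for secrets, since it does not leak information through timing:

```python
import hmac

def token_matches(expected, provided):
    """Sketch of an API-token check. `expected` is the server's
    configured token (None means auth disabled); `provided` is what
    the client sent."""
    if expected is None:
        return True          # authentication disabled
    if provided is None:
        return False
    return hmac.compare_digest(expected.encode(), provided.encode())
```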
Server endpoints
The server registers these endpoints by default:
- ping: Health check
- kill: Gracefully shut down the server
- get_action: Get an action from an observation
- reset: Reset policy state
- get_modality_config: Retrieve the modality configuration
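Conceptually, the server maps endpoint names to handler functions and routes each request accordingly. A minimal dispatcher sketch (names and request shape are assumptions, not PolicyServer internals):

```python
def make_dispatcher(handlers):
    """Return a function that routes {'endpoint': ..., 'data': ...}
    requests to the matching handler. Illustrative only."""
    def dispatch(request):
        name = request.get("endpoint")
        handler = handlers.get(name)
        if handler is None:
            return {"error": f"Unknown endpoint: {name}"}
        return {"result": handler(**request.get("data", {}))}
    return dispatch

dispatch = make_dispatcher({"ping": lambda: "pong"})
print(dispatch({"endpoint": "ping"}))   # {'result': 'pong'}
```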
Client usage
The PolicyClient implements the same interface as BasePolicy:
from gr00t.policy.server_client import PolicyClient
# Initialize client
policy = PolicyClient(
    host="localhost",
    port=5555,
    timeout_ms=15000,  # Request timeout
    strict=False       # Disable client-side validation
)
# Get modality configuration
modality_config = policy.get_modality_config()
# Reset policy
policy.reset(options={"episode_index": 0})
# Get action
action, info = policy.get_action(observation)
# Shutdown server (optional)
policy.kill_server()
Timeout configuration
Configure request timeout based on your inference latency:
policy = PolicyClient(
    host="192.168.1.100",
    port=5555,
    timeout_ms=30000  # 30-second timeout for slow networks
)
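One way to pick a value is to budget for several worst-case inference passes plus the network round trip, with a floor so brief stalls don't trip the timeout. The factor and floor below are assumptions for illustration, not GR00T defaults:

```python
def suggest_timeout_ms(inference_ms, rtt_ms, safety_factor=3.0, floor_ms=5_000):
    """Rough heuristic for timeout_ms: cover a few worst-case
    inferences plus the round trip, never below a sane floor."""
    return max(int(safety_factor * inference_ms + rtt_ms), floor_ms)

print(suggest_timeout_ms(inference_ms=800, rtt_ms=50))    # 5000 (floor wins)
print(suggest_timeout_ms(inference_ms=4000, rtt_ms=200))  # 12200
```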
Custom endpoints
Register custom endpoints for application-specific functionality:
from gr00t.policy.server_client import PolicyServer
def custom_handler(param1: str, param2: int) -> dict:
    # Your custom logic
    return {"result": f"Processed {param1} with {param2}"}
server = PolicyServer(policy, port=5555)
server.register_endpoint(
    "custom_endpoint",
    custom_handler,
    requires_input=True
)
server.run()
Call from client:
result = policy.call_endpoint(
    "custom_endpoint",
    data={"param1": "test", "param2": 42}
)
Message serialization
The server uses MessagePack for efficient serialization with custom handling for NumPy arrays and modality configs:
class MsgSerializer:
    @staticmethod
    def encode_custom_classes(obj):
        if isinstance(obj, np.ndarray):
            output = io.BytesIO()
            np.save(output, obj, allow_pickle=False)
            return {"__ndarray_class__": True, "as_npy": output.getvalue()}
        # ... other types
This keeps transfers of large arrays like images fast and compact.
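The encode path above pairs with a matching decode on the other end. A self-contained round trip, with the decode helper name being illustrative rather than GR00T's:

```python
import io
import numpy as np

def encode_ndarray(arr):
    # Mirrors the encode path shown above.
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return {"__ndarray_class__": True, "as_npy": buf.getvalue()}

def decode_ndarray(obj):
    # Hypothetical counterpart: rebuild the array from the .npy bytes.
    assert obj.get("__ndarray_class__")
    return np.load(io.BytesIO(obj["as_npy"]), allow_pickle=False)

image = np.random.randint(0, 255, size=(224, 224, 3), dtype=np.uint8)
restored = decode_ndarray(encode_ndarray(image))
assert np.array_equal(image, restored)
```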
Debugging with replay policy
Test your client integration without a trained model:
uv run python gr00t/eval/run_gr00t_server.py \
--dataset-path demo_data/gr1.PickNPlace \
--embodiment-tag GR1 \
--execution-horizon 8
The server replays actions from the dataset. Switch episodes with:
policy.reset(options={"episode_index": 5})
The replay policy is useful for verifying environment setup, observation formatting, and action execution without requiring a trained model.
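Conceptually, a replay policy just steps through recorded actions instead of running a model. A toy sketch of that behavior (names and structure are illustrative, not gr00t internals):

```python
class ReplayPolicySketch:
    """Serve recorded actions episode by episode; observations are ignored."""
    def __init__(self, episodes):
        self.episodes = episodes          # list of per-episode action lists
        self.actions = iter(episodes[0])

    def reset(self, options=None):
        index = (options or {}).get("episode_index", 0)
        self.actions = iter(self.episodes[index])

    def get_action(self, observation):
        return next(self.actions), {}

policy = ReplayPolicySketch([[{"arm": 0.0}, {"arm": 0.1}], [{"arm": 1.0}]])
action, info = policy.get_action(None)
print(action)                             # {'arm': 0.0}
policy.reset(options={"episode_index": 1})
print(policy.get_action(None)[0])         # {'arm': 1.0}
```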
Network considerations
Firewall configuration
Ensure port 5555 (or your chosen port) is open:
# On server machine
sudo ufw allow 5555/tcp
Latency optimization
For low-latency applications:
- Use dedicated network interfaces
- Disable TCP Nagle’s algorithm (ZeroMQ does this by default)
- Run server and client on the same machine when possible
- Consider using InfiniBand or high-speed Ethernet
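Before optimizing, measure your actual round-trip latency. A simple timing helper (our own sketch; pass it something like `lambda: policy.get_action(obs)`):

```python
import time

def measure_latency_ms(call, warmup=2, iters=10):
    """Time a zero-argument callable and return the mean
    round-trip latency in milliseconds."""
    for _ in range(warmup):
        call()                    # warm caches / connections
    start = time.perf_counter()
    for _ in range(iters):
        call()
    return (time.perf_counter() - start) / iters * 1e3

# Example with a stand-in call:
print(f"{measure_latency_ms(lambda: sum(range(1000))):.3f} ms")
```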
The server binds to 0.0.0.0 by default, making it accessible from any network interface. Use api_token authentication or firewall rules to restrict access in production deployments.
Error handling
The server catches exceptions and returns error responses:
try:
    action, info = policy.get_action(observation)
except RuntimeError as e:
    if "Server error" in str(e):
        print(f"Server error: {e}")
        # Reconnect or handle error
Common errors:
- "Unauthorized: Invalid API token": Authentication failed
- "Unknown endpoint": Endpoint not registered
- Connection timeout: Server unreachable or overloaded
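A common recovery strategy is to rebuild the client and retry. A sketch under assumptions (the factory pattern and retry policy are ours, e.g. `make_client = lambda: PolicyClient(host="192.168.1.100", port=5555)`; adapt the caught exception types to your setup):

```python
import time

def get_action_with_retry(make_client, observation, retries=3, delay_s=1.0):
    """Retry get_action on server errors, reconnecting each time.
    make_client is a zero-argument factory returning a fresh client."""
    client = make_client()
    last_error = None
    for _ in range(retries):
        try:
            return client.get_action(observation)
        except RuntimeError as e:
            last_error = e
            time.sleep(delay_s)
            client = make_client()    # fresh connection
    raise last_error

# Demonstration with a stand-in client that fails once, then succeeds:
class _FlakyClient:
    failures = [True]                 # class-level: only the first call fails
    def get_action(self, obs):
        if self.failures:
            self.failures.pop()
            raise RuntimeError("Server error")
        return {"arm": 0.5}, {}

action, info = get_action_with_retry(_FlakyClient, None, delay_s=0.0)
print(action)   # {'arm': 0.5}
```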
Server script configuration
The run_gr00t_server.py script supports additional options:
@dataclass
class ServerConfig:
    model_path: str | None = None
    embodiment_tag: EmbodimentTag = EmbodimentTag.NEW_EMBODIMENT
    device: str = "cuda"
    dataset_path: str | None = None  # For replay policy
    modality_config_path: str | None = None
    execution_horizon: int | None = None
    host: str = "0.0.0.0"
    port: int = 5555
    strict: bool = True
    use_sim_policy_wrapper: bool = False
Example with simulation wrapper:
uv run python gr00t/eval/run_gr00t_server.py \
--model-path nvidia/GR00T-N1.6-3B \
--embodiment-tag GR1 \
--use-sim-policy-wrapper