LeRobot’s async inference system lets you run policy inference on a remote server (typically a GPU machine) while controlling your robot from a separate client (typically running on the robot’s embedded computer). The client keeps executing queued actions while the server computes the next action chunk, so the robot does not stall waiting for inference. This reduces idle time and enables real-time robot control.
Architecture
The async inference system consists of two components:
- Policy Server: Runs the neural network policy on a GPU machine
- Robot Client: Controls the robot and communicates with the policy server
Starting the Policy Server
The policy server waits for client connections and loads the pretrained policy that the connecting client requests.
Server Configuration
Server host address to bind to
Server port number
Target frames per second for observation processing
Target inference latency in seconds (controls inference throttling)
Timeout for observation queue in seconds
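The following sketch is a rough illustration of how the inference-latency and queue-timeout parameters could interact. It assumes the server waits on its observation queue for at most the timeout, runs the policy, and then sleeps for whatever remains of the target inference latency. ServerTimingConfig, run_inference_step, and the default values are hypothetical. They are not LeRobot's actual names, defaults, or throttling logic.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServerTimingConfig:
    # Hypothetical field names and values, not LeRobot's actual defaults.
    inference_latency: float = 1 / 30  # target seconds per inference step
    obs_queue_timeout: float = 2.0  # max seconds to wait for an observation


def run_inference_step(
    obs_queue: queue.Queue,
    predict: Callable[[dict], List[list]],
    cfg: ServerTimingConfig,
) -> Optional[List[list]]:
    """Wait for one observation, run the policy on it, then throttle."""
    try:
        # Block until an observation arrives, but give up after the timeout.
        observation = obs_queue.get(timeout=cfg.obs_queue_timeout)
    except queue.Empty:
        return None  # no observation arrived in time

    start = time.perf_counter()
    actions = predict(observation)
    elapsed = time.perf_counter() - start

    # Assumed throttling rule: if inference finished early, sleep for the
    # remainder so steps are not produced faster than the target latency.
    time.sleep(max(0.0, cfg.inference_latency - elapsed))
    return actions


if __name__ == "__main__":
    cfg = ServerTimingConfig(obs_queue_timeout=0.1)
    obs_queue = queue.Queue(maxsize=1)
    obs_queue.put({"joint_pos": [0.1, 0.2]})
    # Toy "policy": repeat the current joint positions three times.
    print(run_inference_step(obs_queue, lambda o: [o["joint_pos"]] * 3, cfg))
    # The queue is now empty, so this call times out and returns None.
    print(run_inference_step(obs_queue, lambda o: [], cfg))
```

Sleeping out the remainder caps how fast chunks are produced without delaying slow inference steps any further.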
Running the Robot Client
The robot client captures observations, sends them to the server, and executes received actions.
Client Configuration
Robot configuration including type, port, and camera setup
Task instruction for the robot (e.g., “pick up the cup”)
Address of the policy server
Type of policy to use (e.g., “act”, “pi0”, “diffusion”)
Hugging Face Hub model ID or local path to a pretrained model
Device for policy inference on the server (e.g., “cuda”, “cuda:0”, “mps”)
Device to move actions to after receiving from server
Number of actions per chunk to request from the policy
Threshold for triggering new observations (0.0-1.0). When the action queue size drops below chunk_size_threshold * actions_per_chunk, a new observation is sent to the server.
Function to aggregate overlapping actions. Options:
- weighted_average: 0.3 * old + 0.7 * new
- latest_only: Always use new actions
- average: 0.5 * old + 0.5 * new
- conservative: 0.7 * old + 0.3 * new
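Each aggregation option listed above is an elementwise blend of the queued (old) action and the newly received (new) action for the same timestep. The following sketch assumes that actions are plain lists of floats keyed by integer timestep. AGGREGATE_FNS, merge_chunk, and the "my_robot_blend" entry are hypothetical names used only for illustration. A custom rule is registered the same way as the built-in options.

```python
from typing import Callable, Dict, List

Action = List[float]
AggregateFn = Callable[[Action, Action], Action]


def _blend(w_old: float, w_new: float) -> AggregateFn:
    """Return a function computing w_old * old + w_new * new elementwise."""
    return lambda old, new: [w_old * o + w_new * n for o, n in zip(old, new)]


# Weights follow the aggregation options listed above.
AGGREGATE_FNS: Dict[str, AggregateFn] = {
    "weighted_average": _blend(0.3, 0.7),
    "latest_only": lambda old, new: list(new),
    "average": _blend(0.5, 0.5),
    "conservative": _blend(0.7, 0.3),
}

# Hypothetical custom rule for a robot that should change course very slowly.
AGGREGATE_FNS["my_robot_blend"] = _blend(0.9, 0.1)


def merge_chunk(
    action_queue: Dict[int, Action],
    chunk: Dict[int, Action],
    aggregate: AggregateFn,
) -> Dict[int, Action]:
    """Merge a new action chunk into the queue, keyed by timestep.

    Timesteps already queued are blended with the new action; timesteps the
    queue has not seen yet are appended unchanged.
    """
    merged = dict(action_queue)
    for step, new_action in chunk.items():
        if step in merged:
            merged[step] = aggregate(merged[step], new_action)
        else:
            merged[step] = list(new_action)
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    queued = {0: [1.0], 1: [1.0]}
    incoming = {1: [0.0], 2: [0.0]}
    print(merge_chunk(queued, incoming, AGGREGATE_FNS["weighted_average"]))
```

Only timestep 1 overlaps, so it is the only action blended. Timestep 0 keeps its queued value and timestep 2 is appended.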
Control Flow
The async inference system runs two parallel threads on the client:
Thread 1: Control Loop
Runs at the robot’s control frequency (e.g., 30 Hz):
- Execute Action: If actions are available in the queue, pop and execute the next action
- Stream Observation: When the queue size drops below the threshold (chunk_size_threshold * actions_per_chunk), capture an observation and send it to the server
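The two control-loop steps can be sketched as one function that runs once per control tick. The sketch assumes a deque of pending actions and stand-in callables for executing an action and sending an observation. control_step and control_loop are hypothetical helpers, not LeRobot's RobotClient methods. The sketch also sets the "must-go" flag whenever the queue is empty. This is an assumption about exactly when LeRobot sets it.

```python
import time
from collections import deque
from typing import Callable, Deque, List

Action = List[float]


def control_step(
    actions: Deque[Action],
    execute: Callable[[Action], None],
    send_observation: Callable[[bool], None],
    actions_per_chunk: int,
    chunk_size_threshold: float,
) -> None:
    """Run one control-loop iteration."""
    # 1. Execute Action: pop and execute the next queued action, if any.
    if actions:
        execute(actions.popleft())

    # 2. Stream Observation: send a new observation once the queue is low.
    #    A real client would also avoid re-sending while a request is still
    #    in flight; this sketch omits that.
    if len(actions) < chunk_size_threshold * actions_per_chunk:
        # must_go=True when the queue is empty, so the server processes it.
        send_observation(len(actions) == 0)


def control_loop(actions, execute, send_observation, actions_per_chunk,
                 chunk_size_threshold, fps=30, steps=10):
    """Call control_step at a fixed control frequency for `steps` ticks."""
    period = 1.0 / fps
    for _ in range(steps):
        start = time.perf_counter()
        control_step(actions, execute, send_observation,
                     actions_per_chunk, chunk_size_threshold)
        # Sleep for the rest of the tick to hold the control frequency.
        time.sleep(max(0.0, period - (time.perf_counter() - start)))
```

Taking must_go from the empty-queue condition means the server always sees an observation at the moment the robot would otherwise run out of actions.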
Thread 2: Action Receiver
Continuously receives action chunks from the server.
Action Queue Management
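The client maintains an action queue and aggregates overlapping actions. When a new chunk arrives, actions for timesteps that are already queued are combined with the queued ones using the configured aggregation function, and actions for new timesteps are appended.
Must-Go Observations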
The system uses a “must-go” flag to guarantee that an observation sent while the action queue is empty is processed by the server, so the robot is not left waiting without actions.
Server-Side Processing
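The policy server keeps an observation queue of size 1 and processes observations on demand. Because the queue holds a single observation, a newer observation replaces an older one that has not yet been processed, so the policy runs on the most recent state.
Performance Optimization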
Reduce Observation Sending
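Adjust chunk_size_threshold to control how often observations are sent. A new observation is sent once the queue holds fewer than chunk_size_threshold * actions_per_chunk actions. Lower values send observations less often, which reduces network and server load but lets the queue run closer to empty. Higher values send them more often and keep more actions queued.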
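To see how chunk_size_threshold trades request rate against queue depth, the toy simulation below counts how many observations a client would send. It makes a deliberate simplifying assumption that does not match the real system: each request is answered instantly with a full chunk that replaces the queue, with no latency and no overlap aggregation. count_observations_sent is a hypothetical helper.

```python
def count_observations_sent(
    steps: int, actions_per_chunk: int, chunk_size_threshold: float
) -> int:
    """Toy simulation: count observations sent over `steps` control steps.

    Simplifying assumption: every request is answered instantly with a full
    chunk that replaces the queue (no latency, no overlap aggregation).
    """
    queue_len = 0
    sent = 0
    for _ in range(steps):
        if queue_len > 0:
            queue_len -= 1  # execute one action
        if queue_len < chunk_size_threshold * actions_per_chunk:
            sent += 1
            queue_len = actions_per_chunk  # instant, full replacement chunk
    return sent


if __name__ == "__main__":
    for threshold in (0.2, 0.5, 0.8):
        n = count_observations_sent(
            steps=100, actions_per_chunk=10, chunk_size_threshold=threshold
        )
        print(f"chunk_size_threshold={threshold}: {n} observations in 100 steps")
```

Under these assumptions, with 10 actions per chunk, thresholds of 0.2, 0.5, and 0.8 send 12, 17, and 34 observations over 100 steps. The real rate also depends on network and inference latency.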
Custom Aggregation Functions
Define custom aggregation logic for your robot.
Device Placement
Actions can be moved to a GPU on the client after they arrive from the server, which is useful for downstream planners.
Debugging
Enable debug visualization to monitor the action queue size.
Example: SO-100 Robot
Complete example for running async inference on an SO-100 robot.
API Reference
RobotClient
See lerobot/async_inference/robot_client.py:83
Connect to the policy server and initialize the client
Stop the client and disconnect from the server
Main control loop that executes actions and streams observations
PolicyServer
See lerobot/async_inference/policy_server.py:66
Load the policy model based on client instructions
Receive observation from the robot client
Generate and return action chunk based on latest observation
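The three PolicyServer responsibilities listed above can be sketched in-process: load the policy the client asks for, keep only the most recent observation in a size-1 queue, and return a chunk computed from it. ToyPolicyServer, its method names, and the policy registry are hypothetical. The real PolicyServer talks to the client over the network. This sketch also assumes a single caller, so it ignores the race between dropping the stale observation and storing the new one.

```python
import queue
from typing import Callable, Dict, List, Optional

Observation = Dict[str, float]
Policy = Callable[[Observation, int], List[List[float]]]


class ToyPolicyServer:
    """In-process stand-in for the PolicyServer responsibilities above."""

    def __init__(self, policy_registry: Dict[str, Policy]):
        self._registry = policy_registry
        self._policy: Optional[Policy] = None
        self._actions_per_chunk = 0
        # Observation queue of size 1: only the latest observation is kept.
        self._obs_queue: "queue.Queue[Observation]" = queue.Queue(maxsize=1)

    def load_policy(self, policy_type: str, actions_per_chunk: int) -> None:
        """Load the policy requested by the client's instructions."""
        self._policy = self._registry[policy_type]
        self._actions_per_chunk = actions_per_chunk

    def receive_observation(self, obs: Observation) -> None:
        """Store an observation, replacing any unprocessed older one."""
        try:
            self._obs_queue.get_nowait()  # drop the stale observation, if any
        except queue.Empty:
            pass
        self._obs_queue.put_nowait(obs)

    def get_actions(self, timeout: float = 1.0) -> Optional[List[List[float]]]:
        """Return an action chunk for the latest observation, or None on timeout."""
        if self._policy is None:
            raise RuntimeError("load_policy() must be called first")
        try:
            obs = self._obs_queue.get(timeout=timeout)
        except queue.Empty:
            return None
        return self._policy(obs, self._actions_per_chunk)


if __name__ == "__main__":
    # Toy policy: repeat the observed value n times as 1-D actions.
    server = ToyPolicyServer({"toy": lambda obs, n: [[obs["x"]]] * n})
    server.load_policy("toy", actions_per_chunk=3)
    server.receive_observation({"x": 1.0})
    server.receive_observation({"x": 2.0})  # replaces the unprocessed {"x": 1.0}
    print(server.get_actions())
```

The second observation overwrites the first before inference runs, so the returned chunk reflects only the most recent state.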