When the pipeline runs in socket, websocket, or realtime mode, it listens on network interfaces for audio data. Three argument classes control these networking concerns: SocketReceiverArguments and SocketSenderArguments for TCP socket mode, and WebSocketStreamerArguments for WebSocket and realtime modes.
These flags have no shared prefix; pass them directly, for example --recv_host 0.0.0.0 or --ws_port 8765.
SocketReceiverArguments
Used with --mode socket. The receiver opens a TCP socket on the server and accepts raw audio from a remote client.
Host address the receiver socket binds to. Use 0.0.0.0 to accept connections from any network interface (required when the pipeline is on a remote server). Use localhost (default) for local testing where the client and server are on the same machine.
TCP port number the receiver listens on. Ensure this port is open in any firewall rules when accepting remote connections.
Size of each audio data chunk read from the socket per receive call, in bytes. Larger chunks reduce syscall overhead; smaller chunks reduce buffering latency.
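A client for the receiver could be sketched as follows, using only the Python standard library. This is an illustration, not the project's own client: the helper names iter_chunks, send_audio, and main are hypothetical, and the host, port, and payload in main are placeholders to be replaced with your actual receiver host, receiver port, and audio.

```python
import socket


def iter_chunks(data: bytes, chunk_size: int):
    """Yield successive chunk_size-byte slices of data (the last may be shorter)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def send_audio(sock: socket.socket, audio: bytes, chunk_size: int = 1024) -> int:
    """Send raw audio bytes over a connected TCP socket, one chunk per sendall call.

    Returns the number of chunks written.
    """
    sent = 0
    for chunk in iter_chunks(audio, chunk_size):
        sock.sendall(chunk)
        sent += 1
    return sent


def main() -> None:
    # Placeholder values (assumptions): use the host and port your receiver binds to.
    host, port = "localhost", 12345
    payload = bytes(4096)  # placeholder payload: 4096 zero bytes
    with socket.create_connection((host, port)) as sock:
        send_audio(sock, payload)
```

TCP is a byte stream, so the chunk boundaries chosen by the client are not preserved on the wire. The server's chunk size only governs how many bytes it requests per receive call.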
SocketSenderArguments
Used with --mode socket. The sender opens a TCP connection to the client and streams synthesized audio back.
Host address of the client that will receive the synthesized audio. Set to the IP address of the client machine when the pipeline runs on a remote server.
TCP port number on the client that the sender connects to for returning synthesized audio.
WebSocketStreamerArguments
Used with --mode websocket and --mode realtime. The WebSocket streamer opens a bidirectional WebSocket server for audio input and output.
Host address the WebSocket server binds to. The default 0.0.0.0 accepts connections on all network interfaces. Change to a specific IP to restrict access.
Port number the WebSocket server listens on. In realtime mode this port also serves the OpenAI Realtime-compatible endpoint at /v1/realtime.
Usage examples
Connecting to the WebSocket server
In websocket mode the server accepts raw PCM audio bytes (16 kHz, int16, mono) and returns synthesized audio bytes over the same connection.
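A minimal client sketch for this mode follows. It relies on the third-party websockets package. Several details are assumptions rather than documented behavior: little-endian sample order, the 1024-byte frame size, and the five-second quiet timeout used to decide that the reply has ended. The helper names sine_pcm and stream_once are hypothetical. The generated tone only demonstrates the byte format; send recorded speech in the same format to get a meaningful reply.

```python
import asyncio
import math
import struct

SAMPLE_RATE = 16_000  # Hz, matching the input format described above


def sine_pcm(freq_hz: float, seconds: float, amplitude: float = 0.3) -> bytes:
    """Return a sine tone as int16 mono PCM at 16 kHz (little-endian is assumed)."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


async def stream_once(url: str, pcm: bytes, frame_bytes: int = 1024) -> bytes:
    """Send pcm as binary frames, then collect binary replies until the server goes quiet."""
    import websockets  # third-party: pip install websockets

    reply = bytearray()
    async with websockets.connect(url) as ws:
        for start in range(0, len(pcm), frame_bytes):
            await ws.send(pcm[start:start + frame_bytes])
        try:
            while True:
                # Stop after 5 s without a message (an arbitrary choice for this sketch).
                message = await asyncio.wait_for(ws.recv(), timeout=5.0)
                if isinstance(message, bytes):
                    reply.extend(message)
        except (asyncio.TimeoutError, websockets.ConnectionClosed):
            pass
    return bytes(reply)
```

With a server listening on port 8765, a call might look like asyncio.run(stream_once("ws://localhost:8765/", audio_bytes)), where audio_bytes holds your own 16 kHz int16 mono recording.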
Connecting in realtime mode
In realtime mode the server speaks the OpenAI Realtime protocol, so the OpenAI Python client can connect to it.
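One way to point the OpenAI Python client at a local server is sketched below. It assumes a recent openai package installed with its realtime extra. The client appends /realtime to the WebSocket base URL, so a base ending in /v1 resolves to /v1/realtime. The model name and API key are placeholders, since whether the server checks either is not stated here. The helper names realtime_base_url and first_event_type are hypothetical. Depending on the installed client version, the entry point may be client.realtime.connect rather than client.beta.realtime.connect.

```python
import asyncio


def realtime_base_url(host: str = "localhost", port: int = 8765) -> str:
    """WebSocket base URL; the OpenAI client adds '/realtime' to reach /v1/realtime."""
    return f"ws://{host}:{port}/v1"


async def first_event_type(
    host: str = "localhost",
    port: int = 8765,
    model: str = "placeholder-model",  # hypothetical value
) -> str:
    """Connect in realtime mode and return the type of the first server event."""
    from openai import AsyncOpenAI  # third-party: pip install "openai[realtime]"

    client = AsyncOpenAI(
        api_key="not-used-locally",  # placeholder; the client requires a non-empty key
        websocket_base_url=realtime_base_url(host, port),
    )
    async with client.beta.realtime.connect(model=model) as connection:
        async for event in connection:
            return event.type
    return ""
```

Running asyncio.run(first_event_type()) against a server started in realtime mode should return the type of the first event the server emits, which confirms that the connection and protocol handshake work before any audio is sent.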
The realtime mode WebSocket endpoint is at /v1/realtime on ws_port, while websocket mode serves at the root path /. Both modes share the same --ws_host and --ws_port flags.