WZRD Studio’s voice layer uses the OpenAI Realtime API to let you navigate, generate, and edit with spoken commands. Learn how to register and use voice actions.
WZRD Studio has a built-in voice control layer powered by the OpenAI Realtime API. Every authenticated page in the app — from the project timeline to the Kanvas generative canvas to the QCut editor — can respond to spoken commands. Pages register their own voice actions at mount time, and a global set of navigation actions is always available. The system is designed so developers can expose new actions on any page with a single hook call.
VoiceAgentProvider
Wraps the authenticated app shell. Creates the VoiceActionRegistry, registers the global navigation actions, boots the Realtime session, and renders the VoiceActionButton UI element.
VoiceActionRegistry
A Map of VoiceActionName → VoiceActionRegistration[]. Registrations are stacked: the most recently registered handler wins, and each registration is removed automatically when the component that registered it unmounts.
useWzrdRealtimeSession
Manages the WebRTC transport to the OpenAI Realtime API. Handles push-to-talk, tool-call dispatch, status transitions, and error normalization.
VoiceSelectionContext
Provides scrollVoiceTargetIntoView and useVoiceSelection for highlighting and selecting UI elements by voice command — used by the timeline, shot grid, and editor panels.
VoiceAgentProvider is the root context provider for the entire voice system. It lives inside the authenticated router shell and wraps VoiceSelectionProvider:
// src/voice/VoiceAgentProvider.tsx (simplified)
export function VoiceAgentProvider({ children }: { children: React.ReactNode }) {
  const registry = useMemo(() => createVoiceActionRegistry(), []);
  const navigate = useNavigate();
  const location = useLocation();
  const { isAuthenticated } = useAuth();

  // Recompute global navigation actions on every route change...
  const globalActions = useMemo(
    () =>
      createGlobalVoiceActions({
        navigate,
        getLocationPath: () => `${location.pathname}${location.search}`,
        getCurrentProjectId: () => getProjectIdFromPath(location.pathname),
        getAvailableActionNames: () =>
          Array.from(new Set(registry.list().map((r) => r.name))).sort(),
      }),
    [location.pathname, location.search, navigate, registry],
  );

  // ...and register them, cleaning up the previous set on each route change
  useEffect(() => {
    const unregister = globalActions.map((registration) => registry.register(registration));
    return () => unregister.forEach((fn) => fn());
  }, [globalActions, registry]);

  // Boot the OpenAI Realtime session
  const voiceSession = useWzrdRealtimeSession({ registry });

  // Only show the mic button on authenticated pages (not / or /login)
  const showVoiceControl = shouldShowVoiceControl(location.pathname, isAuthenticated);

  return (
    <VoiceAgentContext.Provider value={registry}>
      <VoiceSelectionProvider>
        {children}
        {showVoiceControl ? (
          <VoiceActionButton
            status={voiceSession.status}
            errorMessage={voiceSession.errorMessage}
            onPressStart={voiceSession.pushToTalkStart}
            onPressEnd={voiceSession.pushToTalkStop}
            onDisconnect={voiceSession.disconnect}
          />
        ) : null}
      </VoiceSelectionProvider>
    </VoiceAgentContext.Provider>
  );
}
The VoiceActionButton is rendered on all authenticated routes except / and /login.
The registry is created once with createVoiceActionRegistry() and exposed through context. It supports stacked registrations: if two components register the same action name, the most recently registered handler is called. When that component unmounts, its registration is removed and the previously registered handler takes over again.
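A minimal sketch of how such a stacked registry could work, assuming (as the VoiceAgentProvider example suggests) that register returns an unregister function and that list() returns every live registration. The get method, the createStackedRegistry name, and the simplified string-typed names are hypothetical, not the app's actual API.

```typescript
// Simplified stand-ins: the real registry is keyed by the VoiceActionName union
// and stores full VoiceActionRegistration objects.
type ActionName = string;

interface Registration {
  name: ActionName;
  scope: string;
  handler: (input: unknown) => unknown;
}

function createStackedRegistry() {
  // Each action name maps to a stack of registrations; the last entry is active.
  const stacks = new Map<ActionName, Registration[]>();

  return {
    register(registration: Registration): () => void {
      const stack = stacks.get(registration.name) ?? [];
      stack.push(registration);
      stacks.set(registration.name, stack);

      // Remove this exact entry wherever it sits, so components that unmount
      // out of order never drop another component's handler.
      return () => {
        const current = stacks.get(registration.name);
        if (!current) return;
        const index = current.lastIndexOf(registration);
        if (index !== -1) current.splice(index, 1);
        if (current.length === 0) stacks.delete(registration.name);
      };
    },

    // Hypothetical lookup method: the most recently registered handler wins.
    get(name: ActionName): Registration | undefined {
      const stack = stacks.get(name);
      return stack ? stack[stack.length - 1] : undefined;
    },

    // Flat list of every live registration, e.g. for getAvailableActionNames.
    list(): Registration[] {
      const all: Registration[] = [];
      stacks.forEach((stack) => all.push(...stack));
      return all;
    },
  };
}
```

Because register returns its own cleanup, a component can hand it straight to React, e.g. `useEffect(() => registry.register(registration), [registry])`, which is what makes unmounting restore the previous handler.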
Every registerable action name is a member of the VoiceActionName union type. This gives full TypeScript autocomplete and prevents typos at the registration site.
export type VoiceActionName =
  | 'get_app_context'
  | 'navigate_app'
  | 'start_new_project'
  | 'timeline_select_shot'
  | 'timeline_generate_shot_image'
  | 'timeline_generate_all_images'
  | 'timeline_start_directors_cut'
  | 'kanvas_set_studio'
  | 'kanvas_generate'
  | 'editor_import_media_by_url'
  | 'editor_add_clip'
  | 'editor_split_element'
  | 'editor_delete_element'
  | 'editor_add_title'
  | 'editor_export'
  // ... and many more (see src/voice/actions/registry.ts)
When an action returns { status: 'needs_confirmation' }, its confirmation field contains:
interface VoiceActionConfirmation {
  actionName: VoiceActionName; // The action that needs confirming
  risk: VoiceActionRisk;
  message: string; // Spoken to the user: "Are you sure you want to…?"
  input: unknown; // Original input, echoed back for re-execution
}
When a registration declares a confirmation, the registry returns { status: 'needs_confirmation' } on the first call. The agent asks the user to confirm, then calls the action again with context.confirmed set to true.
interface VoiceActionRegistration<Input = unknown> {
  name: VoiceActionName; // Must be a known VoiceActionName
  scope: string; // Descriptive string, e.g. 'timeline-page'
  description?: string; // Shown in the agent's tool definition
  confirmation?: {
    risk: VoiceActionRisk;
    message: string; // Spoken to user: "Are you sure you want to…?"
  };
  handler: VoiceActionHandler<Input>;
}
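The two-step confirmation flow can be sketched as a wrapper around the handler call. This assumes handlers receive (input, context) and resolve to a result with a status field; executeWithConfirmation, the simplified result union, and treating VoiceActionRisk as a plain string are hypothetical stand-ins for the app's real types.

```typescript
// Hypothetical stand-ins for the app's types; the real VoiceActionRisk is not shown here.
type VoiceActionRisk = string;

interface VoiceActionContext {
  confirmed?: boolean;
}

interface VoiceActionConfirmation {
  actionName: string;
  risk: VoiceActionRisk;
  message: string;
  input: unknown;
}

// Simplified result union; real results also carry fields such as ok and data.
type VoiceActionResult =
  | { status: 'completed'; message: string }
  | { status: 'needs_confirmation'; confirmation: VoiceActionConfirmation };

interface VoiceActionRegistration<Input = unknown> {
  name: string;
  scope: string;
  confirmation?: { risk: VoiceActionRisk; message: string };
  handler: (input: Input, context: VoiceActionContext) => Promise<VoiceActionResult>;
}

async function executeWithConfirmation<Input>(
  registration: VoiceActionRegistration<Input>,
  input: Input,
  context: VoiceActionContext = {},
): Promise<VoiceActionResult> {
  // Risky action on the first call: skip the handler and ask for confirmation.
  if (registration.confirmation && !context.confirmed) {
    return {
      status: 'needs_confirmation',
      confirmation: {
        actionName: registration.name,
        risk: registration.confirmation.risk,
        message: registration.confirmation.message,
        input, // echoed back so the agent can re-call with identical arguments
      },
    };
  }
  // No confirmation declared, or the user already said yes.
  return registration.handler(input, context);
}
```

Echoing input back means the agent does not have to reconstruct the arguments from the conversation when it re-calls the action with confirmed set.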
createGlobalVoiceActions produces the baseline set of actions that are always available regardless of which page is open. These are re-registered on every route change so they always reflect the current location and project context.
| Action name | What it does |
| --- | --- |
| get_app_context | Returns the current locationPath, currentProjectId, and availableActions array |
| navigate_app | Navigates to any VoiceNavigationTarget: home, kanvas, project timeline, editor, Directors’ Cut, etc. |
| start_new_project | Opens the project setup page |
| open_project_view | Opens a specific view within the current project (timeline, editor, studio, observability) |
| open_ip_vault | Opens the IP Vault page |
| character_open | Opens Kanvas with the character-creation studio |
| kanvas_set_studio | Opens Kanvas with a specific studio mode and an optional text prompt |
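Navigation targets (VoiceNavigationTarget values) cover all major app surfaces, including home, Kanvas, the project timeline, the editor, and Directors’ Cut.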
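The VoiceAgentProvider example passes createGlobalVoiceActions a set of getters rather than fixed values. Here is a sketch of what the get_app_context entry might look like, assuming handlers resolve to the { ok, status, message, data } shape shown in the test-harness example later on this page. GlobalActionDeps, createGetAppContextAction, the 'failed' status, and the message text are hypothetical.

```typescript
// Hypothetical subset of the dependencies createGlobalVoiceActions receives.
interface GlobalActionDeps {
  getLocationPath: () => string;
  getCurrentProjectId: () => string | null;
  getAvailableActionNames: () => string[];
}

interface AppContextData {
  locationPath: string;
  currentProjectId: string | null;
  availableActions: string[];
}

interface VoiceActionResult<Data = unknown> {
  ok: boolean;
  status: 'completed' | 'failed';
  message: string;
  data?: Data;
}

function createGetAppContextAction(deps: GlobalActionDeps) {
  return {
    name: 'get_app_context' as const,
    scope: 'global',
    description: 'Describe where the user is in the app and which voice actions are available.',
    // Values are read through the getters when the tool is called, not when it is created.
    handler: async (): Promise<VoiceActionResult<AppContextData>> => {
      const data: AppContextData = {
        locationPath: deps.getLocationPath(),
        currentProjectId: deps.getCurrentProjectId(),
        availableActions: deps.getAvailableActionNames(),
      };
      return { ok: true, status: 'completed', message: `Current page: ${data.locationPath}.`, data };
    },
  };
}
```

Reading through getters means availableActions reflects whatever page-level actions are registered at call time, not just those present when the global set was created.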
useWzrdRealtimeSession manages the full lifecycle of the OpenAI Realtime WebRTC session. It exposes push-to-talk controls and a typed VoiceSessionStatus.
type VoiceSessionStatus =
  | 'idle' // No session active
  | 'connecting' // WebRTC handshake in progress
  | 'connected' // Ready, microphone off
  | 'listening' // Microphone active, capturing speech
  | 'thinking' // Model processing input / executing tool calls
  | 'speaking' // Model audio playing back
  | 'confirming' // Awaiting user confirmation for a risky action
  | 'error'; // Session error — see errorMessage
const {
  status,
  errorMessage,
  pushToTalkStart, // Call on mic button press
  pushToTalkStop, // Call on mic button release
  disconnect, // Tear down the WebRTC session
} = useWzrdRealtimeSession({ registry });
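Because VoiceSessionStatus is a closed union, UI code can switch over it exhaustively and let the compiler flag any status added later. Here is a sketch of how a component like VoiceActionButton might turn status and errorMessage into a label. describeVoiceStatus and every label string are hypothetical.

```typescript
type VoiceSessionStatus =
  | 'idle'
  | 'connecting'
  | 'connected'
  | 'listening'
  | 'thinking'
  | 'speaking'
  | 'confirming'
  | 'error';

function describeVoiceStatus(status: VoiceSessionStatus, errorMessage?: string | null): string {
  switch (status) {
    case 'idle':
      return 'Hold to talk';
    case 'connecting':
      return 'Connecting…';
    case 'connected':
      return 'Ready';
    case 'listening':
      return 'Listening…';
    case 'thinking':
      return 'Working…';
    case 'speaking':
      return 'Speaking…';
    case 'confirming':
      return 'Say yes or no to confirm';
    case 'error':
      return errorMessage ?? 'Voice control error';
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a status is unhandled.
      const unhandled: never = status;
      return unhandled;
    }
  }
}
```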
pushToTalkStart lazily connects the session on the first press. Subsequent presses interrupt any in-progress model response before capturing new input. pushToTalkStop commits the audio buffer and sends response.create to the model.
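The press/release sequence can be sketched as two functions that emit Realtime API client events. The page confirms only input_audio_buffer.commit and response.create. The interruption step is an assumption: response.cancel and input_audio_buffer.clear are real Realtime client events, but whether the hook uses them is not documented here, and with WebRTC, audio already buffered for playback may need a separate step. createPushToTalk and PushToTalkDeps are hypothetical.

```typescript
// A few client events from the OpenAI Realtime API, sent as JSON.
type RealtimeClientEvent =
  | { type: 'response.cancel' }
  | { type: 'input_audio_buffer.clear' }
  | { type: 'input_audio_buffer.commit' }
  | { type: 'response.create' };

// Hypothetical dependencies; in the app these would wrap the WebRTC connection.
interface PushToTalkDeps {
  ensureConnected: () => Promise<void>; // connects on first use, no-op afterwards
  send: (event: RealtimeClientEvent) => void; // e.g. JSON over the Realtime data channel
  setMicEnabled: (enabled: boolean) => void; // e.g. toggles the local audio track
  isResponseInProgress: () => boolean;
}

function createPushToTalk(deps: PushToTalkDeps) {
  return {
    async start(): Promise<void> {
      await deps.ensureConnected();
      if (deps.isResponseInProgress()) {
        // Barge-in: stop the current model response before capturing new speech.
        deps.send({ type: 'response.cancel' });
      }
      deps.send({ type: 'input_audio_buffer.clear' }); // discard stale audio
      deps.setMicEnabled(true);
    },
    stop(): void {
      deps.setMicEnabled(false);
      deps.send({ type: 'input_audio_buffer.commit' }); // close the user turn
      deps.send({ type: 'response.create' }); // ask the model to answer
    },
  };
}
```

With server-side turn detection disabled, nothing happens until the client commits the buffer and requests a response. That is why stop() sends both events.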
The session is initialized with the app’s voice instructions, all registered tool definitions, and turn_detection: null (push-to-talk mode). Transcription uses gpt-4o-mini-transcribe.
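Here is a sketch of the session.update event that such an initialization could send, with one function tool per distinct registered action name. Field names follow the beta Realtime API used with gpt-4o-realtime-preview; the GA interface may nest transcription settings differently. The parameters field is an assumption, since VoiceActionRegistration does not show an input schema, and buildSessionUpdate is hypothetical.

```typescript
// Hypothetical: the registration fields needed to build tool definitions.
interface ToolSource {
  name: string;
  description?: string;
  parameters?: Record<string, unknown>; // assumed JSON Schema for the action input
}

interface RealtimeFunctionTool {
  type: 'function';
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

function buildSessionUpdate(instructions: string, registrations: ToolSource[]) {
  // Stacked registrations share a name, so keep one tool per name (latest wins).
  const byName = new Map<string, ToolSource>();
  for (const registration of registrations) byName.set(registration.name, registration);

  const tools = Array.from(byName.values()).map(
    (r): RealtimeFunctionTool => ({
      type: 'function',
      name: r.name,
      description: r.description ?? r.name,
      parameters: r.parameters ?? { type: 'object', properties: {} },
    }),
  );

  return {
    type: 'session.update' as const,
    session: {
      instructions,
      tools,
      tool_choice: 'auto' as const,
      turn_detection: null, // push-to-talk: the client decides when a turn ends
      input_audio_transcription: { model: 'gpt-4o-mini-transcribe' },
    },
  };
}
```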
The session key is fetched through the realtime-client-secret Supabase Edge Function, keeping your OpenAI key off the client:
// Internally called by useWzrdRealtimeSession
const sessionInfo = await fetchRealtimeClientSecret();
// → { clientSecret: '...', model: 'gpt-4o-realtime-preview' }
The client never handles the OpenAI API key directly. useWzrdRealtimeSession always calls the realtime-client-secret Supabase Edge Function to obtain a short-lived ephemeral key. Never expose your OpenAI service key in the renderer process.
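Here is a sketch of what fetchRealtimeClientSecret could look like. It is written against an injected invoke function so it runs without a network. In the app, that function would presumably be supabase.functions.invoke from @supabase/supabase-js, which resolves to { data, error }. The RealtimeSessionInfo shape follows the example result above; the validation logic and error messages are assumptions.

```typescript
interface RealtimeSessionInfo {
  clientSecret: string;
  model: string;
}

// Matches the { data, error } shape that supabase.functions.invoke resolves to.
type InvokeEdgeFunction = (name: string) => Promise<{ data: unknown; error: unknown }>;

async function fetchRealtimeClientSecret(invoke: InvokeEdgeFunction): Promise<RealtimeSessionInfo> {
  const { data, error } = await invoke('realtime-client-secret');
  if (error) {
    throw new Error('Could not obtain a Realtime session key.');
  }
  // Validate the payload instead of trusting its shape.
  const info = data as Partial<RealtimeSessionInfo> | null;
  if (!info || typeof info.clientSecret !== 'string' || typeof info.model !== 'string') {
    throw new Error('realtime-client-secret returned an unexpected payload.');
  }
  return { clientSecret: info.clientSecret, model: info.model };
}
```

A call site would look like `fetchRealtimeClientSecret((name) => supabase.functions.invoke(name))`. Only the edge function, running server-side, ever sees the real OpenAI key.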
scrollVoiceTargetIntoView(id) scrolls the registered element into the viewport and applies a brief selection highlight — used by timeline_select_shot, ip_vault_select_item, and similar actions.
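The scroll-and-highlight behavior can be sketched as a small target registry. This assumes components register elements under stable ids and that the highlight is a temporary CSS class. createVoiceTargets, the voice-selected class, and the 1.5-second duration are hypothetical. Elements are typed structurally, so a real HTMLElement fits without depending on DOM typings.

```typescript
// Structural subset of HTMLElement used by the sketch.
interface ScrollableTarget {
  scrollIntoView(options?: {
    behavior?: 'auto' | 'smooth';
    block?: 'start' | 'center' | 'end' | 'nearest';
  }): void;
  classList: { add(token: string): void; remove(token: string): void };
}

function createVoiceTargets(highlightClass = 'voice-selected', highlightMs = 1500) {
  const targets = new Map<string, ScrollableTarget>();

  return {
    // Returns an unregister function, suitable as an effect cleanup.
    register(id: string, element: ScrollableTarget): () => void {
      targets.set(id, element);
      return () => {
        if (targets.get(id) === element) targets.delete(id);
      };
    },

    // Returns false when nothing is registered under id, so the action can report it.
    scrollVoiceTargetIntoView(id: string): boolean {
      const element = targets.get(id);
      if (!element) return false;
      element.scrollIntoView({ behavior: 'smooth', block: 'center' });
      element.classList.add(highlightClass);
      setTimeout(() => element.classList.remove(highlightClass), highlightMs);
      return true;
    },
  };
}
```

Returning a boolean lets an action such as timeline_select_shot tell the user that a shot is not on screen instead of failing silently.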
The __wzrdVoiceActionTest harness lets you invoke any registered voice action from the browser console or from Playwright tests without speaking:
// In browser DevTools or a Playwright test
await window.__wzrdVoiceActionTest.execute('navigate_app', { target: 'kanvas_image' });
// → { ok: true, status: 'completed', message: 'Opened kanvas_image.', data: { path: '/kanvas?studio=image' } }
The __wzrdVoiceActionTest harness is only attached when both import.meta.env.DEV is true and VITE_BYPASS_AUTH_FOR_TESTS is "true". It is stripped from production builds.
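The attach condition can be sketched as a small gate. The result shape is inferred from the example output above, and shouldAttachVoiceTestHarness and maybeAttachVoiceTestHarness are hypothetical names. Vite exposes custom env variables as strings, which is why the bypass flag is compared against the literal "true".

```typescript
// Result shape inferred from the example output above; other fields may exist.
interface VoiceActionResult {
  ok: boolean;
  status: string;
  message?: string;
  data?: unknown;
}

interface VoiceActionTestHarness {
  execute(name: string, input?: unknown): Promise<VoiceActionResult>;
}

// Subset of Vite's import.meta.env that the gate reads.
interface HarnessEnv {
  DEV: boolean;
  VITE_BYPASS_AUTH_FOR_TESTS?: string;
}

function shouldAttachVoiceTestHarness(env: HarnessEnv): boolean {
  return env.DEV === true && env.VITE_BYPASS_AUTH_FOR_TESTS === 'true';
}

function maybeAttachVoiceTestHarness(
  target: { __wzrdVoiceActionTest?: VoiceActionTestHarness },
  env: HarnessEnv,
  harness: VoiceActionTestHarness,
): boolean {
  if (!shouldAttachVoiceTestHarness(env)) return false;
  target.__wzrdVoiceActionTest = harness;
  return true;
}
```

This version takes env as a parameter only for testability. For the harness to be stripped from production builds, the check should reference import.meta.env.DEV inline. Vite replaces that with a literal false in production, so the bundler can drop the whole branch.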
Pages can push contextual narration to the voice agent without the user pressing the mic button. Fire the wzrd:voice-oob-narrate custom DOM event:
window.dispatchEvent(
  new CustomEvent('wzrd:voice-oob-narrate', {
    detail: {
      text: 'Storyline generation is complete. Your project has 12 scenes and 48 shots.',
      topic: 'storyline_stream',
    },
  }),
);
The active Realtime session will speak the update as a single concise highlight. If no session is connected, the event is silently ignored.
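On the receiving side, a listener only needs to validate the event detail, drop it when no session is connected, and ask the model for a short spoken response. This sketch assumes the narration is sent as a Realtime response.create event with per-response instructions. The real app may use a different event sequence, and handleOobNarrate, the set of "disconnected" statuses, and the instruction wording are hypothetical.

```typescript
type VoiceSessionStatus =
  | 'idle'
  | 'connecting'
  | 'connected'
  | 'listening'
  | 'thinking'
  | 'speaking'
  | 'confirming'
  | 'error';

interface OobNarrateDetail {
  text: string;
  topic?: string;
}

interface ResponseCreateEvent {
  type: 'response.create';
  response: { instructions: string };
}

// Statuses treated as "no live session" for narration purposes (an assumption).
const DISCONNECTED: ReadonlySet<VoiceSessionStatus> = new Set<VoiceSessionStatus>([
  'idle',
  'connecting',
  'error',
]);

// Returns the event to send, or null when the narration should be dropped.
function handleOobNarrate(detail: unknown, status: VoiceSessionStatus): ResponseCreateEvent | null {
  if (DISCONNECTED.has(status)) return null; // no session: silently ignore
  const candidate = detail as Partial<OobNarrateDetail> | null;
  if (!candidate || typeof candidate.text !== 'string' || candidate.text.trim() === '') return null;
  return {
    type: 'response.create',
    response: {
      instructions: `Briefly tell the user, in one sentence: ${candidate.text.trim()}`,
    },
  };
}
```

In the app, this function would sit inside a window.addEventListener('wzrd:voice-oob-narrate', ...) listener that reads event.detail, passes the current session status, and sends any non-null result over the Realtime connection.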