No API keys required
The model runs locally inside the browser tab. The only network traffic after the initial model download is JSON-RPC to the backend at /mcp. There is no external AI provider involved.
The first run downloads approximately 350 MB of ONNX model assets from Hugging Face and caches them in the browser. Subsequent page loads skip the download. The DemoTranscript component adapts its submit label to reflect the cache state: "Download + chat" on a cold cache, "Send to local model" once the assets are fully cached.

Model identity
The model is LiquidAI/LFM2.5-350M-ONNX, resolved in ui/src/features/assistant/lib/lfm-model.ts. Inference runs through an onnxruntime-web InferenceSession with the webgpu execution provider. Tokenisation is handled by @huggingface/transformers AutoTokenizer.
Feature structure
Model loading: LfmBrowserModel
LfmBrowserModel (lib/lfm-browser.ts) manages the load lifecycle with a single loadPromise guard so concurrent callers share one download. The load imports @huggingface/transformers and onnxruntime-web/webgpu in parallel, initialises the tokenizer with progress callbacks, then opens an ONNX InferenceSession. If navigator.gpu is absent, the load throws "WebGPU is unavailable in this browser."
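The single-flight guard is a common pattern; a minimal generic sketch (not the actual LfmBrowserModel code) looks like this:

```typescript
// Minimal single-flight load guard: concurrent callers share one
// in-flight promise, so the download only ever starts once per tab.
// (Generic sketch; LfmBrowserModel's actual fields may differ.)
class SingleFlightLoader<T> {
  private loadPromise: Promise<T> | null = null;

  constructor(private readonly doLoad: () => Promise<T>) {}

  load(): Promise<T> {
    if (this.loadPromise === null) {
      this.loadPromise = this.doLoad();
    }
    return this.loadPromise;
  }
}
```

Because load() always returns the same promise, a second caller that arrives mid-download simply awaits the first download instead of starting another one.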
MCP connection: CoffeeMcpClient
CoffeeMcpClient (lib/mcp-client.ts) is a minimal JSON-RPC 2.0 HTTP client that speaks the MCP protocol. It defaults to the /mcp path, which the Vite dev-server proxy forwards to the backend at http://localhost:3000/mcp.
On first use the client performs the initialize handshake and negotiates the protocol version (2025-06-18). Tool definitions are cached in memory after the first tools/list call. The session ID returned in the Mcp-Session-Id response header is attached to subsequent requests.
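As a rough sketch of the wire format (the real CoffeeMcpClient's method and field names are not reproduced here; everything below is illustrative), a JSON-RPC 2.0 request envelope plus session-header handling might look like:

```typescript
// Hedged sketch of a minimal JSON-RPC 2.0 HTTP client with MCP session
// handling; class and method names are illustrative assumptions.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

function makeRequest(id: number, method: string, params?: unknown): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method, params };
}

class McpHttpClient {
  private sessionId: string | null = null;
  private nextId = 1;

  constructor(private readonly endpoint: string = "/mcp") {}

  async call(method: string, params?: unknown): Promise<unknown> {
    const headers: Record<string, string> = { "Content-Type": "application/json" };
    // Attach the session ID once the server has issued one.
    if (this.sessionId !== null) headers["Mcp-Session-Id"] = this.sessionId;

    const res = await fetch(this.endpoint, {
      method: "POST",
      headers,
      body: JSON.stringify(makeRequest(this.nextId++, method, params)),
    });

    // The server returns the session ID in a response header.
    const sid = res.headers.get("Mcp-Session-Id");
    if (sid !== null) this.sessionId = sid;

    return (await res.json()).result;
  }
}
```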
The agentic loop: processAssistantTurn
processAssistantTurn (lib/assistant-loop.ts) drives a multi-step reasoning loop capped at three tool-use steps. Each step generates text with the local model, looks for a tool call delimited by <|tool_call_start|> and <|tool_call_end|>, executes the tool via CoffeeMcpClient.callTool(), and appends the result as a user message before the next generation. When no tool call is detected (or after the tool result is appended), the loop returns the final assistant text and the list of AssistantEvent records for the activity drawer.
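Detecting a tool call amounts to scanning the generation for the marker pair. A simplified parser, assuming a JSON payload between the markers (the model's actual payload format may differ), could look like:

```typescript
// Simplified tool-call extraction between LFM's special tokens.
// The JSON payload shape here is an assumption for illustration.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

const START = "<|tool_call_start|>";
const END = "<|tool_call_end|>";

function parseToolCall(text: string): ToolCall | null {
  const start = text.indexOf(START);
  const end = text.indexOf(END);
  if (start === -1 || end === -1 || end < start) return null;

  const payload = text.slice(start + START.length, end).trim();
  try {
    return JSON.parse(payload) as ToolCall;
  } catch {
    return null; // Malformed payload: treat the output as plain text.
  }
}
```

A null result ends the loop and the surrounding text becomes the final assistant reply.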
The system prompt identifies the assistant as Beanline and instructs it to always call list_menu before answering menu questions, and list_orders or get_order for queue questions.
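The exact wording is not reproduced here; a hypothetical prompt carrying those instructions might read:

```typescript
// Hypothetical system prompt in the spirit described above; the real
// wording lives in the assistant feature and is NOT reproduced here.
const SYSTEM_PROMPT = [
  "You are Beanline, the Coffee Shop assistant.",
  "Always call list_menu before answering questions about the menu.",
  "Use list_orders or get_order before answering questions about the order queue.",
].join("\n");
```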
State management: useLfmCoffeeAssistant
The useLfmCoffeeAssistant hook (hooks/useLfmCoffeeAssistant.ts) owns all assistant state through a reducer (reduceAssistantState) and an AssistantRuntime ref. The runtime ref holds the LfmBrowserModel, the CoffeeMcpClient, the current conversation history, and the cached tool definitions — none of which should trigger re-renders on mutation.
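The reducer/ref split keeps render-relevant state in React while the mutable machinery lives outside the render cycle. A much-simplified reducer in that spirit (the action and state shapes here are hypothetical, not the real reduceAssistantState):

```typescript
// Much-simplified stand-in for reduceAssistantState; the real action
// and state shapes in the hook are assumptions for illustration.
type Phase = "idle" | "loading" | "running" | "ready" | "error";

interface AssistantState {
  phase: Phase;
  progress: number; // 0..1 model-load progress
  transcript: string[];
}

type AssistantAction =
  | { type: "phase"; phase: Phase }
  | { type: "progress"; value: number }
  | { type: "message"; text: string };

function reduceAssistant(state: AssistantState, action: AssistantAction): AssistantState {
  switch (action.type) {
    case "phase":
      return { ...state, phase: action.phase };
    case "progress":
      return { ...state, progress: action.value };
    case "message":
      return { ...state, transcript: [...state.transcript, action.text] };
  }
}
```

Each action returns a new state object, so React re-renders exactly when displayed state changes, while model, client, and history mutate freely inside the ref.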
Effect v4 is used to sequence async steps (load tools → load model → run turn) with structured error handling.
UI components
BrowserMcpLandingView
Top-level presentational layout. Two-column hero (copy + facts card) above the DemoTranscript. Includes a “Preload browser model” button that calls onWarmUp to download and cache model assets before the user’s first prompt.

DemoTranscript
Chat container. Renders DemoStatusPanel (progress bar), preset prompt buttons, a scrollable message list with TranscriptBubble entries for the user (“You”) and the assistant (“Beanline”), and DemoComposer at the bottom.

DemoStatusPanel
Displays the current DemoStatus phase (idle | loading | running | ready | error) with a progress bar and a text label such as "Thinking with local WebGPU" or "Model ready in this tab with N Coffee Shop tools."

ToolActivityDrawer
Opens a vaul drawer listing every AssistantEvent. Each event is rendered as a ToolActivityCard showing the tool name and formatted arguments or result. Allows the user to inspect exactly which MCP tools the model called and what they returned.

Preset prompts
Three starter prompts are defined in hooks/lfmAssistantSupport.ts and displayed as buttons in the transcript: