Tool-calling that actually feels instant on a laptop.

Building a local AI agent sounds great until you try to use one all day. The hard part isn't getting a model to understand you; it's getting it to choose the right tool and do it fast enough that the experience feels interactive. This is where LFM2-24B-A2B shines: it's designed for tool dispatch on consumer hardware, where latency and memory aren't abstract constraints: they decide whether your agent is a product or a demo.

LocalCowork is a desktop AI agent that runs entirely on-device. No cloud APIs, no data leaving your machine. The model calls pre-built tools via the Model Context Protocol (MCP), and every tool execution is logged to a local audit trail.

What it does

LocalCowork ships with 75 tools across 14 MCP servers covering filesystem operations, document processing, OCR, security scanning, email drafting, task management, and more. For the demo, we run a curated set of 20 tools across 6 servers, each of which scores 80%+ single-step accuracy and has demonstrated participation in multi-step chains.

Demo 1: Scan for leaked secrets

Every developer has .env files with API keys scattered across old projects. You’d never upload your filesystem to a cloud model to find them. That defeats the purpose.
You:   "Scan my Projects folder for exposed API keys"
Agent: security.scan_for_secrets → found 3 secrets in 2 files (420ms)

You:   "Encrypt the ones you found"
Agent: security.encrypt_file → encrypted .env and config.yaml (380ms)

You:   "Show me the audit trail"
Agent: audit.get_tool_log → 3 tool calls, all succeeded (12ms)
Three tools, under 2 seconds total. The scan, encryption, and audit trail all happen locally.
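For a sense of what a tool like security.scan_for_secrets does under the hood, here is a minimal regex-based sketch in Python. The patterns, file filters, and output shape are illustrative assumptions, not the project's actual rules:

```python
import re
from pathlib import Path

# Hypothetical detection rules; the real security server's pattern set is not shown here.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*['\"]?[A-Za-z0-9_\-]{16,}"),
}

def scan_text(text: str, source: str) -> list[dict]:
    """Return one finding per pattern match, mimicking a scan_for_secrets result."""
    findings = []
    for kind, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"file": source, "kind": kind, "span": match.span()})
    return findings

def scan_folder(root: str) -> list[dict]:
    """Walk a folder and scan files that commonly hold credentials."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".env", ".yaml", ".yml", ".json", ""}:
            findings += scan_text(path.read_text(errors="ignore"), str(path))
    return findings
```

Running this locally is exactly the point of the demo: the file contents never leave the process that scanned them.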

Demo 2: Compare contracts without a cloud API

A freelancer gets a revised NDA. They need to know what changed. These are confidential documents that should never leave the machine.
You:   "Compare these two contract versions"
Agent: document.extract_text (v1) → 2,400 words (350ms)
       document.extract_text (v2) → 2,600 words (340ms)
       document.diff_documents → 12 changes found (180ms)
       document.create_pdf → diff_report.pdf generated (420ms)
Four tools, under 2 seconds. The extraction, diff, and PDF generation never touch a network.

Demo 3: Answer a direct question in one call

The simplest test of an agent: can it answer a direct question in one tool call without going off-script?
You:   "List what's in my Downloads folder"
Agent: filesystem.list_dir → 26 files found (9ms)
       "Here are the files: DEMO CARD styles.png, benchmark_results.csv,
        Liquid AI Notes.pdf, and 23 others."
One tool, one answer. No unnecessary follow-up calls, no asking what to do next.
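One way to keep single-question dispatch cheap is to narrow the 75 tools down to a few candidates before the model ever sees them. A naive keyword-overlap sketch of that idea (the real ToolPreFilter in the Rust agent core is not shown and is surely more sophisticated):

```python
def prefilter_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Rank tools by naive word overlap between the query and each tool description."""
    words = set(query.lower().split())
    scored = sorted(
        tools,
        key=lambda name: len(words & set(tools[name].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Hypothetical descriptions, for illustration only.
TOOLS = {
    "filesystem.list_dir": "list files in a folder directory",
    "document.extract_text": "extract text from a pdf or docx document",
    "security.scan_for_secrets": "scan files for exposed api keys and secrets",
}

candidates = prefilter_tools("list what's in my downloads folder", TOOLS)
```

Shrinking the tool list before inference keeps the prompt short, which is part of why a single-call answer can come back in milliseconds.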

Architecture

Presentation    Tauri 2.0 (Rust) + React/TypeScript
                    |
Agent Core      Rust — ConversationManager, ToolRouter, MCP Client,
                       Orchestrator, ToolPreFilter, Audit
                    |
Inference       OpenAI-compatible API @ localhost (llama.cpp / Ollama / vLLM)
                    |
MCP Servers     14 servers, 75 tools (8 TypeScript + 6 Python)
The agent core communicates with the inference layer via the OpenAI chat completions API. Changing the model is a config change, not a code change. MCP servers are auto-discovered at startup by scanning mcp-servers/.
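As a sketch of that contract: the agent core sends plain OpenAI chat-completions JSON with tool schemas attached, so swapping runtimes only moves the base URL and model name. The port and tool schema below are illustrative assumptions, not the project's actual config:

```python
# Swapping models or runtimes is a config change: only these two values move.
BASE_URL = "http://localhost:8080/v1"   # llama.cpp, Ollama, or vLLM endpoint (assumed port)
MODEL = "LFM2-24B-A2B"

def chat_request(user_msg: str, tools: list[dict]) -> dict:
    """Build an OpenAI-style /chat/completions payload with tool schemas attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",
    }

# Hypothetical schema for one of the filesystem tools.
list_dir_tool = {
    "type": "function",
    "function": {
        "name": "filesystem.list_dir",
        "description": "List files in a sandboxed directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

payload = chat_request("List what's in my Downloads folder", [list_dir_tool])
```

Because every runtime in the inference layer speaks this same schema, the agent core never needs to know which one is behind the socket.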

MCP servers

| Server | Lang | Tools | What it does |
|--------|------|-------|--------------|
| filesystem | TS | 9 | File CRUD, search, watch (sandboxed) |
| document | Py | 8 | Text extraction, conversion, diff, PDF generation |
| ocr | Py | 4 | LFM Vision primary, Tesseract fallback |
| knowledge | Py | 5 | SQLite-vec RAG pipeline, semantic search |
| meeting | Py | 4 | Whisper.cpp transcription + diarization |
| security | Py | 6 | PII/secrets scanning + encryption |
| calendar | TS | 4 | .ics parsing + system calendar API |
| email | TS | 5 | MBOX/Maildir parsing + SMTP |
| task | TS | 5 | Local SQLite task database |
| data | TS | 5 | CSV + SQLite operations |
| audit | TS | 4 | Audit log reader + compliance reports |
| clipboard | TS | 3 | OS clipboard (Tauri bridge) |
| system | TS | 10 | OS APIs: sysinfo, processes, screenshots |
| screenshot-pipeline | Py | 3 | Capture, UI elements, action suggestion |
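The startup scan of mcp-servers/ mentioned above can be sketched as a directory walk. This is an illustration in Python, not the project's actual Rust code, and the entrypoint filenames (index.ts, server.py) are assumptions about the layout:

```python
import tempfile
from pathlib import Path

def discover_servers(root: Path) -> dict[str, str]:
    """Map each server directory to its language based on the entrypoint present.

    Illustrative sketch; the real discovery logic in the agent core is not shown.
    """
    servers = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / "index.ts").exists():
            servers[entry.name] = "typescript"
        elif (entry / "server.py").exists():
            servers[entry.name] = "python"
    return servers
```

Discovery at startup is what lets a new server become available by dropping a directory in place, with no registry to edit.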

Benchmarks

We tested 6 models against 67 tools on Apple M4 Max. LFM2-24B-A2B (24B total, ~2B active per token) delivers 80% tool accuracy at 390ms. That's 94% of Mistral-Small-24B's accuracy at under a third of its latency, and roughly 60x faster than the most accurate model tested (Gemma 3 27B).
| Model | Active Params | Accuracy | Latency | Multi-Step |
|-------|---------------|----------|---------|------------|
| LFM2-24B-A2B | ~2B (MoE) | 80% | 390ms | 26% |
| Gemma 3 27B | 27B (dense) | 91% | 24,088ms | 48% |
| Mistral-Small-24B | 24B (dense) | 85% | 1,239ms | 66% |
| Qwen3 32B | 32B (dense) | ~70% | 28,385ms | |
| GPT-OSS-20B | ~3.6B (MoE) | 51% | 2,303ms | 0% |
| Qwen3-30B-A3B | ~3B (MoE) | 44% | 5,938ms | 4% |
The speed comes from the combination of the hybrid conv+attention design and MoE sparsity.
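The headline ratios can be checked directly against the table:

```python
# Benchmark figures copied from the table above.
results = {
    "LFM2-24B-A2B":      {"accuracy": 0.80, "latency_ms": 390},
    "Gemma 3 27B":       {"accuracy": 0.91, "latency_ms": 24088},
    "Mistral-Small-24B": {"accuracy": 0.85, "latency_ms": 1239},
}

lfm = results["LFM2-24B-A2B"]
mistral = results["Mistral-Small-24B"]
gemma = results["Gemma 3 27B"]

accuracy_vs_mistral = lfm["accuracy"] / mistral["accuracy"]      # ~0.94
latency_vs_mistral = lfm["latency_ms"] / mistral["latency_ms"]   # ~0.31
speedup_vs_gemma = gemma["latency_ms"] / lfm["latency_ms"]       # ~62x
```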

Quick start

1. Clone and set up

   git clone <repo-url> && cd localCoWork
   ./scripts/setup-dev.sh

2. Download LFM2-24B-A2B

   Requires HuggingFace access. Request access.

   pip install huggingface-hub
   python3 -c "
   from huggingface_hub import hf_hub_download
   hf_hub_download('LiquidAI/LFM2-24B-A2B-GGUF',
                   'LFM2-24B-A2B-Q4_K_M.gguf',
                   local_dir='$HOME/Projects/_models/')
   "

   This downloads ~14 GB.

3. Start the model server

   ./scripts/start-model.sh

4. Launch the app

   In another terminal:

   cargo tauri dev

   MCP servers start automatically.

Prerequisites

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Node.js | 20+ | TypeScript MCP servers, React frontend |
| Python | 3.11+ | Python MCP servers (document, OCR, security, etc.) |
| Rust | 1.77+ | Tauri backend, Agent Core |
| llama.cpp | latest | Serves LFM2 models (brew install llama.cpp) |

Optional: Ollama (alternative runtime), Tesseract (fallback OCR).

Known limitations

  • 1-2 step workflows are reliable. 4+ step chains degrade as conversation history grows; multi-step completion averages 26% across all tools.
  • Batch operations may return partial results: the model might handle 2 items from a set of 10.
  • Cross-server transitions are the universal barrier. Every model tested fails at these, so the UX is designed around human confirmation to compensate.
These limits are documented because they’re instructive.

Source code

View the complete source code on GitHub.
