What it does
LocalCowork ships with 75 tools across 14 MCP servers covering filesystem operations, document processing, OCR, security scanning, email drafting, task management, and more. For the demo, we run a curated set of 20 tools across 6 servers — every tool scores 80%+ single-step accuracy and has proven multi-step chain participation.

Demo 1: Scan for leaked secrets
Every developer has .env files with API keys scattered across old projects. You’d never upload your filesystem to a cloud model to find them. That defeats the purpose.
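The core of such a scan needs nothing beyond the standard library: walk the tree, match known key formats line by line. A minimal sketch, with two illustrative patterns only — the demo's security server uses a far larger rule set than this:

```python
import re
from pathlib import Path

# Illustrative patterns only; a real scanner carries many more rules.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)(api[_-]?key|secret)\s*=\s*\S{16,}"),
}

def scan_for_secrets(root: str) -> list[dict]:
    """Walk `root`, flagging lines in .env files that match a secret pattern."""
    findings = []
    for env_file in Path(root).rglob(".env*"):
        try:
            text = env_file.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append(
                        {"file": str(env_file), "line": lineno, "rule": rule}
                    )
    return findings
```

Nothing here touches the network; the findings stay on the machine, which is the whole point of the demo.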
Demo 2: Compare contracts without a cloud API
A freelancer gets a revised NDA. They need to know what changed. These are confidential documents that should never leave the machine.

Demo 3: File search
The simplest test of an agent: can it answer a direct question in one tool call without going off-script?

Architecture
The servers live under `mcp-servers/`.
MCP servers
| Server | Lang | Tools | What It Does |
|---|---|---|---|
| filesystem | TS | 9 | File CRUD, search, watch (sandboxed) |
| document | Py | 8 | Text extraction, conversion, diff, PDF generation |
| ocr | Py | 4 | LFM Vision primary, Tesseract fallback |
| knowledge | Py | 5 | SQLite-vec RAG pipeline, semantic search |
| meeting | Py | 4 | Whisper.cpp transcription + diarization |
| security | Py | 6 | PII/secrets scanning + encryption |
| calendar | TS | 4 | .ics parsing + system calendar API |
| email | TS | 5 | MBOX/Maildir parsing + SMTP |
| task | TS | 5 | Local SQLite task database |
| data | TS | 5 | CSV + SQLite operations |
| audit | TS | 4 | Audit log reader + compliance reports |
| clipboard | TS | 3 | OS clipboard (Tauri bridge) |
| system | TS | 10 | OS APIs — sysinfo, processes, screenshots |
| screenshot-pipeline | Py | 3 | Capture, UI elements, action suggestion |
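Whatever the language, each server in the table reduces to the same shape: a registry of named tools that take and return JSON-serializable values. A stripped-down sketch of that dispatch pattern in plain Python — illustrative only, not the MCP SDK's actual API, and `task_add` here just echoes where the real task server writes to SQLite:

```python
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(name: str):
    """Register a function under a tool name."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register

@tool("task_add")
def task_add(title: str) -> dict:
    # The real task server persists to a local SQLite database.
    return {"ok": True, "title": title}

def dispatch(request_json: str) -> str:
    """Route a {"tool": ..., "arguments": {...}} request to its handler."""
    req = json.loads(request_json)
    result = TOOLS[req["tool"]](**req["arguments"])
    return json.dumps(result)
```

The agent core only ever sees this JSON boundary, which is what lets TypeScript and Python servers sit behind one interface.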
Benchmarks
We tested 6 models against 67 tools on Apple M4 Max. LFM2-24B-A2B (24B total, ~2B active per token) delivers 80% tool accuracy at 390ms. That’s 94% of the best dense model’s accuracy at 3% of its latency.

| Model | Active Params | Accuracy | Latency | Multi-Step |
|---|---|---|---|---|
| LFM2-24B-A2B | ~2B (MoE) | 80% | 390ms | 26% |
| Gemma 3 27B | 27B (dense) | 91% | 24,088ms | 48% |
| Mistral-Small-24B | 24B (dense) | 85% | 1,239ms | 66% |
| Qwen3 32B | 32B (dense) | ~70% | 28,385ms | — |
| GPT-OSS-20B | ~3.6B (MoE) | 51% | 2,303ms | 0% |
| Qwen3-30B-A3B | ~3B (MoE) | 44% | 5,938ms | 4% |
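We take "single-step accuracy" to mean the fraction of prompts where the model emits the right tool with the right arguments. A minimal scorer under that assumption — our reading of the methodology, not the exact harness behind the table:

```python
def score_call(predicted: dict, gold: dict) -> bool:
    """A call is correct if the tool name and all gold arguments match."""
    if predicted.get("tool") != gold["tool"]:
        return False
    pred_args = predicted.get("arguments", {})
    return all(pred_args.get(k) == v for k, v in gold["arguments"].items())

def accuracy(results: list[tuple[dict, dict]]) -> float:
    """Fraction of (predicted, gold) pairs scored correct."""
    return sum(score_call(p, g) for p, g in results) / len(results)
```

Exact-match scoring like this is deliberately strict: a model that picks the right tool but garbles one argument scores zero for that step.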
Quick start
Download LFM2-24B-A2B
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Node.js | 20+ | TypeScript MCP servers, React frontend |
| Python | 3.11+ | Python MCP servers (document, OCR, security, etc.) |
| Rust | 1.77+ | Tauri backend, Agent Core |
| llama.cpp | latest | Serves LFM2 models (brew install llama.cpp) |
Optional: Ollama (alternative runtime), Tesseract (fallback OCR).
Known limitations
- 1-2 step workflows are reliable. 4+ step chains degrade as conversation history grows — multi-step completion is 26% across all tools.
- Batch operations often return partial results — the model may handle 2 items from a set of 10.
- Cross-server transitions are the universal barrier. Every model tested fails at these. UX is designed around human confirmation to compensate.
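That human-confirmation compensation can be as simple as a gate that interrupts any chain step crossing a server boundary. A hypothetical sketch — the function name and `ask` callback are ours, not the actual UX code:

```python
from typing import Callable

def confirm_cross_server(prev_server: str, next_server: str,
                         ask: Callable[[str], bool]) -> bool:
    """Allow same-server steps through; require explicit user approval
    whenever the next tool call would leave the current server."""
    if prev_server == next_server:
        return True
    return ask(f"Agent wants to move from {prev_server} to {next_server}. Continue?")
```

Same-server chains run uninterrupted, so the gate only fires at exactly the transitions every tested model gets wrong.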