Tool-calling that actually feels instant on a laptop.

Building a local AI agent sounds great until you try to use one all day. The hard part isn't getting a model to understand you; it's getting it to choose the right tool and do it fast enough that the experience feels interactive. This is where LFM2-24B-A2B shines: it's designed for tool dispatch on consumer hardware, where latency and memory aren't abstract constraints: they decide whether your agent is a product or a demo.

LocalCowork is a desktop AI agent that runs entirely on-device. No cloud APIs, no data leaving your machine. The model calls pre-built tools via the Model Context Protocol (MCP), and every tool execution is logged to a local audit trail.

What it does

LocalCowork ships with 75 tools across 14 MCP servers covering filesystem operations, document processing, OCR, security scanning, email drafting, task management, and more. For the demo, we run a curated set of 20 tools across 6 servers, each of which scores 80%+ single-step accuracy and has demonstrated participation in multi-step chains.

Demo 1: Scan for leaked secrets

Every developer has .env files with API keys scattered across old projects. You’d never upload your filesystem to a cloud model to find them. That defeats the purpose.
You:   "Scan my Projects folder for exposed API keys"
Agent: security.scan_for_secrets → found 3 secrets in 2 files (420ms)

You:   "Encrypt the ones you found"
Agent: security.encrypt_file → encrypted .env and config.yaml (380ms)

You:   "Show me the audit trail"
Agent: audit.get_tool_log → 3 tool calls, all succeeded (12ms)
Three tools, under 2 seconds total. The scan, encryption, and audit trail all happen locally.
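For a sense of what a tool like security.scan_for_secrets does under the hood, here is a minimal regex-based sketch in Python. The patterns, file filters, and output shape are illustrative assumptions, not the project's actual rules:

```python
import re
from pathlib import Path

# Hypothetical detection rules; the real security server's pattern set is not shown here.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[=:]\s*['\"]?[A-Za-z0-9_\-]{16,}"),
}

def scan_text(text: str, source: str) -> list[dict]:
    """Return one finding per pattern match, mimicking a scan_for_secrets result."""
    findings = []
    for kind, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"file": source, "kind": kind, "span": match.span()})
    return findings

def scan_folder(root: str) -> list[dict]:
    """Walk a folder and scan files that commonly hold credentials."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".env", ".yaml", ".yml", ".json", ""}:
            findings += scan_text(path.read_text(errors="ignore"), str(path))
    return findings
```

Running this locally is exactly the point of the demo: the file contents never leave the process that scanned them.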

Demo 2: Compare contracts without a cloud API

A freelancer gets a revised NDA. They need to know what changed. These are confidential documents that should never leave the machine.
You:   "Compare these two contract versions"
Agent: document.extract_text (v1) → 2,400 words (350ms)
       document.extract_text (v2) → 2,600 words (340ms)
       document.diff_documents → 12 changes found (180ms)
       document.create_pdf → diff_report.pdf generated (420ms)
Four tools, under 2 seconds. The extraction, diff, and PDF generation never touch a network.

Demo 3: Answer a direct question in one call

The simplest test of an agent: can it answer a direct question in one tool call without going off-script?
You:   "List what's in my Downloads folder"
Agent: filesystem.list_dir → 26 files found (9ms)
       "Here are the files: DEMO CARD styles.png, benchmark_results.csv,
        Liquid AI Notes.pdf, and 23 others."
One tool, one answer. No unnecessary follow-up calls, no asking what to do next.
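One way to keep single-question dispatch cheap is to narrow the 75 tools down to a few candidates before the model ever sees them. A naive keyword-overlap sketch of that idea (the real ToolPreFilter in the Rust agent core is not shown and is surely more sophisticated):

```python
def prefilter_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Rank tools by naive word overlap between the query and each tool description."""
    words = set(query.lower().split())
    scored = sorted(
        tools,
        key=lambda name: len(words & set(tools[name].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Hypothetical descriptions, for illustration only.
TOOLS = {
    "filesystem.list_dir": "list files in a folder directory",
    "document.extract_text": "extract text from a pdf or docx document",
    "security.scan_for_secrets": "scan files for exposed api keys and secrets",
}

candidates = prefilter_tools("list what's in my downloads folder", TOOLS)
```

Shrinking the tool list before inference keeps the prompt short, which is part of why a single-call answer can come back in milliseconds.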

Architecture

Presentation    Tauri 2.0 (Rust) + React/TypeScript
                    |
Agent Core      Rust — ConversationManager, ToolRouter, MCP Client,
                       Orchestrator, ToolPreFilter, Audit
                    |
Inference       OpenAI-compatible API @ localhost (llama.cpp / Ollama / vLLM)
                    |
MCP Servers     14 servers, 75 tools (8 TypeScript + 6 Python)
The agent core communicates with the inference layer via the OpenAI chat completions API. Changing the model is a config change, not a code change. MCP servers are auto-discovered at startup by scanning mcp-servers/.
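As a sketch of that contract: the agent core sends plain OpenAI chat-completions JSON with tool schemas attached, so swapping runtimes only moves the base URL and model name. The port and tool schema below are illustrative assumptions, not the project's actual config:

```python
# Swapping models or runtimes is a config change: only these two values move.
BASE_URL = "http://localhost:8080/v1"   # llama.cpp, Ollama, or vLLM endpoint (assumed port)
MODEL = "LFM2-24B-A2B"

def chat_request(user_msg: str, tools: list[dict]) -> dict:
    """Build an OpenAI-style /chat/completions payload with tool schemas attached."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",
    }

# Hypothetical schema for one of the filesystem tools.
list_dir_tool = {
    "type": "function",
    "function": {
        "name": "filesystem.list_dir",
        "description": "List files in a sandboxed directory",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

payload = chat_request("List what's in my Downloads folder", [list_dir_tool])
```

Because every runtime in the inference layer speaks this same schema, the agent core never needs to know which one is behind the socket.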

MCP servers

| Server | Lang | Tools | What it does |
|--------|------|-------|--------------|
| filesystem | TS | 9 | File CRUD, search, watch (sandboxed) |
| document | Py | 8 | Text extraction, conversion, diff, PDF generation |
| ocr | Py | 4 | LFM Vision primary, Tesseract fallback |
| knowledge | Py | 5 | SQLite-vec RAG pipeline, semantic search |
| meeting | Py | 4 | Whisper.cpp transcription + diarization |
| security | Py | 6 | PII/secrets scanning + encryption |
| calendar | TS | 4 | .ics parsing + system calendar API |
| email | TS | 5 | MBOX/Maildir parsing + SMTP |
| task | TS | 5 | Local SQLite task database |
| data | TS | 5 | CSV + SQLite operations |
| audit | TS | 4 | Audit log reader + compliance reports |
| clipboard | TS | 3 | OS clipboard (Tauri bridge) |
| system | TS | 10 | OS APIs: sysinfo, processes, screenshots |
| screenshot-pipeline | Py | 3 | Capture, UI elements, action suggestion |
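The startup scan of mcp-servers/ mentioned above can be sketched as a directory walk. This is an illustration in Python, not the project's actual Rust code, and the entrypoint filenames (index.ts, server.py) are assumptions about the layout:

```python
import tempfile
from pathlib import Path

def discover_servers(root: Path) -> dict[str, str]:
    """Map each server directory to its language based on the entrypoint present.

    Illustrative sketch; the real discovery logic in the agent core is not shown.
    """
    servers = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / "index.ts").exists():
            servers[entry.name] = "typescript"
        elif (entry / "server.py").exists():
            servers[entry.name] = "python"
    return servers
```

Discovery at startup is what lets a new server become available by dropping a directory in place, with no registry to edit.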

Benchmarks

We tested 6 models against 67 tools on Apple M4 Max. LFM2-24B-A2B (24B total, ~2B active per token) delivers 80% tool accuracy at 390ms. That's 94% of Mistral-Small-24B's accuracy at under a third of its latency, and roughly 60x faster than the most accurate model tested (Gemma 3 27B).
| Model | Active Params | Accuracy | Latency | Multi-Step |
|-------|---------------|----------|---------|------------|
| LFM2-24B-A2B | ~2B (MoE) | 80% | 390ms | 26% |
| Gemma 3 27B | 27B (dense) | 91% | 24,088ms | 48% |
| Mistral-Small-24B | 24B (dense) | 85% | 1,239ms | 66% |
| Qwen3 32B | 32B (dense) | ~70% | 28,385ms | |
| GPT-OSS-20B | ~3.6B (MoE) | 51% | 2,303ms | 0% |
| Qwen3-30B-A3B | ~3B (MoE) | 44% | 5,938ms | 4% |
The speed comes from the combination of the hybrid conv+attention design and MoE sparsity.
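The headline ratios can be checked directly against the table:

```python
# Benchmark figures copied from the table above.
results = {
    "LFM2-24B-A2B":      {"accuracy": 0.80, "latency_ms": 390},
    "Gemma 3 27B":       {"accuracy": 0.91, "latency_ms": 24088},
    "Mistral-Small-24B": {"accuracy": 0.85, "latency_ms": 1239},
}

lfm = results["LFM2-24B-A2B"]
mistral = results["Mistral-Small-24B"]
gemma = results["Gemma 3 27B"]

accuracy_vs_mistral = lfm["accuracy"] / mistral["accuracy"]      # ~0.94
latency_vs_mistral = lfm["latency_ms"] / mistral["latency_ms"]   # ~0.31
speedup_vs_gemma = gemma["latency_ms"] / lfm["latency_ms"]       # ~62x
```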

Quick start

1. Clone and set up

   git clone <repo-url> && cd localCoWork
   ./scripts/setup-dev.sh

2. Download LFM2-24B-A2B

   Requires HuggingFace access. Request access.

   pip install huggingface-hub
   python3 -c "
   from huggingface_hub import hf_hub_download
   hf_hub_download('LiquidAI/LFM2-24B-A2B-GGUF',
                   'LFM2-24B-A2B-Q4_K_M.gguf',
                   local_dir='$HOME/Projects/_models/')
   "

   This downloads ~14 GB.

3. Start the model server

   ./scripts/start-model.sh

4. Launch the app

   In another terminal:

   cargo tauri dev

   MCP servers start automatically.

Prerequisites

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Node.js | 20+ | TypeScript MCP servers, React frontend |
| Python | 3.11+ | Python MCP servers (document, OCR, security, etc.) |
| Rust | 1.77+ | Tauri backend, Agent Core |
| llama.cpp | latest | Serves LFM2 models (brew install llama.cpp) |

Optional: Ollama (alternative runtime), Tesseract (fallback OCR).

Known limitations

  • 1-2 step workflows are reliable. 4+ step chains degrade as conversation history grows; multi-step completion averages 26% across all tools.
  • Batch operations may return partial results: the model might handle 2 items from a set of 10.
  • Cross-server transitions are the universal barrier. Every model tested fails at these, so the UX is designed around human confirmation to compensate.
These limits are documented because they’re instructive.

Source code

View the complete source code on GitHub.
