## How it works
The assistant runs a loop:

- You type a request
- The model decides which tools to call
- Tools are executed and results are fed back to the model
- The loop continues until the model has nothing left to do
- The final response is printed and you can type the next request
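The steps above can be sketched as a minimal Python loop. This is a simplified sketch, not this project's actual implementation; `call_model` and `run_tool` are hypothetical stand-ins for the real backend call and tool dispatch:

```python
def agent_loop(user_request, call_model, run_tool):
    """Minimal sketch of the tool-use loop described above.

    `call_model` and `run_tool` are placeholders for the real
    backend call and tool dispatch; the actual code differs.
    """
    messages = [{"role": "user", "content": user_request}]
    while True:
        reply = call_model(messages)           # model decides what to do
        messages.append({"role": "assistant", "content": reply})
        if not reply.get("tool_calls"):        # nothing left to do: stop
            return reply["content"]
        for call in reply["tool_calls"]:       # execute each requested tool
            result = run_tool(call["name"], call["args"])
            messages.append({"role": "tool", "content": result})
```

Each tool result is appended to the conversation, so the model sees the outcome of its own actions before deciding on the next step.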
| Tool | What it does |
|---|---|
| `read_file` | Read the contents of a file |
| `write_file` | Create or overwrite a file |
| `list_directory` | List files in a directory |
| `run_bash` | Run any shell command (grep, git, python, tests, …) |
## Setup

Prerequisites: uv

### Running

Type `exit` or press Ctrl+C to quit.
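A typical invocation with uv might look like the following. The `lca` entrypoint name is an assumption; check this repo's `pyproject.toml` for the actual script:

```shell
# Interactive session against the default Anthropic backend
# (`lca` is a hypothetical entrypoint name)
export ANTHROPIC_API_KEY=sk-ant-...
uv run lca
```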
## Example interactions
## Switching to a local model
The assistant supports llama.cpp as a local backend via its OpenAI-compatible server. Pass `--backend local` and `--model` to select the model; the `llama-server` process is started and stopped automatically.
### From a HuggingFace repo
The model is downloaded and cached on first run. For gated models, set `HF_TOKEN` in your environment or `.env` file.
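A sketch of such an invocation, assuming the hypothetical `lca` entrypoint; the repo path shown is just an example GGUF repository:

```shell
# Model given as a HuggingFace repo path; downloaded and cached on first run
uv run lca --backend local --model Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
```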
### From a local GGUF file
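A sketch, again assuming the hypothetical `lca` entrypoint; the file path is an example:

```shell
# Model given as a path to a GGUF file already on disk
uv run lca --backend local --model ~/models/qwen2.5-coder-7b-q4_k_m.gguf
```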
### Non-interactive mode
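A single prompt can presumably be passed on the command line instead of entering the interactive loop. The `-p` flag and `lca` entrypoint below are assumptions; check the CLI's `--help` for the actual interface:

```shell
# Run one request, print the final response, and exit
# (flag name is an assumption)
uv run lca -p "Summarize what this project does"
```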
### Manual server management
If you prefer to manage the `llama-server` yourself, omit `--model` and the assistant will connect to the already-running server.
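For example, using llama.cpp's standard server flags (the `lca` entrypoint is a hypothetical name, as above):

```shell
# Start llama-server yourself: -m model file, -c context size,
# -ngl GPU layers; --port must match LCA_LOCAL_BASE_URL
llama-server -m ~/models/model.gguf -c 32768 -ngl 99 --port 8080

# In another terminal, connect without passing --model
uv run lca --backend local
```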
## Configuration
All settings are controlled via environment variables (or a `.env` file):
| Variable | Default | Description |
|---|---|---|
| `LCA_BACKEND` | `anthropic` | Backend to use: `anthropic` or `local` |
| `ANTHROPIC_API_KEY` | — | Your Anthropic API key |
| `LCA_ANTHROPIC_MODEL` | `claude-sonnet-4-6` | Anthropic model name |
| `LCA_LOCAL_BASE_URL` | `http://localhost:8080/v1` | llama.cpp server URL |
| `LCA_LOCAL_MODEL` | `local` | Model passed to the server (HF path or file path) |
| `LCA_LOCAL_CTX_SIZE` | `32768` | Context window size for the local server |
| `LCA_LOCAL_GPU_LAYERS` | `99` | Number of layers to offload to the GPU |
| `LCA_MAX_TOKENS` | `8192` | Max tokens per response |
| `LCA_WORKING_DIR` | `.` | Working directory for bash commands |
| `HF_TOKEN` | — | HuggingFace token (required for gated models) |
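For example, a `.env` for the local backend might look like this (values are the documented defaults; the model path is an example):

```shell
# .env — local backend configuration
LCA_BACKEND=local
LCA_LOCAL_BASE_URL=http://localhost:8080/v1
LCA_LOCAL_MODEL=~/models/model.gguf
LCA_LOCAL_CTX_SIZE=32768
LCA_LOCAL_GPU_LAYERS=99
LCA_MAX_TOKENS=8192
```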
## Benchmarking
The `benchmark/` directory contains two task suites, each with 10 tasks of increasing difficulty (easy → hard) and automated verifiers.
### Default suite — this project
Tasks range from reading `pyproject.toml` to multi-file code analysis.
### llama.cpp suite — real-world C++ codebase
Tasks operate on a large open-source C++ project.

### Output
Results are saved to `benchmark/results/<timestamp>-<suite>-<backend>-<model>.json` and a summary table is printed.
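Because filenames are timestamp-prefixed, they sort chronologically, so finding the latest run needs nothing beyond the documented naming. A small sketch (the helper name is not part of this project):

```python
from pathlib import Path

def latest_result(results_dir="benchmark/results"):
    """Return the newest result file, or None if there are no runs.

    Relies only on the documented <timestamp>-<suite>-<backend>-<model>.json
    naming: timestamp-prefixed names sort chronologically.
    """
    files = sorted(Path(results_dir).glob("*.json"))
    return files[-1] if files else None
```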