## How it works
The assistant runs a loop:

- You type a request
- The model decides which tools to call
- Tools are executed and results are fed back to the model
- The loop continues until the model has nothing left to do
- The final response is printed and you can type the next request
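The steps above can be sketched as a minimal Python loop. This is a simplified sketch, not this project's actual implementation; `call_model` and `run_tool` are hypothetical stand-ins for the real backend call and tool dispatch:

```python
def agent_loop(user_request, call_model, run_tool):
    """Minimal sketch of the tool-use loop described above.

    `call_model` and `run_tool` are placeholders for the real
    backend call and tool dispatch; the actual code differs.
    """
    messages = [{"role": "user", "content": user_request}]
    while True:
        reply = call_model(messages)           # model decides what to do
        messages.append({"role": "assistant", "content": reply})
        if not reply.get("tool_calls"):        # nothing left to do: stop
            return reply["content"]
        for call in reply["tool_calls"]:       # execute each requested tool
            result = run_tool(call["name"], call["args"])
            messages.append({"role": "tool", "content": result})
```

Each tool result is appended to the conversation, so the model sees the outcome of its own actions before deciding on the next step.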
| Tool | What it does |
|---|---|
| `read_file` | Read the contents of a file |
| `write_file` | Create or overwrite a file |
| `list_directory` | List files in a directory |
| `run_bash` | Run any shell command (grep, git, python, tests, …) |
## Setup

Prerequisites: uv

### Running

Type `exit` or press Ctrl+C to quit.
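A typical invocation with uv might look like the following. The `lca` entrypoint name is an assumption; check this repo's `pyproject.toml` for the actual script:

```shell
# Interactive session against the default Anthropic backend
# (`lca` is a hypothetical entrypoint name)
export ANTHROPIC_API_KEY=sk-ant-...
uv run lca
```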
## Example interactions
## Switching to a local model
The assistant supports llama.cpp as a local backend via its OpenAI-compatible server. Pass `--backend local` and `--model` to select the model; the `llama-server` process is started and stopped automatically.
### From a HuggingFace repo
The model is downloaded and cached on first run. For gated models, set `HF_TOKEN` in your environment or `.env` file.
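A sketch of such an invocation, assuming the hypothetical `lca` entrypoint; the repo path shown is just an example GGUF repository:

```shell
# Model given as a HuggingFace repo path; downloaded and cached on first run
uv run lca --backend local --model Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
```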
### From a local GGUF file
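A sketch, again assuming the hypothetical `lca` entrypoint; the file path is an example:

```shell
# Model given as a path to a GGUF file already on disk
uv run lca --backend local --model ~/models/qwen2.5-coder-7b-q4_k_m.gguf
```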
### Non-interactive mode
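A single prompt can presumably be passed on the command line instead of entering the interactive loop. The `-p` flag and `lca` entrypoint below are assumptions; check the CLI's `--help` for the actual interface:

```shell
# Run one request, print the final response, and exit
# (flag name is an assumption)
uv run lca -p "Summarize what this project does"
```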
### Manual server management
If you prefer to manage the `llama-server` yourself, omit `--model` and the assistant will connect to the already-running server.
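For example, using llama.cpp's standard server flags (the `lca` entrypoint is a hypothetical name, as above):

```shell
# Start llama-server yourself: -m model file, -c context size,
# -ngl GPU layers; --port must match LCA_LOCAL_BASE_URL
llama-server -m ~/models/model.gguf -c 32768 -ngl 99 --port 8080

# In another terminal, connect without passing --model
uv run lca --backend local
```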
## Configuration
All settings are controlled via environment variables (or a `.env` file):
| Variable | Default | Description |
|---|---|---|
| `LCA_BACKEND` | `anthropic` | Backend to use: `anthropic` or `local` |
| `ANTHROPIC_API_KEY` | — | Your Anthropic API key |
| `LCA_ANTHROPIC_MODEL` | `claude-sonnet-4-6` | Anthropic model name |
| `LCA_LOCAL_BASE_URL` | `http://localhost:8080/v1` | llama.cpp server URL |
| `LCA_LOCAL_MODEL` | `local` | Model passed to the server (HF path or file path) |
| `LCA_LOCAL_CTX_SIZE` | `32768` | Context window size for the local server |
| `LCA_LOCAL_GPU_LAYERS` | `99` | Number of layers to offload to the GPU |
| `LCA_MAX_TOKENS` | `8192` | Max tokens per response |
| `LCA_WORKING_DIR` | `.` | Working directory for bash commands |
| `HF_TOKEN` | — | HuggingFace token (required for gated models) |
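For example, a `.env` for the local backend might look like this (values are the documented defaults; the model path is an example):

```shell
# .env — local backend configuration
LCA_BACKEND=local
LCA_LOCAL_BASE_URL=http://localhost:8080/v1
LCA_LOCAL_MODEL=~/models/model.gguf
LCA_LOCAL_CTX_SIZE=32768
LCA_LOCAL_GPU_LAYERS=99
LCA_MAX_TOKENS=8192
```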
## Benchmarking
The `benchmark/` directory contains two task suites, each with 10 tasks of increasing difficulty (easy → hard) and automated verifiers.
### Default suite — this project
Tasks range from reading `pyproject.toml` to multi-file code analysis.
### llama.cpp suite — real-world C++ codebase
Tasks operate on a large open-source C++ project.

### Output
Results are saved to `benchmark/results/<timestamp>-<suite>-<backend>-<model>.json` and a summary table is printed.
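Because filenames are timestamp-prefixed, they sort chronologically, so finding the latest run needs nothing beyond the documented naming. A small sketch (the helper name is not part of this project):

```python
from pathlib import Path

def latest_result(results_dir="benchmark/results"):
    """Return the newest result file, or None if there are no runs.

    Relies only on the documented <timestamp>-<suite>-<backend>-<model>.json
    naming: timestamp-prefixed names sort chronologically.
    """
    files = sorted(Path(results_dir).glob("*.json"))
    return files[-1] if files else None
```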